tags and style markup), but it also meant that the changes coming from the new stylesheet were not being applied universally, as any properties assigned on a page overrode the global CSS. There was also the issue of paragraph text that was sometimes styled as fake headings (made larger or bolder to look like headings, but not using the proper heading tags), which needed to be corrected for consistency and accessibility purposes.

Replanting and Sprucing Up

With an overwhelming majority of the guides (and their associated assets) deleted, it was finally time to rework the remaining guides into clear, easy-to-use resources that would benefit our students. At this point the guides fell into three categories:

• Guides that just needed to be pruned and updated.
• Guides that should be combined into a single subject area guide.
• Guides that should be created to fill an unmet need.

INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 TENDING TO AN OVERGROWN GARDEN | HYAMS

Pruning and updating tasks were generally the least arduous, as many of the guides included content that was also housed on discrete guides (citations, resource evaluation, etc.). Instead of duplicating, for example, citation formats on every guide, those pages were replaced with navigation-level links out to the existing citation guide. This was also the point at which we could do more extensive quality control, such as switching to a single content column, which further emphasized the extraneous information on many of our guides. Infographics, videos, and long blocks of links or text were scrutinized to determine whether they were helping to enhance students' understanding of the core content or merely providing clutter that would make it more difficult to understand the important information.9 In some cases, by going from guide to guide, it became apparent that there were guides for multiple courses in a subject area where the resources were basically identical.
This was most noticeable in the criminal justice and health education subject areas. In these cases, it made little sense to keep separate course guides when the content was largely the same across them. To remedy this duplication, one of the course guides for each subject was transformed into the subject area guide, and resources were added to ensure it covered the same materials that the separate course guides may have covered. The remaining course guides were then marked for future deletion as they were no longer needed. Lastly, subject areas without guides were identified so that work could be done later to create them. As we had discussed moving towards using the "automagic" integration of guide content into our Blackboard Learning Management System (LMS), this step will be key in ensuring that all subject areas have at least some resources students can use. However, as of this time we have yet to finish creating these additional guides, and several subject areas (including computer science, nursing, and gender studies) have no guides at all.

NEXT STEPS

Now that all of the work to clean and update our LibGuides is done, the most important next step is coming up with a workflow to ensure that the guides stay relevant and useful. The web and systems librarian mostly left the guides alone for the Fall 2019 semester to allow their colleagues time to use them and report back any issues. To the web and systems librarian's surprise, there were few issues reported, but that does not mean there is no room for future improvement. As a department, it is clear that we need a formal plan for maintaining the guides, including update frequency, content review, and guidelines for when guides should be added or deleted. Additionally, immediately following the conclusion of this cleanup project, the library's website was forced into a server migration and full rebuild for reasons outside the scope of this article.
However, as a result, changes were made to the look and feel of pages on the library's site that will need to be carried through into our guides and associated Springshare platforms. While most of this work is relatively simple, mimicking changes developed in WordPress to work properly on external services will take time and effort.

CONCLUSION

Overall, while this project was a massive undertaking (done almost entirely by a single person), the end result, at least on the surface, has made our guides much easier to use and understand. There were obviously several things that, if the project were to be done over, should have been done differently, mostly involving the cleaning of the asset library. However, it is now much easier to refer students to guides for their courses, and the feelings about the guides amongst the library faculty have become much more positive.

ENDNOTES

1 "LibGuides: The Next Generation!," Springshare Blog (blog), June 26, 2013, https://blog.springshare.com/2013/06/26/libguides-the-next-generation/.

2 The guide can be viewed at: https://bmcc.libguides.com/guidecleanup.

3 Though the author only learned of the project undertaken at UNC a few years ago, after they had already finished this project, a similar project was outlined here: Sarah Joy Arnold, "Out with the Old, in with the New: Migrating to LibGuides A-Z Database List," Journal of Electronic Resources Librarianship 29, no. 2 (April 2017): 117–20, https://doi.org/10.1080/1941126X.2017.1304769.

4 Because there was no way to view the documents before a bulk deletion, documents were manually reviewed and deleted as needed.

5 It was only long after this process that Springshare promoted that they could do this on the backend by request.
6 However, it turned out that, due to the differences in URL structure between classic Primo and Primo VE, this change was completely unnecessary, as the URLs actually needed to be changed again post-migration. At least they were consistent, which meant a systemwide find-and-replace could take care of most of the links.

7 Several studies have been done since the rollout of LibGuides v2, including: Sarah Thorngate and Allison Hoden, "Exploratory Usability Testing of User Interface Options in LibGuides 2," College and Research Libraries 78, no. 6 (2017): 844–61, https://doi.org/10.5860/crl.78.6.844; Kate Conerton and Cheryl Goldenstein, "Making LibGuides Work: Student Interviews and Usability Tests," Internet Reference Services Quarterly 22, no. 1 (January 2017): 43–54, https://doi.org/10.1080/10875301.2017.1290002.

8 Of the many guides the author consulted, the following were the most informative: Stephanie Jacobs, "Best Practices for LibGuides at USF," https://guides.lib.usf.edu/c.php?g=388525&p=2635904; Jesse Martinez, "LibGuides Standards and Best Practices," https://libguides.bc.edu/guidestandards/getting-started; Carrie Williams, "Best Practices for Building Guides & Accessibility Tips," https://training.springshare.com/libguides/best-practices-accessibility/video.

9 There is a very detailed discussion of cognitive overload in LibGuides in Jennifer J. Little, "Cognitive Load Theory and Library Research Guides," Internet Reference Services Quarterly 15, no. 1 (March 1, 2010): 53–63, https://doi.org/10.1080/10875300903530199.
ARTICLES

Making Disciplinary Research Audible: The Academic Library as Podcaster

Drew Smith, Meghan L. Cook, and Matt Torrence

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12191

Drew Smith (dsmith@usf.edu) is Associate Librarian, University of South Florida. Meghan L. Cook (mlcook3@usf.edu) is Coordinator of Library Operations, University of South Florida. Matt Torrence (torrence@usf.edu) is Associate Librarian, University of South Florida. © 2020.

ABSTRACT

Academic libraries have long consulted with faculty and graduate students on ways to measure the impact of their published research, which now include altmetrics. Podcasting is becoming a more viable method of publicizing academic research to a broad audience. Because individual academic departments may lack the ability to produce podcasts, the library can serve as the most appropriate academic unit to undertake podcast production on behalf of researchers. This article identifies what library staff and equipment are required, describes the process needed to produce and market the published episodes, and offers preliminary assessments of the podcast's impact.

INTRODUCTION

The academic library has always had an essential role in the research activities of university faculty and graduate students, but until the last several years, that role has primarily focused on assisting university researchers with obtaining access to all relevant published research in their fields, making it possible for those researchers to complete a thorough literature review.
More recently, that role has evolved to encompass assisting with other aspects of research and publication, including consulting on copyright-related issues, advising researchers on the most appropriate places to publish, preserving publications and data in institutional repositories, helping tenure-track faculty evaluate their research impact as part of the tenure and promotion process, and hosting open-access journals.

Meanwhile, libraries of all types have experimented over the last ten to fifteen years with using social media to promote library collections, services, and events. Many libraries have taken advantage of Facebook, Twitter, and YouTube as part of these efforts. Increasingly, libraries have incorporated makerspaces so that library patrons can create and edit video and audio files, meaning that this same equipment and software is now available to librarians and other library staff for their own purposes. This has resulted in libraries producing promotional videos and podcasts.

The dramatic increase in ownership and usage of mobile technology (smartphones and tablets) over the last decade has resulted in an increase in the consumption of podcasts wherever the listener happens to be when their ears are not otherwise fully occupied, such as while commuting, exercising, or doing household chores. As a result, academic libraries now find themselves in an excellent position to use podcasting for instructional and promotional purposes in an effort to reach a broad audience.

What happens when the university library combines its inherent interest in supporting the promotion of faculty and graduate student research with its ability to create podcasts to quickly and inexpensively reach an international audience?
This paper documents the efforts of an academic library at a high-level research university to partner with one of the university's academic departments to use podcasting to promote the research done by that department's faculty and doctoral candidates. We will describe which library staff were involved, how the podcast was planned, the execution of the podcasting process, the issues that were encountered throughout the process, and how the impact of the podcast was assessed. Calling: Earth, the podcast produced by the University of South Florida (USF) Libraries, can be found at http://callingearth.lib.usf.edu/.

LITERATURE REVIEW

Podcasting as a means of promoting scholarly communication is a relatively new and uncommon idea in a library setting, and the extant literature on the subject is therefore scarce. Most contemporary articles on the topic focus on the use of podcasts to satisfy a wide array of student learning needs. While knowledge of pedagogical best practices is useful, the current literature is not an exact match for the concept of promoting scholarly communication, which offers subject specificity, faculty and graduate interaction, marketing of libraries, and research visibility as aggregate goals. What follows in this literature review is a summary of a slice of the literature related to podcasting, academia, and/or libraries.

The researchers chose as a starting point to look at the general use of podcasting, as well as social media, in various academic and library environments.
In a recent article on the use of social media and altmetrics, for example, the increased use of these tools is outlined, but with numerous caveats regarding the initial non-probabilistic methods of gathering information on the how and why of their adoption.1 To further emphasize the use of podcasts and, in a related way, social marketing, an article on Association of Research Libraries (ARL) efforts in this vein was examined. A comprehensive study of ARL member libraries published in 2011, with little published on this topic since that date, demonstrated in figure 1 of its research that five of the 37 respondents' podcasts contained recorded interviews and only one included scholarly publishing content.2 This ten-year vacuum in further research was unexpected but indicates an opportunity for a new type of podcast focusing on academic production.

Scholars in academic libraries have long examined student preferences for new technologies and types of information transfer, including the use of podcasts. A study from Sam Houston State University found that 36 percent of users in 2011 were using podcasts for recreational purposes, as opposed to much lower use for academic and scholarly communication benefits.3 In the future, academic creation and utilization of podcasts for scholarly communication is ripe for a hearty statistical and qualitative analysis. Specific to this inquiry, the application of podcasts to scholarly communication within a subject discipline appears to be lacking in the literature. Furthermore, this literature review emphasizes the dearth of research related to promoting the research efforts of geosciences faculty and graduate students.

In terms of recent literature, there are also a number of publications available that deal with the history and evolution of podcasting in education and, specifically, higher education.
One such current work provides an excellent outline of this growth in use, as well as of several major types, or genres, of podcasting in these environments. Following a strong and succinct overview of the technology and its use in college and university settings, the author effectively defines, with examples, the three main genres they have identified: the "Quick Burst," the "Narrative," and the "Chat Show."4 The model that best represents USF's Calling: Earth program is the "Narrative," as this includes a subcategory of "Storytelling." This work is truly beneficial for any group or individual developing, or improving, an educational podcast effort.

In 2011, Peoples and Tilley outlined the emergence of podcasts to disseminate information in academic libraries. One of the excellent questions that arises from this work deals with the access, advancement, and archiving of the content: is this content to be archived, or cataloged, as more permanent material, or is it electronic ephemera?5 This is a question for the USF Calling: Earth podcast group going forward as the level and quality of content and, ideally, use are expanded. Additionally, educators are studying the limitations of podcasts, not to rule them out as academic tools, but to inspire and enhance the best possible outcomes. One excellent warning to be heeded by any library hoping to utilize podcasts for education and dissemination of research is summed up well in this quote: "If students do not utilize or do not realize the benefits of the self-pacing multimedia characteristics of podcasting, then the resource becomes a more likely contributor to cognitive overload."6

There have been a small number of studies of the quantitative elements of podcast use in academic libraries.
An article in VINE: The Journal of Information & Knowledge Management Systems outlined, via content analysis and other methods, various unique and shared characteristics of existing academic podcasts, while also furthering the concept of podcasting as a "library service."7 This may not have been the first publication to make this assertion, but it is a view that is also held by these authors, and it shapes the development and advancement of the USF Libraries podcasting efforts. Librarians of all types must be wary, however, as numerous articles focus on better understanding student learning preferences. As a recent article on the success of satellite and distance learners showed, these tools often match the delivery preferences of these types of students.8

Switching gears to a bit more topic specificity, a number of news and academic articles were identified on the use of podcasts in areas of the geosciences. One such effort is the Geology Flannelcast. The development and implementation of this combination of education and entertainment, which is also a goal of these authors, is outlined in the creators' poster presentation at a recent Geological Society of America conference. With a focus on the increasing ease of podcasting technology, the reduced cost of equipment, and the use of a "conversational atmosphere" within a pedagogical framework, this model stood out as one worth studying.9 Furthermore, the geosciences are, or can be, interesting and exciting. A recent podcast on communicating geosciences with fun and flair was just the encouragement this research group needed to go all-in on this project.
And that the geosciences are far from boring!10

As evidenced by an examination of current and historical literature on this topic, there are multiple opportunities for further exploration and library efforts, especially as one of the main points of this work is to emphasize faculty and graduate research efforts, scholarly communication, and original content creation. In addition to the focus on these publication and presentation efforts, the results will be measured by initial assessment projects, including download and utilization data and, hopefully, positive feedback from participants and library administration. Further measurement is expected to demonstrate increased citation counts and downloads of the publications of the faculty and graduate student interviewees. It will be correlation and not causation, of course, but the team hopes to have positive feedback for participants and the library.

STAFFING

As with any successful project, a project to produce a podcast focused on academic research had to begin with individuals who had either the interest or the expertise, ideally both, to initiate the work. One was an associate librarian with more than 13 years of experience in producing regular podcasts, while the other was a library staff member and doctoral candidate serving on the USF Libraries Research Platform Team (RPT) for the USF School of Geosciences. The RPT was already tasked with assisting the Geosciences faculty and graduate students in maximizing the impact of their work and had been using various means to accomplish this, such as an institutional repository for research output and tools to measure the impact of previously published work.
During a conversation in late 2018, the librarian suggested to the RPT staff person that podcasting could be used to promote research to a variety of audiences, including USF faculty and students, faculty and students at other universities, K-12 science teachers, and members of the general public (both local and beyond). The librarian offered to initiate the podcast and train the RPT staff on how to continue it after a number of episodes had been produced. The librarian brought to the project the needed expertise in launching and maintaining a podcast, while the RPT doctoral candidate was already familiar with the Geosciences faculty and other doctoral candidates and could identify those who would make good candidates for being interviewed about their research.

PLANNING

The initial planning for the podcast began approximately two months before the first episode release. The original project managers and podcast creators met a number of times to discuss logistics, equipment, and staffing needs, and to agree upon a podcast name (Calling: Earth). Since the notion of podcasting for researcher promotion was unexplored territory, support from higher administration was cautious. However, after production of the first episodes, traction behind the podcast grew and additional support for future endeavors was received.

The podcasters acquired handheld recording equipment, a Tascam DR-05 Linear PCM Recorder, from the USF Libraries Digital Media Commons and tested it in multiple environments (for instance, a quiet office versus a recording studio) to find the optimal location to record the interviews. We found the handheld recorder worked well in a quiet office and allowed for travel to the researcher's office if they requested. The podcast creation team also discussed how to add intro and outro music that would fit the mood of the podcast without violating any copyright restrictions.
The RPT staff person knew of a local Tampa-based band, The Growlers, as a potential source for music because the bass guitarist was an adjunct professor and alumnus of the USF School of Geosciences. The alumnus gave permission to use a portion of the band's recorded music for the podcast.

A hosting service was needed to host and publish the podcast. The librarian suggested using Libsyn because of their 13 years of previous experience with the platform, Libsyn's inexpensive hosting plan, and the ability to acquire statistics, including the geographic locations (countries and states) where the podcast was being downloaded.

EXECUTION

Potential interviewees were contacted via email and invited to be interviewed. Once a potential interviewee agreed, a time and place to conduct the interview were arranged. The RPT staff person determined what the most recent research was for each interviewee and then provided that content to the librarian host for review. The host then prepared interview questions based on the research content. The host went over the questions with the interviewee before the interview began to clear the content and to make sure everything the interviewee wished to cover would be covered. The interviews took approximately 30 minutes to an hour.

Editing of the podcast was done using GarageBand, allowing for the addition of the music at the beginning and end, as well as the host introducing both the general podcast and the specific episode, identifying the academic units involved in the podcast, indicating how listeners might provide feedback, and thanking the music group for allowing the use of their music. In a few rare cases, small interview segments were removed, usually due to the interviewee feeling that they did not represent them well.
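Splicing of this kind (intro music, then the interview, then the outro) can also be sketched in code. Below is a minimal illustration using only Python's standard-library wave module, assuming all source files share the same sample rate, channel count, and sample width; the function name and file layout are our own, not part of any podcasting tool:

```python
import wave

def assemble_episode(intro_path, interview_path, outro_path, out_path):
    """Concatenate intro music, the interview, and outro music into one
    WAV file. All inputs must share identical audio parameters (sample
    rate, channels, sample width); no mixing or fades are applied."""
    segments = []
    params = None
    for path in (intro_path, interview_path, outro_path):
        with wave.open(path, "rb") as src:
            if params is None:
                params = src.getparams()  # reuse the first file's settings
            segments.append(src.readframes(src.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for frames in segments:
            out.writeframes(frames)
```

An editor such as GarageBand additionally handles fades, level matching, and compressed formats such as MP3; this sketch only joins uncompressed WAV segments end to end.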
CHALLENGES

As with any new endeavor, challenges were faced at all stages in the process of getting the podcast to production and beyond.

Buy-in from Library Administration

An early challenge was to gain buy-in from the library administration. This began with requesting that the library fund the hosting service; the feeling of the administrator was that it was a worthwhile experiment, at least in the short term. Once a number of episodes had been produced, the library administration had a better sense of the quality of the production and how it would serve the interests of the library in its academic support role.

Lack of Budget

With no budget for this project (beyond the administration's monthly payment for the hosting service), the podcasters were at the mercy of the quality of the recorders available for library checkout. If the recorders did not produce a high-quality recording, the podcast would possibly lack the sophistication needed for production. Also, high-quality graphics work was needed, which required us to look to other library units for help with creating a logo.

Getting the Podcast into Apple Podcasts

Once content was being produced and published, it was time to submit the podcast to Apple Podcasts. Apple initially rejected the submission because the first logo looked very similar to an iPhone. It should be noted that Apple did not supply a specific explanation of what copyright was being infringed, so the podcasters were faced with making a best guess as to what the problem was. Based on our assumption, we changed the logo and resubmitted the podcast. A further problem arose when Apple required that the new submission use a different RSS feed than the original submission. Eventually the podcasters sought assistance from Libsyn, who explained how to make a minor change to the URL of the RSS feed so that the podcast could be successfully resubmitted.
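The RSS feed at the center of this challenge is an XML document that the hosting service generates automatically; its URL is what Apple Podcasts keys a submission to. As a rough illustration of what that document contains, here is a sketch that builds a stripped-down podcast feed in Python (a real submission also requires the iTunes-specific extension tags; the function and example values are hypothetical):

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, description, episodes):
    """Build a minimal podcast RSS 2.0 feed as a string.
    `episodes` is a list of (episode_title, mp3_url) tuples; each becomes
    an <item> whose <enclosure> points at the audio file."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for ep_title, mp3_url in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep_title
        # The enclosure is what podcast apps actually download.
        ET.SubElement(item, "enclosure", url=mp3_url, type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```

Directories poll this document for new items, which is why changing the feed URL (as Apple required here) amounts to resubmitting the show.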
New Logo Creation

The first logo continued to be used for the entire first season, but before the second season was released, the library's new communications and marketing coordinator assisted with the creation of a new logo that looked more sophisticated and more in line with other podcast logos. Having an in-house graphics designer was extremely helpful in rolling out a new logo (see figures 1 and 2).

Figure 1. Season 1 Logo

Figure 2. Current Logo

Setting Up Interviews

Identifying potential interviewees, requesting interviews, and setting good times and locations for the interviews brought on another batch of challenges. The USF School of Geosciences is composed of geologists, geographers, and environmental scientists, so when planning out the schedule of potential interviewees, an effort was made to involve a wide range of researchers. Some potential interviewees declined the request altogether, while others were not available during the needed time period. Given that the podcast was released every two weeks, there was a little wiggle room for scheduling hiccups, but once or twice a last-minute request to a new potential interviewee was made to ensure production stayed on schedule. Settling on where and when an interview would be held required a lot of back-and-forth emails between the RPT staff person and the interviewee. Preference on time and location was given to the interviewee, but it was requested that, if they did not want to come to the library to be interviewed, their own office or lab space be used only if it was a sufficiently quiet environment for recording purposes.

Comfort of the Interviewee

Once an interview began, the challenge of engagement from the host and the comfort of the interviewee became apparent.
The host had to engage the researcher at a level appropriate for a general audience, which was challenging given that the research done by the USF School of Geosciences often involves a high level of critical thinking and problem-solving. Adding to the complexity of the research being explained, the comfort level of the interviewee had the potential to dampen the interview. One researcher was so uncomfortable speaking in an interview that they typed up in advance what they wanted to say.

ASSESSMENT

Libsyn Statistics

According to Libsyn statistics (as of July 17, 2020), there were a total of 3,593 unique downloads, from 48 different countries, of the 35 published episodes of Calling: Earth. In table 1, the 48 countries where Calling: Earth has been downloaded are shown, as well as how many times the podcast has been downloaded in each country. It is worth noting that 105 downloads do not have a location specified, so the total of the downloads in table 1 does not equal the total number of downloads reported by Libsyn.

Table 1.
Downloads by Country

United States: 2,729
United Kingdom: 103
India: 98
Australia: 88
France: 62
Ireland: 50
Bangladesh: 43
Spain: 37
Russian Federation: 36
Norway: 30
Portugal: 30
Germany: 20
Japan: 19
Mexico: 18
Italy: 14
Netherlands: 12
New Zealand: 11
Brazil: 9
Korea, Republic of: 9
Czech Republic: 7
Ukraine: 7
China: 6
Hong Kong: 5
Sweden: 4
Canada: 3
Chile: 3
Denmark: 3
Romania: 3
South Africa: 3
Yemen: 3
Argentina: 2
Ecuador: 2
Poland: 2
Taiwan: 2
Turkey: 2
Belgium: 1
Bulgaria: 1
Colombia: 1
Costa Rica: 1
Estonia: 1
Greece: 1
Latvia: 1
Macedonia: 1
Nigeria: 1
Pakistan: 1
Saudi Arabia: 1
United Arab Emirates: 1
Vietnam: 1
Without a location: 105

Preliminary Survey and Scholarly Impact

A survey was sent out to the interviewees to gauge their impressions of the podcast and to see if they had noticed any impact on their citations or document downloads. Our goal for the survey was to find out if the podcast was accomplishing its original intention, which was to increase researcher impact through research dissemination, as well as to inform the podcast processes and procedures. The questions asked were:

1. In what ways do you view the Calling: Earth podcast as a way to positively affect your research impact?
2. What evidence do you have, if any, to suggest your research has been positively impacted because of being an interviewee on the Calling: Earth podcast?
3. What would you have liked to be different about your interview process for the Calling: Earth podcast?
4. What suggestions do you have for the future seasons of the Calling: Earth podcast? For example, should the format change, the focus be different, the length of the interview change, etc.
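Per-country tallies like those in table 1 can also be derived from a raw download log rather than read off the hosting dashboard. A short sketch, assuming a hypothetical CSV export with one row per download and a country column (Libsyn's actual export format may differ):

```python
import csv
from collections import Counter

def downloads_by_country(csv_file):
    """Tally downloads per country from a per-download CSV export.
    Assumes (hypothetically) a 'country' column; rows with a blank
    country are grouped under 'Without a location'. Returns a list of
    (country, count) pairs sorted by count, highest first."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[row["country"].strip() or "Without a location"] += 1
    return counts.most_common()
```

Running this over a full export reproduces a ranking like table 1, with the unlocated downloads surfaced explicitly instead of silently dropped from the total.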
Furthermore, each interviewee was asked to contribute their scholarship to the library's institutional repository, Scholar Commons, to allow for the archiving of their research publications and to serve as a means of tracking scholarly impact as a result of the podcast. Once an interviewee's scholarship was placed in Scholar Commons, a Selected Works profile was created so that a direct link to the scholar's work could be disseminated through the podcast notes.

Impact on faculty has also been noteworthy. The download totals for faculty interview participants (when comparing roughly the same amount of time just prior to and following their published interview) showed an average increase of 30 percent, suggesting a strong correlative link between the podcast and researcher impact. Furthermore, anecdotal evidence from interviewees, such as "puts my name out there to a wider audience," "enhances the visibility of my work," and "allow others to hear about [my research] in a more passive way," indicates the potential impact a researcher can see from being a part of the podcast.

A second survey was sent to the faculty, students, and staff of the entire School of Geosciences to determine who was listening to the podcast, who was not, and their reasons for listening or not listening. The survey contained five questions in total, but depending on how a participant answered, not all were presented (figure 3). The first question asked their status in the School of Geosciences (faculty, staff, undergraduate, graduate, or other). The second question asked if they had heard of the podcast and whether they had listened to it. If a participant chose the option that they had never heard of the podcast, then the survey ended for them.
If a participant chose the option that they had heard of the podcast, but had not listened to it, then the survey directed them to a question that asked them to provide reasons they had not listened to the podcast. If a participant chose the option that they had heard of the podcast and had listened to at least one episode, the survey directed them to a question that asked how many episodes the participant had listened to and why they were listening to the podcast. This data was collected to inform the future direction of the podcast.

Figure 3. Flow Chart for the Entire School of Geosciences Survey

CHECKLIST FOR PODCAST PLANNING/EXECUTION

Based on our experiences in the production of the Calling: Earth podcast, we recommend that academic librarians and library staff use the following list to help with planning and executing the production of their own podcasts:
• Get general buy-in from library staff and administration, and update as the planning progresses and budgeting is needed.
• Decide on goals, audience, content, format, frequency of production, and methods of assessment.
• Work with media staff to design marketing, including podcast title (avoiding duplication with other podcasts) and logo development.
• Choose a podcast hosting service.
• Identify relevant staff for hosting, recording, editing, and publishing and train as needed.
• Evaluate existing hardware and software and make additional purchases as needed.
• Contact potential interviewees and create a schedule.
• Prepare customized interview questions and share as appropriate with interviewees.
• Record interviews.
• Edit and publish episodes.
• Submit podcast to Apple Podcasts, Spotify, and other popular podcast directories.
• Monitor statistics.
• Continue to engage in marketing and assessment activities.
(Figure 3 shows the survey's branching logic: respondents first identified their status in the USF School of Geosciences (faculty, staff, graduate student, undergraduate student, or other) and whether they had heard of or listened to the podcast. Those who had heard of it but had not listened chose a reason: not knowing what a podcast is, lack of interest, or lack of time. Those who had listened reported how many episodes they had heard and why they listened: for enjoyable content, for awareness of current research in the USF School of Geosciences, for instructional purposes, for ways to find collaborators, or other.)

CONCLUSIONS AND FUTURE DIRECTIONS

Enthusiasm and anecdotal positive feedback are enough fuel for current activities, and the future of podcasting in libraries also appears open and exciting. At the USF Libraries, Calling: Earth is currently in its third season, and with each new episode, new ideas and increased archival content become a permanent part of the library's legacy and collections. This is another area ripe for future exploration, as this type of original content is archived, cataloged, and disseminated, becoming another part of regular academic impact measurement. In this vein, the USF Libraries podcasting group plans to further codify cyclical assessment tools, including the receipt of IRB clearance for future surveys and data collection.
In addition to cleaning up and refining these assessment practices, this will also provide the opportunity to publish and present publicly on more specific data. Ideally, the group will be able to correlate the show's presence with positive citation or metrics levels for show participants. The USF Libraries Geosciences RPT is currently collecting baseline aggregate information, which could then be compared following further maturation and dissemination of the podcast. Causality may never be within reach, but any positive impacts will be exciting and beneficial. It is also the hope of those involved with Calling: Earth that it might provide a model or template for other RPT or library podcasts or media efforts. One of the current benefits is the strong and effective support from the Development and Communication directors at the USF Libraries, and their partnerships in the future will certainly be key to the success of this and any other potential projects of this type. In closing, the academic library podcasting landscape is wide open for further exploration and examination, and the USF Libraries plans to lead and learn.

ENDNOTES

1 Cassidy R. Sugimoto et al., "Scholarly Use of Social Media and Altmetrics: A Review of the Literature," Journal of the Association for Information Science and Technology 68, no. 9 (2017): 2037–62.
2 James Bierman and Maura L. Valentino, "Podcasting Initiatives in American Research Libraries," Library Hi Tech 29, no. 2 (May 2011): 349, https://doi.org/10.1108/07378831111138215.
3 Erin Dorris Cassidy et al., "Higher Education and Emerging Technologies: Student Usage, Preferences, and Lessons for Library Services," Reference & User Services Quarterly 50, no. 4 (2011): 380–91, https://doi.org/10.5860/rusq.50n4.380.
4 Christopher Drew, "Educational Podcasts: A Genre Analysis," E-Learning and Digital Media 14, no. 4 (2017): 201–11, https://doi.org/10.1177/2042753017736177.
5 Brock Peoples and Carol Tilley, "Podcasts as an Emerging Information Resource," College & Undergraduate Libraries 18, no. 1 (January 2011): 44, https://doi.org/10.1080/10691316.2010.550529.
6 Stephen M. Walls et al., "Podcasting in Education: Are Students as Ready and Eager as We Think They Are?," Computers & Education 54, no. 2 (January 2010): 372, https://doi.org/10.1016/j.compedu.2009.08.018.
7 Tanmay De Sarkar, "Introducing Podcast in Library Service: An Analytical Study," Vine 42, no. 2 (2012): 191–213, https://doi.org/10.1108/03055721211227237.
8 Lizah Ismail, "Removing the Road Block to Students' Success: In-Person or Online? Library Instructional Delivery Preferences of Satellite Students," Journal of Library & Information Services in Distance Learning 10, no. 3–4 (2016): 286–311, https://doi.org/10.1080/1533290X.2016.1219206.
9 Jesse Thornburg, "Podcasting to Educate a Diverse Audience: Introducing the Geology Flannelcast," in Innovative and Multidisciplinary Approaches to Geoscience Education (Posters) (Boulder, CO: Geological Society of America, 2015).
10 Catherine Pennington, "PODCAST: Geology Is Boring, Right? What?! NO! Why Scientists Should Communicate Geoscience...," n.d., https://britgeopeople.blogspot.com/2018/10/PODCAST-geology-is-boring-right.html.
COMMUNICATIONS

Using the Harvesting Method to Submit ETDs into ProQuest: A Case Study of a Lesser-Known Approach

Marielle Veve

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12197

Marielle Veve (m.veve@unf.edu) is Metadata Librarian, University of North Florida. © 2020.

ABSTRACT

The following case study describes an academic library's recent experience implementing the harvesting method to submit electronic theses and dissertations (ETDs) into the ProQuest Dissertations & Theses Global database (PQDT). In this lesser-known approach, ETDs are deposited first in the institutional repository (IR), where they get processed, to be later harvested for free by ProQuest through the IR's Open Archives Initiative (OAI) feed. The method provides a series of advantages over some of the alternative methods, including students' choice to opt in or out from ProQuest, better control over the embargo restrictions, and more customization power without having to rely on overly complicated workflows. Institutions interested in adopting a simple, automated, post-IR method to submit ETDs into ProQuest, while keeping the local workflow, should benefit from this method.

INTRODUCTION

The University of North Florida (UNF) is a midsize public institution established in 1972, with the first theses and dissertations (TDs) submitted in 1974. Since then, copies have been deposited in the library, where bibliographic records are created and entered in the library catalog and the Online Computer Library Center (OCLC). During the period of 1999 to 2012, some TDs were also deposited in ProQuest by the graduate school on behalf of students who chose to do so. This practice, however, was discontinued in the summer of 2012, when the institutional repository, Digital Commons, was established and submission to it became mandatory.
Five years later, in the summer of 2017, interest in getting UNF TDs hosted in ProQuest resurfaced. This renewed interest grew out of a desire of some faculty and graduate students to see the institution's electronic theses and dissertations (ETDs) posted there, in addition to a recent library subscription to the ProQuest Dissertations & Theses Global database (PQDT). A month later, conversations between the library and graduate school began on the possibility of resuming hosting UNF ETDs in ProQuest. Consensus was reached that the PQDT database would be a good exposure point for our ETDs, in addition to the institutional repository (IR), yet some concerns were raised. One of the concerns was the cost of the service and who would be paying for it. Neither the library nor the graduate school had allocated funds for this. The next concern was the possibility of ProQuest imposing restrictions that could prevent students, or the university, from posting ETDs in other places. It was important to make sure there were no such restrictions. Another concern was expressed over students entering embargo dates in ProQuest that do not match the embargo dates selected for the IR. This is a common problem encountered by other libraries.1 For that reason, we wanted to keep the local workflow. The last concern expressed during the conversations was preserving students' right to opt in or out from distributing their theses in ProQuest. This is something both the graduate school and library have been adamant about.
In higher education, requiring students to submit to ProQuest is a controversial issue that has raised ethical concerns and has been highly debated over the years.2 Once conversations between the library and graduate school were held and concerns were gathered, the library moved ahead to investigate the available options to submit ETDs into ProQuest.

LITERATURE REVIEW

Currently, there are three options to submit ETDs into ProQuest: (1) submission through the ProQuest ETD Administrator tool, (2) submission via File Transfer Protocol (FTP), and (3) submission through harvests performed by ProQuest.3

ProQuest ETD Administrator Submission Option

In this option, a proprietary submission tool called ProQuest ETD Administrator is used by students, or assigned administrators, to upload ETDs into ProQuest. Inside the tool, a fixed metadata form is completed with information on the degree, subject terms are selected from a proprietary list, and keywords are provided. The whole administrative and review process gets done inside the tool. Afterwards, zip packages with the ETDs and ProQuest's Extensible Markup Language (XML) files are sent to the institution via FTP transfers, or through direct deposits to the IR using the Simple Web-service Offering Repository Deposit (SWORD) protocol. The ETD Administrator submission method presents several shortcomings. First, the ProQuest XML metadata that is returned to the institutions must be transformed into IR metadata for ingest in the IR, a process that can be long and labor-intensive.4 Second, the subject terms supplied in the returned files come from a proprietary list of categories maintained by ProQuest, which does not match the Library of Congress Subject Headings (LCSH) used by libraries.5 Third, control over the metadata provided is lost because the metadata form cannot be altered, plus customizations to other parts of the system can be difficult to integrate.6 Fourth, there have been issues with students indicating different embargo periods in the ProQuest and IR publishing options, with instances of students choosing to embargo ETDs in the IR, while not in ProQuest.7 Lastly, this method does not allow students' choice, unless the ETDs are submitted separately in two systems in a process that can be burdensome. Ultimately, for these reasons, we found the ETD Administrator unsuitable for our institution.

FTP Submission Option

In this option, an administrator sends zip packages with the institution's ETD files and ProQuest XML metadata to ProQuest via FTP.8 At the time of this investigation, there was a $25 charge per ETD submitted through this method.9 We did not want to pursue this option because of the charge and the tedious metadata transformations that would be needed between IR and ProQuest XML schemas. Another way to go around this would have been to submit the ETDs through the VIREO application. VIREO is an open-source ETD management system used by libraries to freely submit ETDs into ProQuest via FTP.10 This alternative, however, was not an option for us as our IR, Digital Commons, does not support the VIREO application.

Harvesting Submission Option

This is the latest method available to submit ETDs into ProQuest. In this option, ETDs are submitted first into an IR, or other internal system, where they get processed to be later harvested by ProQuest through the IR's existing Open Archives Initiative (OAI) feed.11 At the time of this writing, we were not able to find a single study that documents the use of this method. This option looked appealing and worth pursuing as it met most of our desired criteria. First, with this option, students' choice would not be compromised as ETDs would be submitted to ProQuest after being posted in the IR.
Second, because the ETD Administrator would not be used, issues with conflicting embargo dates and unalterable metadata forms would be avoided. In addition, the local workflow would be retained, thus eliminating the need for tedious metadata transformations between ProQuest and IR schemas. From the available options, this one seemed the most feasible solution for our institution.

IMPLEMENTATION OF THE HARVESTING METHOD AT UNF

After research on the different submittal options was performed, the library approached ProQuest to express interest in depositing our future ETDs into their system by using a post-IR option. In the first communications, ProQuest suggested we use the ETD Administrator to submit ETDs because it is the most commonly used method. When we expressed interest in the harvesting option, they said "we have not been harvesting from BePress sites" (BePress being the company that makes Digital Commons) and suggested we use the FTP option instead.12 Ten months later, they clarified that the harvests could be performed from BePress sites and that the option is free, with the only requirement being a non-exclusive agreement between the university and ProQuest. The news eased both the library's and the graduate school's previous concerns, as we would be able to adopt a free method that would not compromise on students' choice nor restrict students from posting in other places, while keeping the local workflow. After agreement on the submittal method was established, planning and testing of the harvesting method began. The library worked with ProQuest and BePress to customize the harvesting process while the university's Office of the General Counsel worked with ProQuest on the negotiation process.

Negotiation Process

Before ProQuest could harvest UNF ETDs, two legal documents needed to be in place. The first document was the Theses and Dissertations Distribution Agreement, which specifies the conditions under which ETDs can be obtained, reproduced, and disseminated by ProQuest.
The document had to be signed by UNF's Board of Trustees and ProQuest. The agreement stipulated the following conditions:
• The agreement must be non-exclusive.
• The university must make the full-text Uniform Resource Locators (URLs) and abstracts of ETDs available to ProQuest.
• ProQuest must harvest the ETDs from the university's IR.
• The university and students have the option to elect not to submit individual works or to withdraw them.
• No fees are due from the university or students for the service.
• ProQuest must include the ETDs in the PQDT database.
The second document that needed to be in place was the Theses and Dissertations Availability Agreement, which grants the university the non-exclusive right to reproduce and distribute the ETDs. This agreement between students and UNF specifies the places where ETDs can be hosted and the embargo restrictions, if any. UNF already has been using this document as part of its ETD workflow, but the document needed to be modified to include the additional option to submit ETDs into ProQuest. Beginning with the spring 2019 semester, the revised version of the agreement provided students with two hosting alternatives: posting in the IR only or in the IR and ProQuest.

Local Steps Performed Before the Harvesting

The workflow begins when students upload their ETDs and supplemental files (Certificate of Approval and Availability Agreements) directly into the Digital Commons IR. There, students complete a metadata template with information on the degree and provide keywords related to the thesis. After this, the graduate school reviews the submitted ETDs and approves them inside the IR platform. Next, the Library Digital Projects' staff downloads the native PDF files of ETDs, processes them, and creates public and archival versions for each ETD.
Availability Agreements are reviewed to determine which students chose to embargo their ETDs and which ones chose to host them in ProQuest, in addition to the IR. If students choose to embargo their ETDs, the embargo dates are entered in the metadata template. If students choose to publish their ETDs in ProQuest, a "ProQuest: Yes" option is checked in their metadata template, while students who choose not to host in ProQuest get a "ProQuest: No" in their template. (The ProQuest field is a new administrative field that was added to the ETD metadata template, starting with the spring 2019 semester, to assist with the harvesting process. It was designed to alert ProQuest of the ETDs that were authorized for harvesting. More detail on its functionality will be provided in the next section.) The reason library staff enters the ProQuest and embargo fields on behalf of students is to avoid having students enter incorrect data on the template. Following this review, the Metadata Librarian assigns Library of Congress Subject Headings to each ETD and creates authority files for the authors. These are also entered in the metadata template. Afterwards, the ETDs get posted in the Digital Commons' public display, with the full-text PDF files available only for the non-embargoed ETDs. Information that appears in the public display of Digital Commons will also appear immediately in the OAI feed for harvesting. At this point, two separate processes take place:
1. The Metadata Librarian harvests the ETDs' metadata from the OAI feed and converts it into MARC records that are sent to OCLC, with the IR's URL attached. The workflow is described at https://journal.code4lib.org/articles/11676.
2. On the seventh of each month, ProQuest harvests the full-text PDF files, with some metadata, of the non-embargoed ETDs that were authorized for harvesting from the OAI feed.
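Both of these steps rely on standard OAI-PMH requests against the IR's feed. As a minimal illustration (not UNF's or ProQuest's actual script; the base URL and set name below are hypothetical placeholders), a date-limited ListRecords request can be assembled like this:

```python
# Sketch of a date-limited OAI-PMH ListRecords request, using only the
# Python standard library. The repository URL and set name are invented
# for illustration; a real Digital Commons feed would supply its own.
from urllib.parse import urlencode

def build_listrecords_url(base_url, set_spec, metadata_prefix, from_date, until_date):
    """Combine the four pieces a harvester needs (base URL, publication
    set, metadata prefix, and date range) into one OAI-PMH request URL."""
    params = {
        "verb": "ListRecords",
        "set": set_spec,
        "metadataPrefix": metadata_prefix,
        "from": from_date,
        "until": until_date,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical example: everything published or edited in the ETD set in May 2019.
url = build_listrecords_url(
    "https://digitalcommons.example.edu/do/oai/",  # hypothetical base URL
    "publication:etd",                             # hypothetical set name
    "qdc",                                         # Qualified Dublin Core
    "2019-05-01",
    "2019-05-31",
)
print(url)
```

Subsequent pages of a large result set would be fetched with the `resumptionToken` the feed returns, per the OAI-PMH specification.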
Harvesting Process (Customized for Our Institution)

To perform the harvests, ProQuest creates a customized robot for each institution that crawls OAI-PMH-compliant repositories to harvest metadata and full-text PDF files of ETDs.13 The robot performs a date-limited OAI request to pull everything that has been published or edited in an IR's publication set during a specific timeframe. Information to formulate the date-limited request is provided to ProQuest by the institution for the first harvest only; subsequently, the process gets done automatically by the robot. The request contains the following elements:
• Base URL of the OAI repository
• Publication set
• Metadata prefix or type of metadata
• Date range of titles to be harvested
In the particular case of our institution, we needed to customize the robot to limit the harvests to authorized ETDs only. To achieve this, we worked with BePress to add a new, hidden field at the bottom of our Digital Commons' ETD metadata template. The field, called ProQuest, consisted of a dropdown menu with two alternatives: "ProQuest Yes" or "ProQuest No" (see figure 1). The field was mapped to an element in the OAI feed that displays the value of "ProQuest: Yes" or "ProQuest: No," thus alerting the robot of the ETDs that were authorized for harvesting and the ones that were not. The element used to map the ProQuest field is a Qualified Dublin Core (QDC) element (figure 2). For that reason, the robot needs to perform the harvests from the QDC OAI feed in order to see this field.

Figure 1. Display of the ProQuest Field's Dropdown Menu in the Metadata Template

Figure 2.
Display of the ProQuest Field in the QDC OAI Feed

After the ETDs authorized for harvesting have been identified with help from the "ProQuest: Yes" field, the robot narrows down the ones that can be harvested at the present moment by using an availability-date element, which provides the date when the full-text file of an ETD becomes available. It also displays in the QDC OAI feed (see figure 3). If the date is on or before the monthly harvest day, the ETD is currently available for harvesting. If the date is in the future, the robot identifies that ETD as embargoed and adds its title to a log of embargoed ETDs with some basic metadata (including the ETD's author and the last time it was checked). The log of embargoed ETDs is then pulled out in the future to identify the ETDs that come out of embargo so the robot can retrieve them.

Figure 3. Display of the Element in the QDC OAI Feed

After the ETDs that are currently available for harvesting have been identified (because they have the "ProQuest: Yes" field and a present or past availability date), the robot harvests their full-text PDF files by using a third element, which displays at the bottom of records in the OAI feed (figure 4). This third element contains a URL with direct access to the complete PDF file of ETDs that are currently not embargoed. ETDs that are currently on embargo contain a URL that redirects the user to a webpage with the message: "The full-text of this ETD is currently under embargo. It will be available for download on [future date]" (see figure 5).

Figure 4. Display of the Third Element at the Bottom of Records in the QDC OAI Feed

Figure 5.
Message that Displays in the URL of Embargoed ETDs

Once the metadata and full-text PDF files of authorized, non-embargoed ETDs have been obtained by the robot, they get queued for processing by the ProQuest editorial team, who then assigns them International Standard Book Numbers (ISBNs) and ProQuest's proprietary terms. It takes an average of four to nine weeks for the ETDs to display in the PQDT database after being harvested. Records in the PQDT come with the institutional repository's original cover page and a copyright statement that leaves copyright to the author. Afterwards, the process gets repeated once a month. This frequency can be set to quarterly or semi-annually if desired.

ADDITIONAL POINTS ON THE HARVESTING METHOD

Handling of ETDs that come out of embargo. When the embargo period of an ETD expires, its full-text PDF becomes automatically available in the IR's webpage, and consequently, in the third element that displays in the OAI record. Each month, when the robot prepares to crawl the OAI feed, it will first check the titles in the log of embargoed ETDs to determine if any of them have become fully available through the third element. The ones that become available are then pulled by the robot through this element.

Handling of metadata edits performed after the ETDs have been harvested and published in PQDT. Edits performed to the metadata of ETDs will trigger a change of date in the corresponding element in the OAI records. This change of date will alert the robot of an update that took place in a record, which is then manually edited or re-harvested, depending on the type of update that took place.

Sending MARC records to OCLC. As part of the harvesting process, ProQuest provides free MARC records for the ETDs hosted in their PQDT database. These can be delivered to OCLC on behalf of the institution on an irregular basis.
Records are machine-generated "K" level and come with URLs that link to the PQDT database and with ProQuest's proprietary subject terms. We requested to be excluded from these deliveries and continue our local practice of sending MARC records to OCLC with LCSH, authority file headings, and the IR's URLs.

Notifications of harvests performed by ProQuest and imports to the PQDT database. When harvests or imports to the PQDT have been performed by ProQuest, institutions do not get automatically notified. Still, they can request to receive scheduled monthly reports of the titles that have been added to the PQDT. UNF requested to receive these monthly reports.

Usage statistics of ETDs hosted in PQDT. Usage statistics of an institution's ETDs hosted in the PQDT can be retrieved from a tool called Dissertation Dashboard. This tool is available to the institution's ETD administrators and provides the number of times some aspect of an ETD (e.g., citations, abstract viewings, page previews, and downloads) has been accessed through the PQDT database.

Royalty payments to authors. Students who submit ETDs through this method are also eligible to receive royalties from ProQuest.

OBSTACLES FACED

During the planning phase, we encountered some obstacles that hindered progress on the implementation. These were:
• Amount of time it took to get the ball rolling. Initially, we were told we would not be able to use the harvesting method to submit ETDs into ProQuest because we were BePress users, but that ended up not being the case. Ten months later, we were notified by the same source that the harvesting option for BePress sites would be possible and doable by ProQuest. Those ten months delayed the implementation process.
• Amount of time it took to get the paperwork finalized and signed before the harvesting.
From the moment first contact was initiated with ProQuest, to the moment the last agreement was finalized and signed by both parties, 21 months went by. There was a lot of back and forth in the negotiation process and paperwork between the university and ProQuest.
• Inconsistent lines of communication. There were multiple parties involved in the communication process, and some of the email threads began with one person only to be later transferred to someone else. This lack of consistency in the communication lines made it difficult to determine who was in charge of particular tasks at certain stages of the process.

CONCLUSION AND RECOMMENDATIONS

Although problems were encountered at the beginning, implementation of the harvesting process at UNF was a complete success. Once the process started, it ran smoothly without complications. Harvests were performed on schedule, and no issues with unauthorized content being pulled from the OAI were encountered. The fields used to alert the robot in the OAI of the ETDs authorized for harvesting worked as planned, and so did the embargo log used to identify and pull the out-of-embargo ETDs. It should be noted that Digital Commons users who want to exclude embargoed ETDs from displaying in the OAI can do so by setting up an optional yes/no button in their submission form. This button prevents metadata of particular records from displaying in the OAI feed. We did not pursue this option because we have been using the ETD metadata that displays in the OAI to generate the MARC records we send to OCLC. In addition, we took the necessary precautions to avoid exposing the full content of the embargoed ETDs in the OAI feed. Institutions planning to use this method should be very careful with the content they display in the OAI to prevent embargoed ETDs from being mistakenly pulled by ProQuest.
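The per-record decision at the heart of this workflow (honor the opt-in flag, harvest only records whose availability date has passed, and log future-dated ones for a later pass) can be sketched as follows. This is an illustrative reconstruction, not ProQuest's actual code; the function name and field values are assumptions based on the process described above:

```python
# Hedged sketch of the harvesting robot's triage of one OAI record:
# skip opted-out ETDs, harvest open ones, and log embargoed ones so
# they can be retrieved once the embargo expires.
from datetime import date

def triage_record(proquest_flag, available_date, harvest_day):
    """Return 'harvest', 'log-embargo', or 'skip' for one OAI record."""
    if proquest_flag != "ProQuest: Yes":
        return "skip"           # student opted out; never harvest
    if available_date <= harvest_day:
        return "harvest"        # full text is already available
    return "log-embargo"        # revisit after the embargo expires

harvest_day = date(2019, 6, 7)  # e.g., the monthly harvest on the 7th
print(triage_record("ProQuest: Yes", date(2019, 5, 10), harvest_day))  # harvest
print(triage_record("ProQuest: Yes", date(2020, 1, 1), harvest_day))   # log-embargo
print(triage_record("ProQuest: No", date(2019, 5, 10), harvest_day))   # skip
```

The same comparison of availability date against harvest day is what lets the robot detect, on a later pass, that a logged ETD has come out of embargo.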
Access restrictions can be set by either suppressing the metadata of embargoed ETDs from displaying in the OAI or by suppressing the URLs with full access to the embargoed ETDs. The same precaution should be taken if planning to provide students with the choice to opt in or out from ProQuest. Altogether, the harvesting option proved to be a reliable solution to submit ETDs into ProQuest without having to compromise on students' choice or rely on complicated workflows with metadata transformations between IR and ProQuest schemas. Institutions interested in adopting a simple, automated, post-IR method, while keeping the local workflow, should benefit from this method.

ENDNOTES

1 Dan Tam Do and Laura Gewissler, "Managing ETDs: The Good, the Bad, and the Ugly," in What's Past Is Prologue: Charleston Conference Proceedings, eds. Beth R. Bernhardt et al. (West Lafayette, IN: Purdue University Press, 2017), 200–04, https://doi.org/10.5703/1288284316661; Emily Symonds Stenberg, September 7, 2016, reply to Wendy Robertson, "Anything to watch out for with etd embargoes?," Digital Commons Google Users Group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7Csort:date/digitalcommons/RNInGtRarNY/6byzT9apAQAJ.
2 Gail P. Clement, "American ETD Dissemination in the Age of Open Access: ProQuest, NoQuest, or Allowing Student Choice," College & Research Libraries News 74, no. 11 (December 2013): 562–66, https://doi.org/10.5860/crln.74.11.9039; FUSE, 2012–2013, Graduate Students Re-FUSE!, https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/Graduate%20Students%20Re-FUSE.pdf?sequence=25&isAllowed=y.
3 "PQDT Submissions Options for Universities," ProQuest, http://contentz.mkt5049.com/lp/43888/382619/PQDTsubmissionsguide_0.pdf.
4 Meghan Banach Bergin and Charlotte Roh, "Systematically Populating an IR With ETDs: Launching a Retrospective Digitization Project and Collecting Current ETDs," in Making Institutional Repositories Work, eds. Burton B. Callicott, David Scherer, and Andrew Wesolek (West Lafayette, IN: Purdue University Press, 2016), 127-37, https://docs.lib.purdue.edu/purduepress_ebooks/41/.

5 Cedar C. Middleton, Jason W. Dean, and Mary A. Gilbertson, "A Process for the Original Cataloging of Theses and Dissertations," Cataloging and Classification Quarterly 53, no. 2 (February 2015): 234-46, https://doi.org/10.1080/01639374.2014.971997.

6 Wendy Robertson and Rebecca Routh, "Light on ETD's: Out from the Shadows" (presentation, Annual Meeting for the ILA/ACRL Spring Conference, Cedar Rapids, IA, April 23, 2010), http://ir.uiowa.edu/lib_pubs/52/; Yuan Li, Sarah H. Theimer, and Suzanne M. Preate, "Campus Partnerships Advance both ETD Implementation and IR Development: A Win-win Strategy at Syracuse University," Library Management 35, no. 4/5 (2014): 398-404, https://doi.org/10.1108/LM-09-2013-0093.

7 Do and Gewissler, "Managing ETDs," 202; Banach Bergin and Roh, "Systematically Populating," 134; Donna O'Malley, June 27, 2017, reply to Andrew Wesolek, "ETD Embargoes through ProQuest," Digital Commons Google Users Group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7Csort:date/digitalcommons/Gadwi8INfgA/sg7de7SdCAAJ.

8 Gail P. Clement and Fred Rascoe, "ETD Management & Publishing in the ProQuest System and the University Repository: A Comparative Analysis," Journal of Librarianship and Scholarly Communication 1, no. 4 (August 2013): 8, http://doi.org/10.7710/2162-3309.1074.

9 "U.S. Dissertations Publishing Services: 2017-2018 Fee Schedule," ProQuest.
10 "Support: ProQuest Export Documentation," Vireo Users Group, https://vireoetd.org/vireo/support/ProQuest-export-documentation/.

11 "PQDT Global Submission Options, Institutional Repository + Harvesting," ProQuest, https://media2.proquest.com/documents/dissertations-submissionsguide.pdf.

12 Marlene Coles, email message to author, January 19, 2018.

13 "ProQuest Dissertations & Theses Global Harvesting Process," ProQuest.
PUBLIC LIBRARIES LEADING THE WAY

Intro to Coding Using Python at the Worcester Public Library

Melody Friedenthal

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020
https://doi.org/10.6017/ital.v39i2.12207

Melody Friedenthal (mfriedenthal@mywpl.org) is a Public Services Librarian, Worcester Public Library.

ABSTRACT

The Worcester Public Library (WPL) offers several Digital Learning courses to our adult patrons, and among them is "Intro to Coding Using Python". This 6-session class teaches basic programming concepts and the vocabulary of software development. It prepares students to take more intensive, college-level classes. The Bureau of Labor Statistics predicts a bright future for software developers, web developers, and software engineers. WPL is committed to helping patrons increase their "hireability," and we believe our Python class will help patrons break into these lucrative and gratifying professions… or just have fun.

HISTORY AND DETAILS OF OUR CLASS

I came to librarianship from a long career in software development, so when I joined the Worcester Public Library in January 2018 as a Public Services Librarian, my manager proposed that I teach a class in programming. She asked me to research what language would be best. Python got high marks for ease of use, flexibility, growing popularity, and a very active online community.

Once I selected a language, I had to choose an environment to teach it in (or so I thought). I had absolutely no experience in front of a classroom, and few pedagogical skills, so I sought out an online Python course within which to teach. I decided to use the Code Academy (CA) website as our programming environment. CA has self-guided classes in a number of subjects, and the free Beginning Python course seemed to be just what we needed. I went through the whole class myself before using it as courseware.
My intent was to help students register for CA and then, each day, teach them the concepts in that day's CA lesson. They would then be prepared to complete the online lesson and assignments.

We first offered Python in June 2018. Problems with CA came up right from the start: students registered for the wrong class (despite the handout explicitly naming the correct class), and CA frequently tried to upsell a paid Python class. Since CA's classes are MOOCs (Massive Open Online Courses), the developers built in an automated way of correcting student code: embedded behind each web page of the course is code that examines the student's code and decides whether it is acceptable. Good in theory, not so good in practice. CA's "code-behind" is flawed and sometimes prevented students from advancing to the next lesson.

Moreover, some of the CA tasks were inane. For example, one lesson incorporated a kind of Mad Libs game, where the instructions ask, for example, for 13 nouns and 11 adjectives, and these are combined with set sentences to generate a silly story. This assignment turned out to be too long and difficult to complete, preventing students from advancing. Although I used CA the first few times I offered the class, I subsequently abandoned it and wrote my own classroom material.

After determining that CA wasn't appropriate, I chose an online IDE where the students could code independently. This platform worked well when I tested it ahead of time, but when the whole class tried to log on at once, we received denial-of-service error messages. Hurriedly moving on to Plan C, I chose Thonny, a free Python IDE which we downloaded to each PC in the Lab (see https://thonny.org/).

Each student receives a free manual (see figure 1), which I wrote.
Every time I've offered this class I've edited the manual, clarifying those topics the students had a hard time with. I've also added new material, including commands students have shown me. It is now 90 pages long, written in Microsoft Word, and printed in color. We use soft binders with metal fasteners.

Figure 1. Intro to Coding Using Python manual developed for the course.

The manual consists of the following sections:
• Cover: course name, dates we meet, time class starts and ends, location, instructor's name, manual version number, and a place for the student to write their own name.
• Syllabus: goals for each of the six sessions. This is aspirational.
• Basic information about programming, including an online alternative to Thonny, for students who don't have a computer at home and wish to use our public computers for homework.
• Lessons 1-17: "Hello World" and beyond.
• Lesson 18: Object Oriented Design, which I consider to be advanced, optional material. Skipped if time is pressing or the class isn't ready for it.
• Lesson 19: Wrap-up:
  o How to write good code.
  o How to debug.
  o List of suggested topics for further study.
  o Online resources for Python forums and community.
• List of WPL's print resources on Python and programming.
• Relevant comic strips and cartoons.

In March 2019, my manager asked me to start assigning homework. If a student attends all six sessions and makes a decent attempt at each assignment, at the sixth session they receive a Certificate of Completion. The certificate has the WPL name & logo, the student's name, and my signature. Typically three or four students earn a certificate. Homework is emailed to me as an attachment. This class meets on Tuesday evenings, and I tell students to send me their homework as soon as possible.
Inevitably, several students don't email me until the following Monday. While I don't give out grades, I do spend considerable time reviewing homework, line by line, and I email back detailed feedback.

When the January 2020 course started, I found that between October's class and January, Outlook had implemented a security protocol which removes attachments with certain file extensions from incoming email. And (you can see where this is going) the .py Python extension was one of them. I told students to rename their Python code files from xxxx.py to xxxx.py.doc, where "xxxx" is their program name. This fools Outlook into thinking the file is a Microsoft Word document, and the email is delivered to me intact. When it arrives, I remove the .doc extension from the attachment and save it to a student-specific file. Then I open the file in Thonny and review it.

Physically, our Computer Lab contains an instructor's computer and twelve student computers (see figure 2). It also has a projector which projects the active window from the instructor's computer onto a screen: usually the class manual. I use dry erase markers in a variety of colors to illustrate concepts on a whiteboard. There is also a supply of pencils on hand for student note-taking.

The class is offered once per season. Although the classroom can accommodate twelve students, we set our maximum registration to fourteen, which allows us to maximize attendance even if patrons cancel or don't show up. And if all fourteen do attend the first class, we have two laptops I can bring into the Lab. We also maintain a small waitlist, usually of five spots. We've offered this class seven times, and the registration and waitlists have been full every time. Sometimes we have to turn students away.

Figure 2. Classroom at Worcester Public Library.
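The .py-to-.doc renaming workaround described above can be automated on the receiving end. A minimal sketch using Python's pathlib; the function and file names here are mine, for illustration only, and not part of the class materials:

```python
from pathlib import Path

def restored_name(attachment: str) -> str:
    """Return the original .py name for a homework file that was
    disguised as xxxx.py.doc to get past Outlook's extension filter;
    any other file name is returned unchanged."""
    p = Path(attachment)
    # Only strip the trailing .doc when a .py name is hiding underneath.
    if p.suffix == ".doc" and p.stem.endswith(".py"):
        return str(p.with_suffix(""))  # drops the final .doc
    return attachment

print(restored_name("guessing_game.py.doc"))  # guessing_game.py
print(restored_name("notes.docx"))            # notes.docx
```

A real Word document named report.doc is left alone, since its stem does not end in .py.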
However, we had a problem with registered patrons not showing up, so last spring we implemented a process where, about a week before class starts, I email each student, asking them to confirm their continued interest in the class. I tell them that if they are no longer interested, or don't respond, I will give the seat we reserved for them to another interested patron from the waitlist. In this email I also outline how the course is structured and note that they can each earn a Certificate of Completion. I tell them class starts promptly at 5:30 and to please plan accordingly. Some students don't check their email. Some patrons show up without ever registering; they are told registration is required and to try again in a few months. I keep track of attendance on an Excel spreadsheet. Here in Worcester, MA, weather is definitely a factor for our winter sessions.

Over time I've made the class more dynamic. I have a student read a paragraph in the manual aloud. I've switched around the order of some lessons in response to student questions. I have them play a game to teach Boolean logic: "If you live in Worcester And you love pizza, stand up!"… then: "If you live in Worcester Or you love pizza, stand up!"

Students range from experienced programmers (of other languages), to people with no experience but great aptitude, to people who just never seem to "get it". This material is technical, and I try hard to communicate the concepts, but I lose a few students every time.

We ask our patrons for feedback on all of our programs. Our Python students have written:
• "… the classes were formatted in an organized manner that was beginner friendly"
• "The manual is a big help. I'm thankful that the program is free."
• "… coding is fun and I learned a new skill."
• "This made me think critically and helped me understand where my errors in the programs were."

WPL is proud to offer classes that make a difference in our patrons' lives.
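The Boolean "stand up" game described above maps directly onto Python's `and` and `or` operators. A minimal sketch; the variable names are mine, not from the class manual:

```python
# The classroom "stand up" game, translated into Python.
lives_in_worcester = True
loves_pizza = False

# "If you live in Worcester And you love pizza, stand up!"
print(lives_in_worcester and loves_pizza)  # False: both conditions must hold

# "If you live in Worcester Or you love pizza, stand up!"
print(lives_in_worcester or loves_pizza)   # True: one condition is enough
```

Physically standing (or not) gives students immediate, visible feedback on how the two operators differ.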
ARTICLES

Applying Gamification to the Library Orientation
A Study of Interactive User Experience and Engagement Preferences

Karen Nourse Reed and A. Miller

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020
https://doi.org/10.6017/ital.v39i3.12209

Karen Nourse Reed (karen.reed@mtsu.edu) is Associate Professor, Middle Tennessee State University. A. Miller (a.miller@mtsu.edu) is Associate Professor, Middle Tennessee State University. © 2020.

ABSTRACT

By providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. The benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. One academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. The library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. Results demonstrated a preference for the gamified experience among undergraduate students as compared to other surveyed groups.

INTRODUCTION

Background

Newcomers to the academic campus can be a bit overwhelmed by their unfamiliar environment: there are faces to learn, services and processes to navigate, and an unexplored landscape of academic buildings to traverse. Whether one is an incoming student or a recently hired employee of the university, all need to become quickly oriented to their surroundings to ensure productivity. In the midst of this transition, the academic library may or may not be on the list of immediate inquiries; however, the library is an important place to start.
Newcomers would be wise to familiarize themselves with the building and its services so that they can make optimal use of its offerings. Two studies found that students who used the library received better grades and had higher retention rates.1 Another study regarding university employees revealed that untenured faculty made less use of the library than tenured faculty, a problem attributed to lack of familiarity with the library.2 Researchers have also found that faculty will often express interest in different library services without realizing that these services are in fact available.3 It is safe to say that libraries cannot always rely on newcomers to discover the physical and electronic services on their own; they need to be shown these items in order to mitigate the risk of unawareness.

In consideration of these issues, the Walker Library at Middle Tennessee State University (MTSU) recognized that more could be done to welcome its new arrivals to campus. The public university enrolls approximately 21,000 students, the majority of whom are undergraduates. However, with a Carnegie classification of doctoral/professional and over one hundred graduate degree programs, there was a strong need for specialized research among the university's graduate students and faculty. Other groups needed to use the library too: non-faculty employees on campus as well as community users who frequently used Walker Library for its specialized and general collections. The authors realized that when new members of these different groups arrived on campus, few opportunities were available for acclimation to the library's services or building layout.
Limited orientation experiences were conducted within library instruction classes, but these sessions primarily taught research skills and targeted freshman general-education classes as well as select upper-division and graduate classes. In short, it appeared that students, employees, and visitors to the university would largely have to discover the library's services on their own through a search on the library website or an exploration of the physical library. It was very likely that, in doing so, the newcomers might miss out on valuable services and information.

As MTSU librarians, the authors felt strongly that library orientations were important to everyone at the university so that they might make optimal use of the library's offerings. The authors based this opinion on their knowledge of relevant scholarly literature as well as their own anecdotal experiences with students and faculty.4 The authors defined the library orientation differently from library instruction: in their view, an orientation should acquaint users with the services and physical spaces of the library, as compared to instruction that would teach users how to use the library's electronic resources such as databases. The desired new approach would structure orientations in response to the different needs of the library's users. For example, the authors found that undergraduates typically had distinct library interests compared to faculty. It was recognized that library orientations were time-consuming for everyone: library patrons at MTSU often did not want to take the time for a physical tour, nor did the library have the staffing to accommodate large-scale requests. The authors turned to the gamification trend, and specifically interactive storytelling, as a solution.
Interactive storytelling has previous applications in librarianship as a means of creating an immersive and self-guided user experience.5 However, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. To overcome this gap, the authors developed an online, interactive, game-like experience via storytelling software to orient four different groups of users to the library's services. These groups were undergraduate students, graduate students, faculty members (which included both faculty and staff at the university), and community members (i.e., visitors to the university or alumni); see figure 1 for an illustration of each group's game avatars. These groups were invited to participate in the gamified experience called LibGO (short for library game orientation). After playing LibGO, participants gave feedback through an online survey. This paper will give a brief explanation of the creation of the game, as well as describe the results of research conducted to understand the impact of the gamified experience across the four user groups.

Figure 1. LibGO players were allowed to self-select their user group upon entering the game. Each of the four user groups was assigned an avatar and followed a logic path specified for that group.

LITERATURE REVIEW

Traditional Orientation

Searches for literature on library orientation yield very broad and yet limited details about users of the traditional library orientation method.
It is important to note that the terms "library tour" and "library orientation" can be somewhat vague, because this terminology is not interchangeable, yet it is frequently treated as such in the literature.6 These terms are often included among library instruction materials, which predominantly influence undergraduate students.7 Kylie Bailin, Benjamin Jahre, and Sarah Morris define orientation as "any attempt to reduce library anxiety by introducing students to what a college/university library is, what it contains, and where to find information while also showing how helpful librarians can be."8 Their book is a compilation of case studies of academic library orientation in various forms worldwide, where the common theme across most chapters is the need to assess, revise, and change library orientation models as needed, especially in response to feedback, staff demands, and the evolving trend of libraries and technology.9 Furthermore, the majority of these studies are undergraduate-focused, and often freshman-focused, while only a few studies are geared towards graduate students. Other traditional orientation problems discussed in the literature include students lacking intrinsic motivation to attend library orientation, library staff time required to execute the orientation, and lack of attendance.10 Additionally, among librarians there seems to be consensus that traditional library tours are the least effective means of orientation, yet they are the most highly used, with attention predominantly focused on the undergraduate population alone.11

In 1997, Pixey Anne Mosely described the traditional guided library tour as ineffective and documented the trend of libraries discontinuing it in favor of more active learning options.12 Her study surveyed 44 students who took a redesigned library tour, all of whom were undergraduates (with freshmen as the target population). Although Mosely's study only addressed one group of library users, it did attempt to answer a question on library perception: 93 percent of surveyed students indicated feeling more comfortable using the library after the more active learning approach.13 A comparison study by Marcus and Beck looked at traditional versus treasure-hunt orientations, and ultimately discovered that perception of the traditional method is limited by the selective user population and lack of effective measurements. They cited the need for continued study of alternative approaches to academic library orientation.14

A study by Kenneth Burhanna, Tammy Eschedor Voelker, and Julie Gedeon looked at the traditional library tour from the physical and virtual perspectives. Confronted with a lack of access to the physical library, these researchers at Kent State University decided to add an online option for the required traditional freshman library tour.15 Their study compared the efficacy of learning and affective outcomes between face-to-face library tours and online library tours. Of the 3,610 students who took the required library tour assignment, 3,567 chose the online tour method and 63 opted or were required to take the in-person, librarian-led tour. Surveys were later sent to a random list of 250 students who did not take the in-person tour and to the 63 students who did take the in-person tour.
Of the 46 usable responses, all but one were from undergraduates, and 39 (85 percent) of them were freshmen.16 This is a small sample size, with a ratio of slightly greater than 2:1 for online versus in-person tour participation. Although results showed that an instructor's recommendation on format selection was the strongest influencing factor, convenience was also significant for those who selected the online option (81.5 percent). In contrast, only 18.5 percent of the students who took the face-to-face tour rated it as convenient. The authors found that regardless of tour type, students were more comfortable using the library (85 percent) and more likely to use library resources (80 percent) after having taken a library tour. Interestingly, students who took the online tour seemed slightly more likely to visit the physical library than those who took the in-person tour. Ultimately the analysis of both tours showed that this method of library orientation encourages library resource use, and the "online tour seems to perform as well, if not slightly better than the in-person tour."17

Gamification Use in Libraries

An alternative to the traditional format is gamification. Gamification has become a familiar trend within academic libraries in recent years, and most often refers to the use of a technology-based game delivery within an instructional setting. Some users find gamified library instruction more enjoyable than traditional methods. For these people, gamification can potentially increase student engagement as well as retention of information.18 The goal of gamification is to create a simplified reality with a defined user experience.
Kyle Felker and Eric Phetteplace emphasized the importance of user interaction over "specific mechanics or technologies" in thinking about the gamification design process.19 Proponents of gamifying library instructional content indicate that it connects to the broader mission of library discovery and exploration, as exemplified through collaboration and the stimulation of learning.20 Additional benefits of gamification are its teaching, outreach, and engagement functions.21

Many researchers have documented specific applications of online gaming as a means of imparting library instruction. Mary J. Broussard and Jessica Urick Oberlin described the work of librarians at Lycoming College in developing an online game as one approach to teaching about plagiarism.22 Melissa Mallon offered summaries of nine games produced for higher education, several of which were specifically created for use by academic libraries.23 Many of the online library games reviewed used Flash or required players to download the game before playing. By contrast, J. Long detailed an initiative at Miami University to integrate gamification into library instruction, a project which utilized Twine.24 Twine is an in-browser method and therefore avoids the problem of requiring users to download additional software prior to playing the game.

Other libraries have used online gamification specifically as a tool for library orientations. Although researchers have demonstrated that the library orientation is an important practice in establishing positive first impressions of the library and counteracting library anxiety among new users, the differences between in-person and online delivery formats are unclear.25 Several successful instances have been documented in which the orientation was moved to an online game format.
Nancy O'Hanlon, Karen Diaz, and Fred Roecker described a collaboration at Ohio State University Libraries between librarians and the Office of First Year Experience; for this project, they created a game to orient all new students to the library prior to arrival on campus.26 The game was called "Head Hunt" and was cited among the games listed in the article by Mallon.27 Anna-Lise Smith and Lesli Baker reported on the "Get a Clue" game at Utah Valley University, which oriented new students over two semesters.28 Another orientation game, developed at California State University-Fresno, was noteworthy for its placement in the university's learning management system (LMS).29

In reviewing the literature regarding online library gamification efforts, there appear to be several best practices. Several studies cite initial student assessment to understand student knowledge and/or perceptions of the content, followed by an iterative design process with a team of librarians and computer programmers.30 Felker and Phetteplace reinforced the need for this iterative process of prototyping, testing, deployment, and assessment as one key to success; however, they also stated that the most prevalent reason for failure is that the games are not fun for users.31 Librarians are information experts and are not necessarily trained in fun game design. Some libraries have solved this problem by partnering with or hiring professional designers; however, for many under-resourced libraries this is not an option.32 Taking advantage of open-source tools, as well as the documented trial-and-error practices of others, can be helpful to newcomers who wish to break into new library engagement methods utilizing gamification.

As the literature has shown, a traditional library tour may have a place in the list of library services, but for whom and at what cost are questions with limited answers in studies done to date.
Gamification has offered an alternative perspective, but accounts of its success in the online storytelling format, or with users outside of the heavily studied freshman group, are narrow. Across the literature of library orientation studies, there is little reference to other library user populations such as faculty, staff, community users, distance students, or students not formally part of a class that requires library orientation.

DEVELOPMENT OF THE LIBRARY GAME ORIENTATION (LIBGO)

LibGO was developed by the authors with not only a consideration for the Walker Library user experience, but also a specific attention to the differing needs of the multiple user groups served by the library. This user-focused concern led to exploring creative methodologies such as user experience research and human-centered design thinking, a process of overlapping phases that produces a creative and meaningful solution in a non-linear way. The three pillars of design thinking are inspiration, ideation, and iteration.33 Defining the problem and empathizing with the users (inspiration) led into the ideation phase, whereby the authors created low- and high-fidelity prototypes. The prototypes were tested and improved (iteration) through beta testing, in which playtesters interacted with the gamified orientation. The authors were novice developers of the gamified orientation, and this entailed a learning curve for not only the design thinking mindset but also the technical achievability. The development started with design thinking conversations and quickly turned to low-fidelity prototypes designed on paper. The development soon advanced to the actual coding so that the authors could get early designs tested before launching the final version.
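A branching prototype of this kind can be sketched in Twee, the passage notation behind Twine. The passage names, text, and Harlowe-style (set:) macros below are illustrative assumptions for this article, not the actual LibGO source:

```
:: Start
Welcome to LibGO! Choose your persona:
(set: $score to 0)
[[Undergraduate student->Study Rooms]]
[[Graduate student->Study Rooms]]

:: Study Rooms
Study rooms can be reserved online from the library home page.
(set: $score to it + 1)
[[Finish the tour->Final Score]]

:: Final Score
You explored $score point(s) of interest. Thanks for playing!
```

Each `::` line begins a passage, and the double-bracket links create the branching "choose your own adventure" paths; Twine compiles the passages into a single HTML file that can be placed on a web server, as the authors ultimately did.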
Prior to deployment on the library’s website, LibGO underwent several rounds of playtesting by library faculty, staff, and student employees. This testing was invaluable and led to improvements such as streamlined navigation and less ambiguous text.

LibGO was developed with the open-source Twine software (https://twinery.org), which is primarily used for telling interactive, non-linear stories with HTML. Twine was an excellent application for this project, as it allowed the creation of an online, interactive, “choose your own adventure” style library orientation game in which users could explore the library based upon their selection of one of multiple available plot directions. With a modest learning curve, and as open-source software, Twine is highly accessible for those who are not accustomed to coding. For those who know HTML, CSS, JavaScript, variables, and conditional logic, Twine’s capabilities can be extended.

The library’s interactive orientation adventure requires users to select one of four available personas: undergraduate student, graduate student, faculty, or community member. Users subsequently follow that persona through a non-linear series of places, resources, and points of interest built from the HTML output of Twee (Twine’s scripting language). See figure 2 for an example point-of-interest page and figure 3 for an example of a user’s final score after completing the gamified experience. Once the Twine story went through several iterations of design and testing, the HTML file was placed on the library’s website so the gamified orientation could be implemented with actual users.

Figure 2. This instructional page within LibGO explains how to reserve different library spaces online.
Upon reading this content, the user progresses by clicking one of the hypertext lines in blue font at the bottom.

Figure 3. This LibGO page, with its displayed avatar, represents a graduate student’s completion of LibGO. The page indicates the player’s final score and gives additional options to return to the home page or complete the survey.

Purpose of Study

LibGO utilized the common “choose your own adventure” format, whereby players progress through a storyline based upon their selection of one of multiple available plot directions. Although the literature suggests that other technology-based methods are an engaging and instructive mode of content delivery, little prior research exists regarding this specific approach to library outreach. Furthermore, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library user groups. The researchers wanted to understand the potential of interactive storytelling as a means to educate a range of users on library services, as well as to make the library more approachable from a user perspective. The study was designed to understand the user experience of each of the four groups. The researchers hoped to discern which users, if any, found the gamified experience to be a helpful method of orientation to the library’s physical and electronic services. Another area of inquiry was whether this might be an effective delivery method by which to target certain segments of the campus for outreach. Finally, the study intended to determine whether this method of orientation might incline participants toward future use of the library.
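The “choose your own adventure” structure described above maps naturally onto Twee, Twine’s plain-text source format, in which each passage is declared with a `::` header and each branch is an `[[arrow link]]` to another passage. The sketch below is illustrative only: the passage names, text, and `$score` variable are hypothetical assumptions (written for the Harlowe story format), not LibGO’s actual source.

```
:: Start
Welcome to the Walker Library orientation. Choose your persona:

[[Undergraduate student->UG Welcome]]
[[Graduate student->Grad Welcome]]
[[Faculty->Faculty Welcome]]
[[Community member->Community Welcome]]

:: UG Welcome
(set: $score to 0)
You arrive at the library's main entrance. Where to first?

[[Reserve a study space->Study Spaces]]
[[Ask at the service desk->Service Desk]]

:: Study Spaces
(set: $score to it + 1)
Study rooms can be reserved online from the library home page.

[[Keep exploring->UG Welcome]]
[[Finish->Final Score]]

:: Final Score
You finished with a score of $score. Thanks for playing!
```

(The remaining persona and location passages are omitted for brevity.) Because Twine compiles a story like this into a single self-contained HTML file, deployment amounts to uploading that one file to the library’s web server, which is consistent with how the authors placed LibGO on their site.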
METHODOLOGY

Overview

The authors selected an embedded mixed-methods design in which quantitative and qualitative data were collected concurrently through the same assessment instrument.34 The survey instrument primarily collected quantitative data; however, a qualitative open-response question was embedded at the end of the survey, gathering additional data by which to answer the research questions. Each data set (one quantitative and one qualitative) was analyzed separately for each participant group, and then the groups were compared to develop a richer understanding of participant behavior.

Research Questions

The data collection and subsequent analysis attempted to answer the following questions:

1. Which group(s) of library users prefer to be oriented to library services and resources through the interactive storytelling format, as compared to other formats?
2. Which group(s) of library users are more likely to use library services and resources after participating in the interactive storytelling format of orientation?
3. What are user impressions of LibGO, and are there any differences in impression based on the characteristics of the unique user group?

Participants

Participants for the study were recruited in person and via the library website. In-person recruitment entailed the distribution of flyers and the use of signage to recruit participants to play LibGO in a library computer lab during a one-day event. Online recruitment lasted approximately ten weeks and simply involved the placement of a link to LibGO on the home page of the library’s website. A total of 167 responses were gathered through both methods, and participants were distributed as shown in table 1.

Table 1.
Composition of Study’s Participants

Group  Affiliation               Responses
1      Undergraduate students        55
2      Graduate students             62
3      Faculty                       13
4      Staff                         28
5      Community members              9
       TOTAL                        167

For the purposes of statistical data analysis, groups 3 and 4 were combined to produce a single group of 41 university employee respondents; group 5’s data was not included in the statistical analysis due to the low number of participants. Qualitative data for all groups, however, was included in the non-statistical analysis.

Survey Instrument

A survey with twelve questions was developed for this study and administered online through Qualtrics. After playing LibGO, participants were asked to voluntarily complete the survey; if they agreed, they were redirected to the survey’s website. Before answering any survey questions, participants received an informed consent statement. All aspects of the research, including the survey instrument, were approved through the university’s Institutional Review Board (protocol number 18-1293).

The first part of the survey (see appendix A) consisted of ten questions, each with a ten-point Likert-scale response. The first five questions were designed to measure a Preference construct, and the next five questions measured a Likelihood construct. The Preference construct referred to the participant’s preference for a library orientation: did they prefer LibGO’s online interactive storytelling format, or another format such as in-person talks? The Likelihood construct referred to the participant’s self-perceived likelihood of more readily engaging with the library in the future (both in person and online) after playing LibGO. The second part of the survey gathered the participant’s self-reported affiliation (see table 1 for the list of possible group affiliations) and offered an open-ended response area for optional qualitative feedback.
Data Collection

The study’s data was collected in two stages. In stage one, LibGO was unveiled to library visitors during a special campus-wide week of student programming events. On the library’s designated event day, the researchers held a drop-in event at one of the library’s computer labs (see figure 4 for an example of event advertisement). Library visitors were offered a prize bag and snacks if they agreed to play LibGO and complete the survey. During the three-hour drop-in session, 58 individual responses were collected: the vast majority came from undergraduate students (51 responses), with additional responses from graduate students (n = 4), university staff employees (n = 2), and one community member. Community members were defined as anyone not currently directly affiliated with the university; this group may have included prospective students or alumni.

Stage two began the day after the library drop-in event and simply involved the placement of a link to LibGO on the home page of the library’s website. Any visitor to the library’s website could click on the advertisement to be taken to LibGO. This link remained active on the library website for ten weeks, at which point the final data was gathered. A total of 167 responses were gathered across both stages, and participants were distributed as previously shown in table 1.

Figure 4. Example of Student LibGO Event Advertisement

RESULTS

Quantitative Findings

Statistical analysis of each of the ten quantitative questions used a one-way ANOVA in SPSS. A post hoc test (Hochberg’s GT2) was run in each instance to account for the different sample sizes. For all statistical analysis, only the data from undergraduates, graduate students, and university employees (a group combining both faculty and staff results) were utilized.
A listing of mean comparisons by group, for each of the ten survey questions, may be found in table 2. The one-way ANOVAs yielded statistically significant results for three of the ten individual questions in the first part of the survey: questions 2, 3, and 6 (see table 3).

Table 2. Descriptive Statistics for Survey Results (10-point scale, with 10 as most likely)

1. In considering the different ways to learn about Walker Library, do you find this library orientation game to be more or less preferable as compared to other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?
   Undergraduate students 7.02; graduate students 6.39; university employees 6.02
2. In your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources?
   Undergraduate students 8.13; graduate students 6.94; university employees 7.12
3. If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?
   Undergraduate students 7.38; graduate students 5.94; university employees 5.98
4. Please indicate your level of agreement with the following statement: “As compared to playing the game, I would have preferred to learn about the library’s resources and services by my own exploration of the library website.”
   Undergraduate students 6.11; graduate students 6.50; university employees 5.88
5. Please indicate your level of agreement with the following statement: “As compared to playing the game, I would have preferred to learn about the library’s resources and services through an in-person orientation tour.”
   Undergraduate students 6.11; graduate students 5.08; university employees 5.76
6. After playing this orientation game, are you more or less likely to visit Walker Library in person?
   Undergraduate students 8.27; graduate students 6.94; university employees 6.90
7. After playing this library orientation game, are you more or less likely to use the Walker Library website to find out about the library (such as hours of operation, where to go to get different materials/services, etc.)?
   Undergraduate students 7.82; graduate students 6.97; university employees 7.20
8. After playing this library orientation game, are you more or less likely to seek help from a librarian at Walker Library?
   Undergraduate students 6.95; graduate students 6.58; university employees 6.63
9. After playing this library orientation game, are you more or less likely to use the library’s online resources (such as databases, journals, e-books)?
   Undergraduate students 7.67; graduate students 7.15; university employees 6.90
10. After playing this library orientation game, are you more or less likely to attend a library workshop, training, or event?
   Undergraduate students 6.96; graduate students 6.73; university employees 6.24

Table 3. Overall Statistically Significant Group Differences

            df   F      p     ω²
Question 2   2   3.714  .027  .03
Question 3   2   4.508  .012  .04
Question 6   2   7.178  .001  .07

Question 2 asked, “In your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources?” The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 3.714, p = .027, ω² = .03). The post hoc comparison using Hochberg’s GT2 test revealed that undergraduates rated LibGO as statistically significantly more useful (M = 8.13, SD = 1.94, p = .031) than graduate students did (M = 6.94, SD = 2.72). There was no statistically significant difference between undergraduates and university employees (p = .145).
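As a consistency check on table 3, each reported effect size can be recomputed from its F statistic. For a one-way ANOVA with between-groups degrees of freedom $df_b$ and total analyzed sample size $N$ (here $N = 158$, since each test is reported with error degrees of freedom of 155 across three groups), omega squared can be estimated as:

```latex
\hat{\omega}^2 = \frac{df_b\,(F - 1)}{df_b\,(F - 1) + N}
```

For question 2, for example, $\hat{\omega}^2 = 2(3.714 - 1)/[2(3.714 - 1) + 158] \approx .03$, and the same formula reproduces the .04 and .07 values reported for questions 3 and 6.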
According to criteria suggested by Roger Kirk, the effect size of .03 indicates a small effect in perceived usefulness of LibGO as an introduction among undergraduates.35

Question 3 asked, “If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?” The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 4.508, p = .012, ω² = .04). The post hoc comparison using Hochberg’s GT2 test found that undergraduates were statistically significantly more likely to prefer LibGO over other orientation options (M = 7.38, SD = 2.49, p = .021) than graduate students (M = 5.94, SD = 3.06). There was no statistically significant difference between undergraduates and university employees (p = .053). The effect size of .04 indicates a small effect regarding undergraduate preference for LibGO versus other orientation options.

Question 6 asked, “After playing this library orientation game, are you more or less likely to visit Walker Library in person?” The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 7.178, p = .001, ω² = .07). The post hoc comparison using Hochberg’s GT2 test revealed that undergraduates were statistically significantly more likely to visit the library after playing LibGO (M = 8.27, SD = 2.09, p = .003) than graduate students (M = 6.94, SD = 2.20). Additionally, the test found that undergraduates were statistically significantly more likely to visit the library after playing LibGO (p = .007) than university employees (M = 6.90, SD = 2.08). According to criteria suggested by Kirk, the effect size of .07 indicates a medium effect regarding undergraduate potential to visit the library in person after playing LibGO.36

In addition to testing each individual survey question, tests were run to understand possible group differences by construct (Preference and Likelihood). The Preference construct was an aggregate of survey questions 1–5, and the Likelihood construct was an aggregate of survey questions 6–10. For both constructs, the one-way ANOVA found no statistically significant results. In all, the quantitative findings indicated three areas in which the experience of playing LibGO was more helpful for the surveyed undergraduates than for the other surveyed groups (i.e., graduate students and university employees). At this point, the analysis turned to the qualitative data to better understand participant views of LibGO.

Qualitative Findings

Analysis of the qualitative results was limited to the data collected in the survey’s final question. Question 12 was an open-response area and was intentionally prefaced with a broad prompt: “Do you have any final thoughts for the library (suggestions, additions, modification, comments, criticisms, praise, etc.)?” Of the 167 total survey responses, 67 individuals chose to answer this question. Preliminary analysis showed that the feedback covered a spectrum of topics, ranging from remarks on the LibGO experience itself to broader concerns regarding other library services.

Open coding strategies were utilized to interpret the content of participant responses. Under this methodology, the responses were evaluated for general themes and then coded and grouped under a constant comparative approach.37 NVivo 12 software was used to code all 67 participant responses. Initial coding yielded eight open codes, but these were later consolidated into six final codes (see table 4). One code (LibGO Improvement Tip) was rather nuanced and yielded five axial codes (see table 5).
Axial codes denoted secondary concerns that fell under a larger category of interest. Although some participants gave longer feedback addressing multiple concerns, care was taken to assign each distinct concern to a specific code. It is therefore important to note that, because some comments addressed multiple concerns, the total number of concerns (n = 76) is greater than the total number of individuals responding to the prompt (n = 67).

Table 4. Distribution of Qualitative Codes by User Group

Code                        Undergraduate  Graduate  Faculty  Staff  Community  Total
Positive feedback                 7            7        1       4        2        21
Negative feedback                 1            2        0       3        0         6
In-person tour preference         2            3        0       1        0         6
LibGO improvement tip             5           11        1       3        3        23
Library services feedback         2            4        3       0        0         9
Library building feedback         1            7        1       2        0        11
Total                            18           34        6      13        5        76

Discussion of Qualitative Themes

Positive Feedback (21 separate concerns). Affirmative comments regarding LibGO came primarily from undergraduate and graduate students, with a small number of comments from the other groups. Although all groups stated that the game was helpful, one undergraduate wrote, “I wish I would’ve received this orientation at the very beginning of the year!” A graduate student declared, “This was a creative way to engage students, and I think it should be included on the website for fun.” Both commenting community members noted the utility of LibGO in providing an orientation without having to physically come to the library; for example, “Interactive without having to actually attend the library in person which I liked.” Additionally, a community member pointed out the instructional capability of LibGO, writing, “I think I learned more from the game than walking around in the library.”

Negative Feedback (6 separate concerns). Unfavorable comments regarding LibGO primarily challenged the orientation’s characterization as a “game,” given its perceived lack of fun.
One graduate student wrote a comment representative of this concern, stating, “The game didn’t really seem like a game at all.” A particularly searing comment came from a university staff member who wrote, “Calling this collection of web pages an ‘interactive game’ is a stretch, which is a generous way of stating it.”

In-person Tour Preference (6 separate concerns). A small number of concerns indicated a preference for in-person orientations over online ones. One undergraduate cited the ability to ask questions during an in-person tour as an advantage of that delivery medium. A graduate student mentioned a desire for kinesthetic learning over an online approach, writing, “I prefer hands-on exploration of the library.”

LibGO Improvement Tip (23 separate concerns). Suggested improvements to LibGO were the largest area of qualitative feedback and produced five axial themes (subthemes); see table 5 for a breakdown of the five axial themes by group.

1. Design issues were the largest cited area of improvement, and the most commonly mentioned design problem was the inability of the user to go back to previously seen content. Although this functionality did in fact exist, it was apparently not intuitive to users; design modifications in future iterations are therefore critical. Other users made suggestions regarding the color scheme and the ability to magnify image sizes.

2. User experience was another area of feedback and primarily included suggestions on how to make LibGO a more fun experience. One graduate student offered a role-playing game alternative.
Another graduate student expressed interest in a game with side missions, in addition to the overall goals, where tokens could be earned for completed missions; the student justified these changes by stating, “I feel that incorporating these types of idea will make the game more enjoyable.” In suggesting similar improvements, one undergraduate stated that LibGO “felt more like a quiz than a game.”

3. Technology issues primarily addressed two related problems: images not loading and broken links. Images not loading could depend on many factors, including the user’s browser settings, internet traffic delaying load time, or broken image links, among others. Broken links could be the root issue, since the images used in LibGO were taken from other areas of the library website. This method of gathering content exposed a design vulnerability: relying on existing image locations (controlled by non-LibGO developers) rather than on images hosted exclusively for LibGO.

4. Content issues were raised exclusively by graduate students. One student felt that LibGO emphasized physical spaces in the library and did not give a deep enough treatment to library services. Another graduate student asked for “an interactive map to click on so that we physically see the areas” of the library, thus making the interaction more user-friendly with a visual.

5. Didn’t understand purpose is a subtheme based on two comments made by university staff members. One wrote that “An online tour would have been better and just as informative,” although LibGO was designed to be not only an online tour of the library but also an orientation to the library’s services. The other staff member wrote, “I read the rules but it was still unclear what the objective was.” In all, it is clear that LibGO’s purpose was confusing for some.
Table 5. LibGO Improvement Tip Axial Codes by User Group

Axial code                  Undergraduate  Graduate  Faculty  Staff  Community  Total
Design                            4            3        0       0        1         8
User experience                   1            2        1       0        1         5
Tech issue                        0            1        0       1        0         2
Content                           0            5        0       0        1         6
Didn’t understand purpose         0            0        0       2        0         2
Total                             5           11        1       3        3        23

Library Services Feedback (9 separate concerns). Several participants took the opportunity to provide feedback on general library services rather than on LibGO itself. Undergraduates gave general positive feedback about the value of the library, while many graduate students gave recommendations regarding specific electronic resource improvements. Additionally, one graduate student wrote, “I think it is critical to meet with new graduate students before they start their program,” something the library used to do but had not pursued in recent years. Although these comments did not directly pertain to LibGO, the authors accepted all of them as valuable feedback to the library.

Library Building Feedback (11 separate concerns). This was another theme in which graduate students dominated the comments. Feedback ranged from requests for microwave access and additional study tables to better temperature control in the building. Several participants asked for greater enforcement of quiet zones. As with the library services feedback, the authors took these comments as helpful to the overall library rather than to LibGO.

DISCUSSION

The results of this study indicated that some groups of library visitors received the gamified library orientation experience better than other groups. Undergraduate students indicated the greatest appreciation for a library orientation via LibGO.
Specifically, they demonstrated a statistically significant difference over the other groups in supporting LibGO’s usefulness as an orientation tool, a preference for LibGO over other orientation formats, and a likelihood of future use of the physical library after playing LibGO. These encouraging results provide evidence for the efficacy of alternative means of library orientation.

The qualitative results provided additional helpful insight regarding user impressions from each of the five surveyed groups. This feedback demonstrated that a variety of groups benefited from the experience of playing LibGO, including some community members who appreciated LibGO as a means of becoming acclimated to the library without having to enter the building. A virtual orientation format was not ideal for a few players, who indicated a preference for a face-to-face orientation due to the ability to ask questions.

Many people identified areas of improvement for LibGO. Graduate students in particular offered a disproportionate number of suggestions as compared to the other groups. While they provided a great deal of helpful feedback, it is possible that graduate students were so distracted by the perceived problems that they could not fully take in the experience or gain value from LibGO’s orientation purpose. It is also very likely that LibGO simply was not very fun for these players: several noted that it did not feel like a game but rather a collection of content. The review of literature indicated that this amusement issue is a common pitfall of educational games. Although the authors tried to design an enjoyable orientation experience, more work may be needed to satisfy user expectations.

The mixed-methods design of this study was instrumental in providing a richer understanding of user perceptions.
While the statistical analysis of participant survey responses was very helpful in identifying clear trends between groups, the qualitative analysis helped the authors draw valuable conclusions. Specifically, the open-response data demonstrated that additional groups, such as graduate students and community members, appreciated the experience of playing LibGO; this information was not readily apparent through the statistical analysis. Additionally, the qualitative analysis demonstrated that many groups had concerns regarding areas of improvement that may have impaired their user experience. These findings could help guide future directions of the research. In all, the authors concluded this phase of the research satisfied that LibGO showed great promise for library orientation delivery but could benefit from continued development and future user assessment. Although undergraduate students seemed most receptive overall to a virtual orientation experience, other groups appeared to have benefited from the resource.

STUDY LIMITATIONS

A primary limitation of this study was its small sample size. Although the entire university campus was targeted for participation in the study, the number of respondents was far too small to generalize the results. Despite this limitation, however, the study’s population reflected many different groups of library patrons on campus. The findings are therefore valuable as a means of stimulating future discussion regarding the value of alternative library orientation methods utilizing gamification.

Another limitation is that the authors did not pre-assess the targeted groups for their prior knowledge of Walker Library services and building layout, nor for their interest in learning about these topics. It is possible that various groups did not see the value in learning about the library for a variety of reasons.
Faculty members, in particular, may have considered their prior knowledge adequate for navigating the electronic holdings or building layout without recognizing the value of the many other services offered physically and electronically by the library. All groups may have experienced a level of “library anxiety” that prevented them from being motivated to learn more about the library.38 It is difficult to understand the range of covariate factors without a pre-assessment.

Finally, there was qualitative evidence supporting the limitation that LibGO did not properly convey its stated purpose of orientation, as opposed to instruction in research skills. Without understanding LibGO’s focus on library orientation, users could have been confused or disappointed by the experience. Although care was taken to make this purpose explicit, some users indicated their confusion in the qualitative data. This observed problem points to a design flaw that undoubtedly had some bearing on the study’s results.

CONCLUSION & FUTURE RESEARCH

Convinced of the importance of the library orientation, the authors sought to move this traditional in-person experience to a virtual one. The quantitative results indicated that the gamified orientation experience was useful to undergraduate students in its intended purpose of acclimating users to the library, as well as in encouraging their future use of the physical library. At a time in which physical traffic to the library has shown a marked decline, new outreach strategies should be considered.39 The results were also helpful in showing that this particular iteration of the gamified orientation was preferred over other delivery methods by undergraduate students, as compared to other groups, to a statistically significant level.
This is an important finding, as it demonstrates that a diversified outreach strategy is necessary: different groups of library patrons want their orientation information in different formats. The next logical question, however, is: Why did the other groups examined through the statistical data analysis (graduate students and university employees) not appreciate the gamified orientation to the same level as undergraduates? The answers to this question are complicated and may be explained in part by the qualitative analysis. Based upon those findings, it is possible that the game did not appeal to these groups on the basis of fun or enjoyment; this concern was specifically mentioned by graduate students. Faculty and staff members provided less qualitative feedback; it is therefore difficult to speculate as to their exact reasons for disengagement with LibGO.

With this concern in mind, the authors would like to concentrate their next iteration of research on the specific library orientation needs of graduate students and faculty. Both groups present different, but critical, needs for outreach. Graduate students were the largest group of survey respondents, presumably indicating a high level of interest in learning more about the library. Many graduate programs at MTSU are delivered partially or entirely online; as a result, these students may be less likely to come to campus. Due to graduate students’ relatively infrequent visits to campus, a virtual library orientation could be even more meaningful in meeting their need for library services information. Faculty are another important group to target because, if they lack a full understanding of the library’s offerings, they are unlikely to design assignments that fully utilize the library’s services. Although it is possible that faculty prefer an in-person orientation, many new faculty have indicated limited availability for such events. A virtual orientation seems conducive to busy schedules.
However, it is possible that the issue is simply a matter of marketing: faculty may not know that a virtual option is available, nor do they necessarily understand all that the library has to offer. In all, future research should begin with a survey to understand what both groups already know about the library, as well as the library services they desire. Another necessary step in future research would be the expansion of the development team to include computer programmers. Although the authors feel that LibGO holds great promise as a virtual orientation tool, more needs to be done to enhance the user’s enjoyment of the experience. Twine is user-friendly software that other librarians could pick up without being computer programmers; however, programmers (professional or student) could bring design expertise to the project. Future iterations of this project should incorporate the skills of multiple groups, including expertise in libraries, user research, visual design, interaction design, programming, and marketing, as well as testers from each type of intended audience. Collectively, this group would have the greatest impact on improving the user experience and ultimately the usefulness of a gamified orientation experience. This experience with gamification, and specifically interactive storytelling, was valuable for Walker Library. These results should encourage other libraries seeking an alternate delivery method for orientations. The authors hope to build upon the lessons learned from this mixed methods research study of LibGO to find the correct outreach medium for their range of library users. 

ACKNOWLEDGMENTS 

Special thanks to our beta playtesters and student assistants who worked the LibGO Event, which was funded, in part, by MT Engage and Walker Library at Middle Tennessee State University. 
INFORMATION TECHNOLOGY AND LIBRARIES SEPTEMBER 2020 APPLYING GAMIFICATION TO THE LIBRARY ORIENTATION | REED AND MILLER 

APPENDIX A: SURVEY INSTRUMENT 

[Survey instrument images not reproduced.] 

ENDNOTES 

1 Sandra Calemme McCarthy, “At Issue: Exploring Library Usage by Online Learners with Student Success,” Community College Enterprise 23, no. 2 (January 2017): 27–31; Angie Thorpe et al., “The Impact of the Academic Library on Student Success: Connecting the Dots,” Portal: Libraries and the Academy 16, no. 2 (2016): 373–92, https://doi.org/10.1353/pla.2016.0027. 

2 Steven Ovadia, “How Does Tenure Status Impact Library Usage: A Study of LaGuardia Community College,” Journal of Academic Librarianship 35, no. 4 (January 2009): 332–40, https://doi.org/10.1016/j.acalib.2009.04.022. 

3 Chris Leeder and Steven Lonn, “Faculty Usage of Library Tools in a Learning Management System,” College & Research Libraries 75, no. 5 (September 2014): 641–63, https://doi.org/10.5860/crl.75.5.641. 

4 Kyle Felker and Eric Phetteplace, “Gamification in Libraries: The State of the Art,” Reference and User Services Quarterly 54, no. 
2 (2014): 19–23, https://doi.org/10.5860/rusq.54n2.19; Nancy O’Hanlon, Karen Diaz, and Fred Roecker, “A Game-Based Multimedia Approach to Library Orientation” (paper, 35th National LOEX Library Instruction Conference, San Diego, May 2007), https://commons.emich.edu/loexconf2007/19/; Leila June Rod-Welch, “Let’s Get Oriented: Getting Intimate with the Library, Small Group Sessions for Library Orientation” (paper, Association of College and Research Libraries Conference, Baltimore, March 2017), http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/LetsGetOriented.pdf. 

5 Kelly Czarnecki, “Chapter 4: Digital Storytelling in Different Library Settings,” Library Technology Reports, no. 7 (2009): 20–30; Rebecca J. Morris, “Creating, Viewing, and Assessing: Fluid Roles of the Student Self in Digital Storytelling,” School Libraries Worldwide, no. 2 (2013): 54–68. 

6 Sandra Marcus and Sheila Beck, “A Library Adventure: Comparing a Treasure Hunt with a Traditional Freshman Orientation Tour,” College & Research Libraries 64, no. 1 (January 2003): 23–44, https://doi.org/10.5860/crl.64.1.23. 

7 Lori Oling and Michelle Mach, “Tour Trends in Academic ARL Libraries,” College & Research Libraries 63, no. 1 (January 2002): 13–23, https://doi.org/10.5860/crl.63.1.13. 

8 Kylie Bailin, Benjamin Jahre, and Sarah Morriss, “Planning Academic Library Orientations: Case Studies from Around the World” (Oxford, UK: Chandos Publishing, 2018): xvi. 

9 Bailin, Jahre, and Morriss, “Planning Academic Library Orientations.” 

10 Marcus and Beck, “A Library Adventure”; A. Carolyn Miller, “The Round Robin Library Tour,” Journal of Academic Librarianship 6, no. 4 (1980): 215–18; Michael Simmons, “Evaluation of Library Tours,” EDRS, ED 331513 (1990): 1–24. 
11 Marcus and Beck, “A Library Adventure”; Oling and Mach, “Tour Trends”; Rod-Welch, “Let’s Get Oriented.” 

12 Pixey Anne Mosley, “Assessing the Comfort Level Impact and Perceptual Value of Library Tours,” Research Strategies 15, no. 4 (1997): 261–70, https://doi.org/10.1016/S0734-3310(97)90013-6. 

13 Mosley, “Assessing the Comfort Level Impact and Perceptual Value of Library Tours.” 

14 Marcus and Beck, “A Library Adventure,” 27. 

15 Kenneth J. Burhanna, Tammy J. Eschedor Voelker, and Jule A. Gedeon, “Virtually the Same: Comparing the Effectiveness of Online Versus In-Person Library Tours,” Public Services Quarterly 4, no. 4 (2008): 317–38, https://doi.org/10.1080/15228950802461616. 

16 Burhanna, Voelker, and Gedeon, “Virtually the Same,” 326. 

17 Burhanna, Voelker, and Gedeon, “Virtually the Same,” 329. 

18 Felker and Phetteplace, “Gamification in Libraries.” 

19 Felker and Phetteplace, “Gamification in Libraries,” 20. 

20 Felker and Phetteplace, “Gamification in Libraries.” 

21 Felker and Phetteplace, “Gamification in Libraries”; O’Hanlon et al., “A Game-Based Multimedia Approach.” 

22 Mary J. Broussard and Jessica Urick Oberlin, “Using Online Games to Fight Plagiarism: A Spoonful of Sugar Helps the Medicine Go Down,” Indiana Libraries 30, no. 1 (January 2011): 28–39. 

23 Melissa Mallon, “Gaming and Gamification,” Public Services Quarterly 9, no. 
3 (2013): 210–21, https://doi.org/10.1080/15228959.2013.815502. 

24 J. Long, “Chapter 21: Gaming Library Instruction: Using Interactive Play to Promote Research as a Process,” Distributed Learning (January 1, 2017), 385–401, https://doi.org/10.1016/B978-0-08-100598-9.00021-0. 

25 Rod-Welch, “Let’s Get Oriented.” 

26 O’Hanlon et al., “A Game-Based Multimedia Approach.” 

27 Mallon, “Gaming and Gamification.” 

28 Anna-Lise Smith and Lesli Baker, “Getting a Clue: Creating Student Detectives and Dragon Slayers in Your Library,” Reference Services Review 39, no. 4 (November 2011): 628–42, https://doi.org/10.1108/00907321111186659. 

29 Monica Fusich et al., “HML-IQ: Fresno State’s Online Library Orientation Game,” College & Research Libraries News 72, no. 11 (December 2011): 626–30, https://doi.org/10.5860/crln.72.11.8667. 

30 Broussard and Oberlin, “Using Online Games”; Fusich et al., “HML-IQ”; O’Hanlon et al., “A Game-Based Multimedia Approach.” 

31 Felker and Phetteplace, “Gamification in Libraries.” 

32 Felker and Phetteplace, “Gamification in Libraries”; Fusich et al., “HML-IQ.” 

33 “Design Thinking for Libraries: A Toolkit for Patron-Centered Design,” IDEO (2015), http://designthinkingforlibraries.com. 

34 John W. Creswell and Vicki L. Plano Clark, Designing and Conducting Mixed Methods Research (Thousand Oaks, CA: Sage Publications, 2007). 

35 Roger Kirk, “Practical Significance: A Concept Whose Time Has Come,” Educational and Psychological Measurement, no. 5 (1996). 
36 Kirk, “Practical Significance.” 

37 Sandra Mathison, “Encyclopedia of Evaluation,” SAGE, 2005, https://doi.org/10.4135/9781412950558. 

38 Rod-Welch, “Let’s Get Oriented.” 

39 Felker and Phetteplace, “Gamification in Libraries.” 
ARTICLES 

Likes, Comments, Views: A Content Analysis of Academic Library Instagram Posts 

Jylisa Doney, Olivia Wikle, and Jessica Martinez 

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 
https://doi.org/10.6017/ital.v39i3.12211 

Jylisa Doney (jylisadoney@uidaho.edu) is Social Sciences Librarian, University of Idaho. Olivia Wikle (omwikle@uidaho.edu) is Digital Initiatives Librarian, University of Idaho. Jessica Martinez (jessicamartinez@uidaho.edu) is Science Librarian, University of Idaho. © 2020. 

ABSTRACT 

This article presents a content analysis of academic library Instagram accounts at eleven land-grant universities. Previous research has examined personal, corporate, and university use of Instagram, but fewer studies have used this methodology to examine how academic libraries share content on this platform and the engagement generated by different categories of posts. Findings indicate that showcasing posts (highlighting library or campus resources) accounted for more than 50 percent of posts shared, while a much smaller percentage of posts reflected humanizing content (emphasizing warmth or humor) or crowdsourcing content (encouraging user feedback). Crowdsourcing posts generated the most likes on average, followed closely by orienting posts (situating the library within the campus community), while a larger proportion of crowdsourcing posts, compared to other post categories, included comments. The results of this study indicate that libraries should seek to create Instagram posts that include various types of content while also ensuring that the content shared reflects their unique campus contexts. 
By sharing a framework for analyzing library Instagram content, this article will provide libraries with the tools they need to more effectively identify the types of content their users respond to and enjoy, as well as make their social media marketing on Instagram more impactful. 

INTRODUCTION 

Library use of social media has steadily increased over time; in 2013, 86 percent of libraries reported using social media to connect with their patron communities.1 The ways in which libraries use social media tend to vary, but common themes include marketing services, content, and spaces to patrons, as well as creating a sense of community.2 Even with this wealth of research, fewer studies have examined how libraries use Instagram, and those that do often utilize a formal or informal case study methodology.3 This research seeks to fill that gap by examining the types of content shared most frequently by a subset of academic library Instagram accounts. Although this research focused on academic libraries, its methods and findings could be leveraged by educational institutions and non-profits in their own investigations of Instagram usage and impact. 

LITERATURE REVIEW 

Since its inception in 2010, Instagram’s number of account holders has been steadily increasing. 
By 2019, more than one billion user accounts were active each month, making it the third most popular social media network in the world, and the Pew Research Center has reported that Instagram is the second most used social media platform among people ages 18-29 in the United States, after Facebook.4 Instagram has estimated that 90 percent of user accounts follow at least one business account.5 Previous research has also shown that individuals who use Instagram to follow specific brands have the highest rates of engagement with, and commitment to, those brands when compared to users of other social media platforms.6 Though businesses are fundamentally different in the products or services they are trying to market, academic libraries share a desire to provide information to, and engage with, their followers. As such, in the past decade, libraries have begun to adopt Instagram as a way to market their libraries and interact with patrons.7 However, methods and parameters for libraries’ use of Instagram vary across types of libraries and even within specific library types.8 Research has demonstrated that academic libraries’ use of social media, including Instagram, is often for the purpose of increasing the sense of community among librarians and patrons by marketing the library’s services and encouraging student feedback and interaction.9 Similarly, Harrison et al. 
discovered that academic library social media posts reflected three main themes: “community connections, inviting environment, and provision of content.”10 Chatten and Roughley have also reported that libraries’ use of social media ranges from providing customer service to promoting the library and building a community of users.11 Indeed, when comparing modern social networking systems, such as Instagram, to older platforms, such as Myspace, Fernandez posited that today’s popular social media sites encourage networking and are especially suited to creating community.12 Ideally, community engagement in the virtual social media environment would encourage more patrons to enter the library and thus engage in more face-to-face encounters.13 Libraries’ methods for measuring the success of their social media engagement are as varied as the ways in which they use social media. Assessment of libraries’ social media efficacy is tricky, and highly variable from institution to institution. Hastings has cautioned that librarians should recognize that patrons both actively and passively interact with social media content.14 For this reason, while a large number of comments or likes may be identified as positive markers for active engagement, passive forms of engagement, such as the number of times a post appeared in users’ Instagram feeds, may also be relevant.15 Therefore, when librarians measure the success of an Instagram post by examining only the number of likes and comments, they should be aware that they are measuring a very specific type of engagement: one which, on its own, may not determine a post’s full reach or effectiveness. Other ways to measure engagement include monitoring how the number of people subscribed to an account changes over time, evaluating reach and impressions,16 or analyzing the content of comments (a type of qualitative measure that may indicate the type of community developing around the library’s social media). 
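The distinction drawn above between active engagement (likes, comments) and passive engagement (how often a post is seen) can be made concrete with a small calculation. The sketch below is illustrative only: the post data, follower count, and field names are hypothetical, and reach and impression figures are normally visible only to the account owner.

```python
# Hypothetical sketch of active vs. passive engagement metrics for
# Instagram posts. All values and field names are invented for
# illustration; reach/impressions come from owner-only analytics.

posts = [
    {"likes": 54, "comments": 3, "impressions": 410, "reach": 350},
    {"likes": 28, "comments": 0, "impressions": 530, "reach": 470},
]

followers = 1200  # current follower count for the account

for i, post in enumerate(posts, start=1):
    # Active engagement: deliberate responses relative to audience size.
    active = (post["likes"] + post["comments"]) / followers
    # Passive engagement: unique viewers relative to audience size.
    passive = post["reach"] / followers
    print(f"post {i}: active {active:.1%}, passive {passive:.1%}")
```

A post with few likes may still have reached a large share of followers, which is why likes and comments alone understate a post's effectiveness.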
Despite, or perhaps because of, the general excitement surrounding the possibilities that libraries’ engagement with social media can produce, very little has been written about how different types of libraries (such as academic libraries, law libraries, public libraries, etc.), or libraries in general, use these platforms.17 Additionally, many librarians may lack expertise in marketing, including those who are managing social media accounts.18 As social media culture continues to evolve, librarians should move toward a more targeted and pragmatic approach to their Instagram practices. This refinement in social media practices may enable libraries to develop more structure, so that they may create and share the type of content that would achieve their desired result at a given time. However, in order to develop this kind of measured approach, it is necessary for researchers to first analyze libraries’ current Instagram practices to determine how posts are being used and the outcomes they generate. One effective method of analyzing Instagram content centers on coding and classifying images. While many such schemas have been developed for analyzing images posted by Instagram users and businesses, transferring these schemas to academic contexts has been difficult.19 To address this gap, Stuart et al. adapted a schema that had been used to examine how “news media [and] non-profits,” as well as businesses, used Instagram.20 This new schema allowed Stuart et al. 
to classify Instagram posts produced by academic institutions in the UK and measure the effect of these universities’ attempts to engage with students via Instagram.21 Stuart et al.’s schema, which classified Instagram images into six categories (orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing), was the basis for the present study.22 

METHODS 

Research Questions 

The impetus for this study was to learn more about how academic libraries use Instagram to connect with their campus communities and promote their services and events. The authors of the present study adapted the research questions posed by Stuart et al. to reflect academic library contexts:23 

• RQ1: Which type of post category is used most frequently by libraries on Instagram? 
• RQ2: Is the number of likes or the existence of comments related to the post category? 

Identifying a Sample Population 

This study investigated a small subset of academic institutions: the University of Idaho’s sixteen peer institutions. These peers have similar “student profiles, enrollment characteristics, research expenditures, [or] academic disciplines and degrees”; each is designated as a land-grant institution; and the University of Idaho considers three to be “aspirational peers.”24 After selecting this population, the authors investigated the library websites of each of the sixteen peer institutions to determine whether or not they had a library-specific Instagram account. When a link was not available on the library websites, the authors conducted a search within Instagram as well as a general Google search in an attempt to identify these Instagram accounts. Of the University of Idaho’s sixteen peer institutions, eleven had active, library-specific Instagram accounts. 

Data Collection 

The authors undertook manual data collection between November and December 2018 for these eleven library Instagram accounts. 
Initial information about each Instagram account was gathered prior to the study on October 23, 2018: the date of the first post, the total number of posts shared by the account, the total number of followers, and the total number of accounts followed. For each account, the authors identified posts shared from January 1, 2018, to June 30, 2018. The “print to PDF” function available in the Chrome browser was used to preserve a record of the content, in case the accounts were later discontinued while research was underway. If a post included more than one image, only the first image was captured in the PDF and analyzed. To organize the 377 Instagram posts shared within this timeframe, the authors assigned each institution a unique, five-digit identifier; file names included this identifier as well as the date of the post (e.g., 00004_IGpost_20180423). This file naming convention ensured that posts were separated based on institution and that future studies could use the same file naming convention, even if the sample size increased significantly. The authors added the file names of all 377 Instagram posts to a shared Google Sheet, and for each post they reported the kind of post (photo or video), the number of likes, and whether comments existed. 

Research Data Analysis 

Content Analysis 

This project adapted the coding schema Stuart et al. employed to investigate the ways in which UK universities used Instagram.25 Expanding on research by McNely, Stuart et al. employed six Instagram post categories: orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing.26 For the purposes of the present study, the authors used the same category names when coding library Instagram posts. However, they updated and adapted the descriptions of each category over the course of two rounds of coding to better reflect academic library contexts (see table 1). 
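Agreement among multiple coders applying a categorical schema like this one is commonly quantified with Fleiss' kappa, which compares observed agreement among raters with the agreement expected by chance. The sketch below is a minimal, from-scratch implementation; the rating matrix is hypothetical (not the study's data), with one row per coded post, one column per category, and each cell counting how many of three coders chose that category.

```python
# Minimal Fleiss' kappa sketch. Rows = coded items (posts); columns =
# categories (e.g., the six post categories); each cell counts how many
# raters assigned that category. The ratings below are hypothetical.

def fleiss_kappa(matrix):
    n_items = len(matrix)
    n_raters = sum(matrix[0])  # each row must sum to the rater count
    total = n_items * n_raters
    # Proportion of all assignments falling in each category.
    p_cat = [sum(row[j] for row in matrix) / total
             for j in range(len(matrix[0]))]
    # Per-item agreement: fraction of rater pairs that agree.
    p_item = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in matrix]
    p_bar = sum(p_item) / n_items      # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)

ratings = [
    [3, 0, 0, 0, 0, 0],  # all three coders chose category 1
    [0, 3, 0, 0, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [2, 1, 0, 0, 0, 0],  # one coder disagreed on this post
]
print(round(fleiss_kappa(ratings), 3))
```

A value of 1.0 indicates perfect agreement and 0 indicates agreement no better than chance; each row must sum to the number of raters.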
Within this coding schema, the authors elected to apply only a single category name (i.e., a code) to each library Instagram post. 

Interrater Reliability 

During the first round of coding, the authors selected two or three institutions every month, independently coded the posts based on the initial adapted schema, met to discuss discrepancies, and identified the final code based on consensus.27 However, during these discussions, it became evident that there was substantial disagreement concerning how specific categories were interpreted. To examine the impact of this disagreement, the authors calculated Fleiss’ kappa, which can be used to assess interrater reliability when two or more coders categorically evaluate data.28 Although this project’s Fleiss’ kappa (0.683554901) was relatively close to a score of 1.0, demonstrating moderate agreement among the three coders, the authors recognized that additional fine-tuning of the adapted coding schema would allow for a more accurate representation of the types of content shared by academic libraries. After updating the schema (table 1), a small sample of collected Instagram posts (20 percent, or 76 posts) was randomly selected for independent recoding by each of the authors. Again, after coding this random sample individually, the authors met to seek consensus. Anecdotal feedback from the coders, as well as an increase in the project’s Fleiss’ kappa (0.795494117), demonstrated that the updated coding schema was more robust and representative. Based on this evidence, the authors randomly distributed the remaining 301 posts amongst themselves; each post was coded by one author. 

Table 1. Coding Schema for Library Instagram Posts [Adapted from: Emma Stuart, David Stuart, and Mike Thelwall, “An Investigation of the Online Presence of UK Universities on Instagram,” Online Information Review 41, no. 
5 (2017): 588, https://doi.org/10.1108/OIR-02-2016-0057.] In the published table, each category is illustrated with a sample image from the University of Idaho Library’s Instagram account. 

Crowdsourcing: Posts that were created with the intention of generating feedback within the platform. If the content of the post itself fits within a different classification category, but the image is accompanied by text that explicitly asks for viewer feedback, then the post should be classified as crowdsourcing. Includes requests for followers to like, comment on, or tag others in a particular post. 

Humanizing: Posts that aim to emphasize human character or elements of warmth, humor, or amusement. This includes historic/archival photos used to convey these sentiments. This code is only used if both the text and the photo or video can be categorized as humanizing because many library posts contain a “humanizing” element. 

Interacting: Posts with candid photographs or videos at library and library-associated events. Includes events within or outside the library. 

Orienting: Posts that situate the library within its larger community, especially regarding locations, artifacts, or identities. Text often includes geographic information. 

Placemaking: Posts that capture the atmosphere of the library through its physical space and attributes. Includes permanent murals, statues, etc. 

Showcasing: Posts that highlight library or campus resources, services, or future events. Can include current or on-going events if people are not the focus of the image (e.g., exhibit, highlight of collection, etc.). These posts can also present information about library operations, such as hours and fundraising. 
Posts can also entice their audience to do something outside of Instagram, such as visit a specific website. 

RESULTS 

General Data about the Library Instagram Accounts 

As of October 23, 2018 (the date this initial information was gathered), the eleven academic library Instagram accounts had shared a combined 3,124 posts. Most libraries created their Instagram accounts and started posting between 2013 and 2016, but one library shared a post in 2012 and one created their account in April 2018. Since the date of their first post, each account had shared 284 posts on average, while the actual number of posts shared across accounts ranged from 62 to 520. The number of followers and accounts followed across these eleven accounts ranged from 115 to 1,390 and 65 to 2,717, respectively. Between January 1, 2018, and June 30, 2018, these eleven library Instagram accounts shared a total of 377 posts. The number of posts shared by each account during this time period ranged from four to 57, with an average of 34 posts. 

RQ1: Which Type of Post Category Is Used Most Frequently by Libraries on Instagram? 

Of the 377 posts analyzed, 359 included photos and 18 included videos. More than 50 percent of posts shared were coded as showcasing, with humanizing (18 percent) and crowdsourcing (10.1 percent) being the next most common categories (see table 2), although data demonstrated that individual libraries differed in their use of specific post categories (see table 3). When examining frequency based on category of post, the authors identified slight differences between video and photo posts. As with photos, the majority of videos (55.6 percent) were still coded as showcasing; however, the second most common post category for videos was interacting (16.7 percent). 

Table 2. 
Number and Percentage of Posts by Category for Posts with Photos or Videos 

Category        Number of Posts   Percentage of Posts 
Crowdsourcing   38                10.1% 
Humanizing      68                18.0% 
Interacting     16                4.2% 
Orienting       28                7.4% 
Placemaking     33                8.8% 
Showcasing      194               51.5% 
Total           377               100% 

Table 3. Percentage of Posts by Category and Library for Posts with Photos or Videos 

Library   Crowdsourcing   Humanizing   Interacting   Orienting   Placemaking   Showcasing 
Lib 1     7.7%            15.4%        0%            23.1%       30.8%         23.1% 
Lib 2     4.2%            50.0%        0%            4.2%        0%            41.7% 
Lib 3     56.1%           10.5%        1.8%          3.5%        7.0%          21.1% 
Lib 4     0%              4.1%         4.1%          4.1%        2.0%          85.7% 
Lib 5     0%              24.4%        2.2%          20.0%       26.7%         26.7% 
Lib 6     7.5%            18.9%        3.8%          11.3%       11.3%         47.2% 
Lib 7     0%              20.0%        0%            0%          10.0%         70.0% 
Lib 8     0%              21.6%        9.8%          5.9%        0%            62.7% 
Lib 9     0%              25.0%        25.0%         0%          0%            50.0% 
Lib 10    0%              16.1%        6.5%          0%          9.7%          67.7% 
Lib 11    0%              15.0%        5.0%          5.0%        5.0%          70.0% 

RQ2: Is the Number of Likes or the Existence of Comments Related to the Post Category? 

Number of Likes by Category 

The results of the coding process also indicated that the number of likes differed based on the category of post. When examining photo posts, the authors noted that every post received at least five likes, with most posts receiving between 20 and 39 likes (see table 4). On average, crowdsourcing photo posts generated the highest average number of likes across all categories, followed by orienting and placemaking posts (see table 5). However, it is important to recognize that crowdsourcing posts often asked visitors to participate in a post by “liking” it, often with the chance to win a library-sponsored contest, which may partially explain the higher average number of likes. 

Table 4. 
Number of Posts by Category and Range of Likes for Posts with Photos (does not include posts with videos) 

Category        5–19   20–39   40–59   60–79   80–99   100–119   120–140 
Crowdsourcing   0      11      16      6       1       1         1 
Humanizing      16     26      10      9       5       0         1 
Interacting     5      5       3       0       0       0         0 
Orienting       2      7       9       8       0       1         0 
Placemaking     3      10      12      3       2       1         1 
Showcasing      67     83      27      5       1       0         1 
Total           93     142     77      31      9       3         4 

Table 5. Average Number of Likes by Category for Posts with Photos (does not include posts with videos) 

Category        Average Number of Likes   Number of Posts 
Crowdsourcing   53.6                      36 
Humanizing      39.9                      67 
Interacting     27.8                      13 
Orienting       50.0                      27 
Placemaking     46.9                      32 
Showcasing      27.6                      184 

Existence of Comments by Category 

The authors also examined the existence of comments, another metric for engagement with Instagram posts. Data demonstrated that 78.9 percent of crowdsourcing posts included comments, while a much lower percentage of placemaking (30.3 percent), orienting (28.6 percent), and humanizing (26.5 percent) posts generated this type of engagement (see table 6). As with the data on the number of “likes,” many crowdsourcing posts encouraged visitors to comment on a particular post, at times with an incentive connected to this type of engagement. 

Table 6. Presence of Comments by Category for Posts with Photos or Videos 

Category        Posts with Comments   Posts without Comments   Total Posts   Percentage with Comments 
Crowdsourcing   30                    8                        38            78.9% 
Humanizing      18                    50                       68            26.5% 
Interacting     3                     13                       16            18.8% 
Orienting       8                     20                       28            28.6% 
Placemaking     10                    23                       33            30.3% 
Showcasing      40                    154                      194           20.6% 
Total           109                   268                      377           28.9% 

DISCUSSION 

As noted previously, the post category used most frequently by these eleven libraries on Instagram was showcasing (51.5 percent). 
The fact that libraries were more likely to share this type of content—which highlighted library resources, events, or collections—is understandable, as library promotion is one of the foundational reasons libraries spend the time and effort required to maintain social media accounts.29 This finding differs substantially from previous research with UK universities, which classified only 28.8 percent of posts as showcasing.30 When examining other post categories, it also became clear that UK universities shared humanizing posts more frequently (31 percent) than the eleven libraries (18 percent) included in this study.31 Although the results of this study demonstrated that showcasing posts were shared most often, the data also indicates that showcasing posts were neither the category with the most likes on average nor the category that received comments most often. Crowdsourcing posts were the category with the highest average number of likes (53.6) with orienting posts coming in at a close second (50), followed by placemaking (46.9) and humanizing (39.9) posts. Showcasing posts, along with interacting posts, only generated slightly more than half the number of likes on average, when compared to the other categories (27.6 and 27.8, respectively). The category with the largest proportion of comments was crowdsourcing posts, with 78.9 percent of posts in this category generating comments from visitors. However, this result is likely skewed, as one of the library Instagram accounts had exceptionally successful crowdsourcing posts, which often included a giveaway or other incentive for participation. In fact, when this institution was removed from the data set, only six crowdsourcing posts remained, two of which generated comments. 
To better determine whether crowdsourcing posts are always this effective at generating engagement, it would be necessary to code a larger sample of Instagram posts. It is clear that while showcasing posts were the most common among the Instagram accounts analyzed, they also received the lowest number of likes, on average, and generated comments less frequently than all but one post category. While this may seem disheartening, it is important to remember that the showcasing category includes informational posts that convey library hours, services, or closures; this is information that may be effectively relayed to users without necessitating an active response in the form of likes and comments. Therefore, one might use different criteria to determine the success of showcasing posts, perhaps examining Instagram data related to reach (the total number of unique visitors that view a post) and impressions (the total number of times a post is viewed).32 Data on reach and impressions are only available to Instagram account “owners.” In the current study, the authors did not quantify these types of engagement as their goal was to evaluate the content and metrics available to all Instagram users, rather than the data that was only available to the “owners” of these library Instagram accounts. In addition to answering the research questions, coding these Instagram posts prompted several new questions regarding the types of information libraries and other institutions share online. One such question is: With both universities and academic libraries working with students, why did academic libraries share a smaller percentage of interacting posts than UK universities?33 Additional research is needed to answer this question, but anecdotally, this difference may be related to the fact that universities, as a whole, have a larger number of opportunities to promote and share instances of interaction via Instagram than libraries.
For example, general university Instagram accounts often include photos of students and affiliates interacting at large-scale events such as sports games, musical performances, and other student gatherings that take place across campus. Library-specific accounts, on the other hand, have fewer opportunities to post photos that capture individuals “interacting” candidly. Further, the fact that libraries tend to be proponents of privacy rights may inhibit library staff from taking photos of their users and sharing them online without first getting permission. Therefore, differences related to the number of events and the organization type may contribute to whether or not universities and libraries share interacting posts; more research is needed to examine this hypothesis. Another issue that arose during coding was that, if not for their inclusion of a request to comment, many crowdsourcing posts could have been classified under other categories. If an account follower looked only at the photos included in many of the crowdsourcing posts without reading the captions, they may not interpret those posts as crowdsourcing. Therefore, a future research project might examine whether applying secondary categories to crowdsourcing posts, as a means of further classifying images and not just their captions, could generate a more comprehensive picture of what libraries are sharing on their Instagram accounts. The authors also discovered that a majority of the library Instagram posts included in this sample contained humanizing elements. Almost all posts attempted to convey warmth, humor, or assistance, and therefore had the potential to be classified as humanizing.
To successfully adapt Stuart et al.’s coding schema for academic library Instagram accounts, the authors specified that a post had to have both a humanizing caption as well as a humanizing photo to be coded as such.34 As with crowdsourcing posts, adding secondary categories to humanizing posts could better reflect the dual nature of this content and help future coders more accurately interpret the types of content shared by academic libraries.

LIMITATIONS AND FUTURE RESEARCH

The number of library Instagram accounts selected as well as the use of a six-month timeframe were limitations of the current study. In the future, selecting a larger sample size and a different group of academic libraries would serve to advance the discipline’s understanding of the types of content shared by academic libraries and how users interact with these Instagram posts. Additionally, collecting Instagram posts shared during an expanded timeframe could allow researchers to explore whether library Instagram accounts consistently share the same types of content at various points throughout the year. As mentioned in the Discussion section, future research could also include adding secondary categories to posts, which would allow researchers to gather more granular information about the types of content shared and the relationships between post category, comments, and likes. Lastly, to better understand the post categories that generate the greatest engagement, collaborative research between institutions could allow researchers to gather and analyze metrics that are only available to account owners, such as impressions and reach. With this type of collaboration, researchers could also investigate how social media outreach goals influence the types of content shared on library Instagram accounts.
For example, researchers could conduct interviews or surveys with libraries and ask questions such as: what does your library hope to accomplish with its Instagram account, who are you attempting to reach, how do you define a successful post, what metrics do you use to evaluate your Instagram presence, and do your social media outreach goals influence the types of content shared on Instagram? Pursuing these types of questions, in addition to examining the actual content shared, would allow researchers to gain a more complete picture of what a successful social media presence looks like for an academic library.

CONCLUSION

This research provides initial insight into the Instagram presence of a subset of academic libraries at land-grant institutions in the United States. Expanding on the research of Stuart et al., this project used an adapted coding schema to document and analyze the content and efficacy of academic libraries’ Instagram posts.35 The results of this study suggest that social media accounts, including those used by academic libraries, perform better when they reflect the community the library inhabits by highlighting content that is unique to their particular constituents, rather than simply functioning as another platform through which to share information. This study’s findings also demonstrate that academic libraries should strive to create an Instagram presence that encompasses a variety of post categories to ensure that their online information sharing meets various needs.

ENDNOTES

1 Nancy Dowd, “Social Media: Libraries are Posting, but is Anyone Listening?,” Library Journal 138, no. 10 (May 7, 2013), 12, https://www.libraryjournal.com/?detailStory=social-media-libraries-are-posting-but-is-anyone-listening.
2 Marshall Breeding, Next-Gen Library Catalogs (London: Facet Publishing, 2010); Zelda Chatten and Sarah Roughley, “Developing Social Media to Engage and Connect at the University of Liverpool Library,” New Review of Academic Librarianship 22, no. 2/3 (2016), https://doi.org/10.1080/13614533.2016.1152985; Amanda Harrison et al., “Social Media Use in Academic Libraries: A Phenomenological Study,” The Journal of Academic Librarianship 43, no. 3 (2017), https://doi.org/10.1016/j.acalib.2017.02.014; Nicole Tekulve and Katy Kelly, “Worth 1,000 Words: Using Instagram to Engage Library Users,” Brick and Click Libraries Symposium, Maryville, MO (2013), https://ecommons.udayton.edu/roesch_fac/20; Evgenia Vassilakaki and Emmanouel Garoufallou, “The Impact of Twitter on Libraries: A Critical Review of the Literature,” The Electronic Library 33, no. 4 (2015), https://doi.org/10.1108/EL-03-2014-0051.

3 Yeni Budi Rachman, Hana Mutiarani, and Dinda Ayunindia Putri, “Content Analysis of Indonesian Academic Libraries’ Use of Instagram,” Webology 15, no. 2 (2018), http://www.webology.org/2018/v15n2/a170.pdf; Catherine Fonseca, “The Insta-Story: A New Frontier for Marketing and Engagement at the Sonoma State University Library,” Reference & User Services Quarterly 58, no. 4 (2019), https://www.journals.ala.org/index.php/rusq/article/view/7148; Kjersten L. Hild, “Outreach and Engagement through Instagram: Experiences with the Herman B Wells Library Account,” Indiana Libraries 33, no. 2 (2014), https://journals.iupui.edu/index.php/IndianaLibraries/article/view/16633; Julie Lê, “#Fashionlibrarianship: A Case Study on the Use of Instagram in a Specialized Museum Library Collection,” Art Documentation: Bulletin of the Art Libraries Society of North America 38, no. 2 (2019), https://doi.org/10.1086/705737; Danielle Salomon, “Moving on from Facebook: Using Instagram to Connect with Undergraduates and Engage in Teaching and Learning,” College & Research Libraries News 74, no.
8 (2013), https://doi.org/10.5860/crln.74.8.8991.

4 “Our Story,” Instagram, https://business.instagram.com/; Chloe West, “17 Instagram Stats Marketers Need to Know for 2019,” Sprout Blog, April 22, 2019, https://web.archive.org/web/20191219192653/https://sproutsocial.com/insights/instagram-stats/; Pew Research Center, “Social Media Fact Sheet,” last modified June 12, 2019, http://www.pewinternet.org/fact-sheet/social-media/.

5 “Our Story,” Instagram.

6 Joe Phua, Seunga Venus Jin, and Jihoon Jay Kim, “Gratifications of Using Facebook, Twitter, Instagram, or Snapchat to Follow Brands: The Moderating Effect of Social Comparison, Trust, Tie Strength, and Network Homophily on Brand Identification, Brand Engagement, Brand Commitment, and Membership Intention,” Telematics and Informatics 34, no. 1 (2017), https://doi.org/10.1016/j.tele.2016.06.004.

7 Fonseca, “The Insta-Story;” Hild, “Outreach and
Engagement;” Lê, “#Fashionlibrarianship;” Rachman, Mutiarani, and Putri, “Content Analysis;” Salomon, “Moving on from Facebook;” Tekulve and Kelly, “Worth 1,000 Words.”

8 Vassilakaki and Garoufallou, “The Impact of Twitter.”

9 Breeding, Next-Gen Library Catalogs; Hild, “Outreach and Engagement;” Rachman, Mutiarani, and Putri, “Content Analysis;” Vassilakaki and Garoufallou, “The Impact of Twitter.”

10 Harrison, Burress, Velasquez, and Schreiner, “Social Media Use,” 253.

11 Chatten and Roughley, “Developing Social Media.”

12 Peter Fernandez, “‘Through the Looking Glass: Envisioning New Library Technologies’ Social Media Trends that Inform Emerging Technologies,” Library Hi Tech News 33, no. 2 (2016), https://doi.org/10.1108/LHTN-01-2016-0004.

13 Robin M. Hastings, Microblogging and Lifestreaming in Libraries (New York: Neal-Schuman Publishers, 2010).

14 Hastings, Microblogging.

15 Robert David Jenkins, “How Are U.S. Startups Using Instagram? An Application of Taylor's Six-Segment Message Strategy Wheel and Analysis of Image Features, Functions, and Appeals” (MA thesis, Brigham Young University, 2018), https://scholarsarchive.byu.edu/etd/6721.

16 Lucy Hitz, “Instagram Impressions, Reach, and Other Metrics you Might be Confused About,” Sprout Blog, January 22, 2020, https://sproutsocial.com/insights/instagram-impressions/.

17 Vassilakaki and Garoufallou, “The Impact of Twitter.”

18 Mark Aaron Polger and Karen Okamoto, “Who’s Spinning the Library? Responsibilities of Academic Librarians who Promote,” Library Management 34, no. 3 (2013), https://doi.org/10.1108/01435121311310914.

19 Yuhen Hu, Lydia Manikonda, and Subbarao Kambhampati, “What We Instagram: A First Analysis of Instagram Photo Content and User Types,” Eighth International AAAI Conference on Weblogs and Social Media (2014), https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/viewPaper/8118; Jenkins, “How Are U.S. Startups Using Instagram?;” Brian J.
McNely, “Shaping Organizational Image-Power Through Images: Case Histories of Instagram,” Proceedings of the 2012 IEEE International Professional Communication Conference, Piscataway, NJ (2012), https://doi.org/10.1109/IPCC.2012.6408624; Emma Stuart, David Stuart, and Mike Thelwall, “An Investigation of the Online Presence of UK Universities on Instagram,” Online Information Review 41, no. 5 (2017): 584, https://doi.org/10.1108/OIR-02-2016-0057.

20 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence;” McNely, “Shaping Organizational Image-Power,” 3.

21 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence.”

22 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

23 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 585.

24 “University of Idaho’s peer institutions,” University of Idaho, accessed October 8, 2019.

25 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

26 McNely, “Shaping Organizational Image-Power,” 4; Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

27 Johnny Saldaña, The Coding Manual for Qualitative Researchers (Los Angeles: Sage Publications, 2013), 27.

28 “Fleiss’ Kappa,” Wikipedia, https://en.wikipedia.org/wiki/Fleiss%27_kappa.

29 Chatten and Roughley, “Developing Social Media.”

30 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 590.

31 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 590.
32 Hitz, “Instagram Impressions, Reach, and Other Metrics.”

33 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 590.

34 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

35 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence.”
Analytics and Privacy: Using Matomo in EBSCO’s Discovery Service

Denise FitzGerald Quintel and Robert Wilson

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12219

Denise FitzGerald Quintel (denise.quintel@mtsu.edu) is Discovery Services Librarian and Assistant Professor, Middle Tennessee State University. Robert Wilson (robert.wilson@mtsu.edu) is Systems Librarian and Assistant Professor, Middle Tennessee State University. © 2020.

ABSTRACT

When selecting a web analytics tool, academic libraries have traditionally turned to Google Analytics for data collection to gain insights into the usage of their web properties. As the valuable field of data analytics continues to grow, concerns about user privacy rise as well, especially when discussing a technology giant like Google. In this article, the authors explore the feasibility of using Matomo, a free and open-source software application, for web analytics in their library’s discovery layer. Matomo is a web analytics platform designed around user-privacy assurances. This article details the installation process, makes comparisons between Matomo and Google Analytics, and describes how an open-source analytics platform works within a library-specific application, EBSCO’s Discovery Service.

INTRODUCTION

In their 2016 article from The Serials Librarian, Adam Chandler and Melissa Wallace summarized concerns with Google Analytics (GA) by reinforcing how “reader privacy is one of the core tenets of librarianship.”1 For that reason alone, Chandler and Wallace worked to implement and test Piwik (now known as Matomo) on the library sites at Cornell University.
Taking a cue from Chandler and Wallace, the authors of this paper sought out an analytics solution that was robust and private, could easily work within their discovery interface, and could provide the same data as their current analytics and discovery service implementation. This paper will expand on some of the concerns from the 2016 Chandler and Wallace article, make comparisons, and provide installation details for other libraries. Libraries typically use GA to support data-informed decisions or build discussions on how users interact with library websites. The goals of this pilot project were to determine the similarities between Google Analytics and Matomo, to assess how viable Matomo might be as a Google Analytics replacement, and to bring awareness to privacy concerns in the library. Matomo could easily be installed on multiple websites. However, this project looked into a specific instance of monitoring: the library’s discovery layer, EBSCO Discovery Service (EDS).

LITERATURE REVIEW

Google Analytics

The 2005 release of Google Analytics was a massive boon to libraries who had long searched for an easy-to-implement and budget-friendly tool for analytics. Shortly after its release, academic libraries were quick to adopt the platform and install its JavaScript code into their library web pages.2 In a little over a decade, there have been nearly forty scholarly articles published that discuss the ways in which Google Analytics is used for libraries’ websites. These articles not only
introduced the service but also discussed the various ways libraries utilize the platform.3 In fact, in their survey of 279 libraries, O’Brien et al.’s 2018 research found that 88 percent of libraries surveyed had implemented Google Analytics or Google Tag Manager.4 In contrast, during that same period, the authors found Matomo, or its earlier name, Piwik, discussed in a total of five scholarly articles, with only three libraries that wrote about using it as a web analytics tool.5 In addition to measuring website use, libraries found that Google Analytics allowed for several different assessments. In using Google Analytics, libraries could provide immediate feedback for projects, indicate website design change possibilities, create key performance indicators, and determine research paths and user behaviors.6 Convenience of implementation and use, minimal cost, and a user-friendly interface were all reasons cited for the widespread and fast adoption.7 Although the early literature covers a lot of ground about the reporting possibilities and the coverage of Google Analytics, there is rarely a mention of user privacy. Early articles that mention privacy provide a cursory discussion, reiterating that the data collected by Google is anonymous and therefore protects the privacy of the user. Recently, there has been a shift in the literature, with articles that now provide more in-depth discussions about user privacy and the concerns libraries have with third parties that collect and host user data. O’Brien et al. discussed the problematic ways that libraries adopted and implemented GA, by either overlooking front-facing policies or implementing it without the consent of their users.8 In their webometrics study, O’Brien et
al. found that very few libraries (1 percent) had implemented HTTPS with the GA tracking code, only 14 percent had used IP anonymization, and not a single site utilized both features.9 The concern is not solely Google’s control of the data, but Google’s involvement with third-party trackers. Third parties, as Pekala remarks, are rarely held accountable.10 It is important to remember that Google is an advertising company: its 2019 advertising revenue of $134 billion represented 84 percent of its total revenue.11 Google's search engine monetization transformed it into one of the world's most recognizable brands. As the most visited site in the world, Google is firmly committed to security, especially when it comes to data theft. Google offers protection from unwanted access into user accounts, even providing ways for high-risk users, such as journalists or political campaigns, to purchase additional security keys for advanced protection.12 But while Google keeps data breaches and hackers at bay, the user data that Google collects and stores for advertising revenue tells a different story. Google stores user data for months on end; only after nine months is advertisement data anonymized by removing parts of IP addresses. Then, after 18 months, Google will finally delete stored cookie information.13 Recent surveys are reporting an increase in users who want to know how companies are collecting information to provide data-driven services. In a 2019 Pew Research Survey, 62 percent of respondents believe it is impossible to go through their daily lives untracked by companies.
Additionally, even with the ease that certain data-driven services bring, “81 [percent] of the public reported that the potential risks they face because of data collection by companies outweigh the benefits.”14 Cisco, in a 2019 personal data survey, found a segment of the population (32 percent) that not only cares about data privacy and wants control of their data, but has also taken steps to switch providers, or companies, based on their data policies.15 Additionally, in Pew Research Survey results published as recently as April 2020, Andrew Perrin reports that an even larger number of U.S. adults (52 percent) are now choosing not to use products or services out of concerns for their privacy and the personal information companies collect.16 With a growing population of users who make inquiries about who, or what, is in control of their data, a web analytics tool that can easily answer those questions might serve libraries, and their users, well.

COMPARISONS

Google Analytics had been the library’s only web analytics tool until the start of the pilot project. During the pilot period, the authors simultaneously ran both analytics tools. Once Matomo was installed, the authors found several similarities between the two products and discovered that nearly identical analyses could occur, given the quality and quantity of the data collected. The pilot study focused only on one analytics project: the library’s discovery layer, EBSCO’s Discovery Service. The authors worked with their dedicated EBSCO engineer to replicate the Google Analytics EDS widget and have it configured to send output to Matomo instead. In making comparisons, one of the common statements about GA and Matomo is that the numbers will never be exact matches, often with much higher counts presented in GA than in Matomo.
Several forums and blogs, even Matomo themselves, admit that there are several possible reasons why there is a noticeable difference between the two.17 Those involved in the discussion theorize that this is due to GA spam hits, bot hits, and Matomo’s ability for users to limit tracking. Beyond the counts, both products measure the same kinds of metrics for websites.18 For this project, the authors only wanted to look at specific metrics within EDS, those measurements that look more closely at the user rather than the larger aggregate data. For the sake of the analysis, it is important to note that although both products have several great features, this is a specific situation in which the researchers use certain features for analytics. The analytics we collect for EDS strive to answer specific questions:

• Are users searching for known items or performing exploratory searches? How often?
• Are users utilizing the facets and limiters? How often?

Although you can use both products to count page views or set events for your website, when looking at meaningful metrics for our discovery system, we focus more on the user level. In Google Analytics, the best way to capture these is through the User Explorer tool, which breaks up a user journey into search terms, events, and actions that occur during sessions. In the same way, Matomo provides anonymized user profiles that include search terms, events, and actions in its Visits Log report. In GA, you can export this User Explorer data in JSON format, but only one user at a time, as seen in figure 1. This restriction also means you cannot see data from multiple users, with those details, on a single page. To contrast, in Matomo’s Visits Log, you can export the same data (search terms, events, actions) from multiple users in CSV, XML, PHP, TSV, JSON, or HTML formats.
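The same per-visit details behind the Visits Log can also be pulled programmatically through Matomo's HTTP Reporting API (its `Live.getLastVisitsDetails` method). The sketch below only builds the request URL; the host name, site ID, and authentication token are placeholder values, not those of the authors' installation.

```javascript
// Sketch: building a Matomo Reporting API request for Visits Log data
// (Live.getLastVisitsDetails). Host, site ID, and token_auth are placeholders;
// the format parameter mirrors the dashboard export options (JSON, XML, CSV, TSV, ...).
function visitsLogUrl(host, idSite, tokenAuth, format) {
  var params = new URLSearchParams({
    module: 'API',
    method: 'Live.getLastVisitsDetails',
    idSite: String(idSite),
    period: 'day',
    date: 'today',
    format: format,
    token_auth: tokenAuth,
  });
  return host + '/index.php?' + params.toString();
}

// Example request URL for a hypothetical instance:
var exportUrl = visitsLogUrl('https://matomo.example.edu', 1, 'REPLACE_WITH_TOKEN', 'JSON');
```

Fetching a URL like this (with a valid token) returns the search terms, events, and actions shown in the Visits Log report for many visits at once, which is useful when the dashboard export needs to be automated.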
As seen in figure 2, Matomo offers a snapshot of this data in an easy-to-read single page, versus Google’s one-user-at-a-time option, which requires clicking through to see a user report.

Figure 1. Screenshot of the Google Analytics User Explorer Tool

Figure 2. Screenshot of the Matomo Visits Log Report

In summary, libraries using either of these analytics tools can measure usage and users with page views, visits, and unique visitors. Looking at how users navigate a site is possible with the available user paths, from the initial search, to events as seen in figures 3 and 4, and an exit page URL. Goals can be set and maintained with conversion metrics tied to referrers, visits, user location, devices, or user attributes. Like Google Analytics, Matomo can run reports on engagement and performance and share customizable, user-friendly graphs or other visual representations.

Figure 3. Peer Reviewed Limiter as Event Action in Google Analytics

Figure 4. Peer Reviewed Limiter Use as Event Name in Matomo

Comparisons on Privacy

Both Google Analytics and Matomo offer ways to protect the privacy of your users. Both offer IP anonymization, the option for data deletion after a certain time, and a Do Not Track feature for users. It is important to note the way Google offers these adjustments to the user.
For Matomo, Do Not Track is a default behavior, meaning that the tracker automatically honors a browser’s settings; this is not always the case elsewhere, as respecting the Do Not Track browser setting is voluntary for websites, not mandatory.19 Google Analytics offers the same service, as long as it is implemented by the user through a browser extension.20 IP anonymization and data deletion are features that Matomo users can adjust easily from the dashboard, whereas Google Analytics users will need to make those adjustments programmatically.21 In Matomo, you can choose to automatically delete your old visitor logs from the database, although Matomo recommends keeping detailed logs for three to six months and then deleting the older log data.22 Quite the contrast is Google Analytics, where a user makes a data deletion request to Google, which then creates a report for review before submitting the request to Google. Even after submitting a request, Google still allows seven days to reverse that decision. In terms of data retention, Google Analytics gives you the option to retain user data anywhere from 14 months to 50 months, with the option to never expire. Fourteen months is the shortest amount of time you can retain user data; nothing less.23 IP anonymization is the default for Matomo analytics but is an opt-in feature for Google Analytics. Again, like data retention, any adjustments to IP anonymization in Matomo can occur in the dashboard, with options to have two or three bytes removed from the address. Google Analytics will adjust the last octet to zero.24 Both products are similar in several ways, but the standout feature of Matomo is that the data belongs only to your institution.
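To make the opt-in versus default distinction concrete, the snippet below is a minimal sketch of how a site requests IP anonymization from Google Analytics' gtag.js tracker. `GA_MEASUREMENT_ID` is a placeholder property ID, and the dataLayer bootstrap is reduced to the part that runs outside a browser; Matomo needs no equivalent client-side call, since masking is on by default and adjusted from the dashboard.

```javascript
// Minimal sketch of the gtag.js command queue: IP anonymization must be
// requested explicitly per property. 'GA_MEASUREMENT_ID' is a placeholder.
var dataLayer = [];
function gtag() { dataLayer.push(arguments); }

gtag('js', new Date());
gtag('config', 'GA_MEASUREMENT_ID', { anonymize_ip: true });
```

The asymmetry is the point of the comparison: the Google Analytics setting lives in the page's tracking code, while the Matomo setting lives in the server's own administration interface.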
In his interview with Katherine Schwab for Fast Company, Mathieu Aubry, Matomo’s founder, states it clearly:

When [Google] released Google Analytics, [it] was obvious to me that a certain percent of the world would want the same technology, but decentralized, where it’s not provided by a centralized corporation and you’re not dependent on them… If you use it on your own server, it’s impossible for us to get any data from it.25

IMPLEMENTATION AND INSTALLATION

Originally released as Piwik in 2007, Matomo was designed as a replacement for phpMyVisites.26 It is an open-source software application licensed under GNU GPL v3.27 It is designed as a PHP/MySQL application, allowing the server operating system (OS) and web service to best match a user’s needs or institutional preferences and expertise.28 To match the organization’s preferences and expertise, this Matomo instance was set up as a Linux-Apache-MySQL/PHP (LAMP) stack server (CentOS 7 in our case) with Apache 2.4.6 and MySQL-MariaDB 5.5.60. The required configurations needed to run Matomo are well documented on the Matomo documentation site as well as the download and documentation area. Depending on the version of Matomo, the mileage a user gets with the documentation may vary. For example, on the recent upgrade to 3.11.0, the instance displayed a warning notification that PHP v7.0 had reached end of life and recommended updating to PHP v7.1 or greater to accommodate future Matomo versions. However, at the time of this writing, the minimum PHP version stated in Matomo’s documentation is 5.5.9 or greater.29 Like many PHP applications, once the prerequisite applications are installed (PHP, MySQL, and the selected web service, Apache in this case), the Matomo install is completed by browsing to the server’s URL or IP address on port 80. Browsing to the index.php path in a web browser will guide a user through the install process.
The installer will also review file directories on the server and inform a user of any permissions problems that will need to be addressed for a correct install and use. Compared to other PHP application install experiences, installing Matomo was straightforward and easier to follow than many. Within a few minutes, the admin user was created and the first website was added. The web-based administration area is also more robust and easier to use than many comparable applications. Many features that might typically require configuration file changes directly on the server, including Matomo upgrades, can be configured through the administration area. While the administration page has many options relating to paid-for premium features, there are several particularly helpful free configuration cards in the interface. Most notable is the “System Summary” card, which displays the current version of Matomo, PHP, and MySQL as well as total users, segments, goals, tracking failures, total websites configured, and a few other metrics. There is a “Tracking Failures” card that notifies of issues with websites, and a “Need Help?” card that links to the Matomo Community forums. Finally, the “System Check” card displays any warnings or errors as well as a link to the full system check report. This is extremely helpful when Matomo has been installed but the instance still needs additional configuration changes or follow-up tasks on upgrades. If there are warnings or errors, the full system report will often have recommendations of changes to make, either in the administration page or on the server in the configuration files. These administration features make maintenance a straightforward process. Since setting up the server, two upgrades have been completed. In both cases, an email notification was received indicating a new stable release was available.
On login to Matomo, this information also appeared as a banner. Simply clicking on the download update option automatically updated the service without any need to access the server directly or via SSH. The updates ran smoothly, with one exception: during one upgrade, several files were created or overwritten with the root user as the owner. As a result, Matomo indicated an issue with the files and/or path not being found. In actuality, the files did exist, but Matomo no longer had permission to read them. Resolving the problem required browsing to the directory path indicated in a warning on the server and changing ownership from the root user to the apache user to match the other files. Despite this issue, the update process is much more user-friendly than in similarly structured applications. Standalone implementation and installation of Matomo is made simple by the installation documentation readily available on the Matomo.org website, especially if one is familiar with PHP/MySQL applications. Adding one or two websites whose architectures a new Matomo user already knows well is a good way to pilot Matomo and get introduced to its overall functions without being so overwhelmed that the more granular functions are never learned. A system admin may find maintenance and updates to this service less problematic, and less disruptive to the service, than with similarly structured applications, while users may find the overall functionality of Matomo easier to use and the finer points of reporting and analytics more transparent and easier to understand than Google Analytics. Once installed, the authors first tested Matomo on a low-traffic library site. After tracking proved successful, EDS was entered as a new website in the Matomo dashboard and the JavaScript tracking tag was placed in the bottom branding of EDS. The process of adding EDS as a new site to Matomo was as easy as expected, and the data collection was almost immediate.
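Besides the JavaScript tag placed in the EDS branding, Matomo can also receive hits through its HTTP Tracking API (the matomo.php endpoint), which is useful where embedding JavaScript is impractical. The sketch below only constructs such a request URL with Python's standard library; the hostname, site ID, and page URL are hypothetical, and the parameter names are the core ones from Matomo's tracking documentation, so verify against the current docs before relying on them.

```python
# Hedged sketch of a server-side hit to Matomo's HTTP Tracking API.
# The hostname, idsite, and URLs below are hypothetical examples.
from urllib.parse import urlencode

def build_tracking_url(matomo_base: str, site_id: int,
                       page_url: str, action_name: str) -> str:
    params = {
        "idsite": site_id,   # website ID configured in the Matomo dashboard
        "rec": 1,            # required for the hit to be recorded
        "url": page_url,     # URL of the page view being tracked
        "action_name": action_name,
    }
    return f"{matomo_base}/matomo.php?{urlencode(params)}"

url = build_tracking_url("https://analytics.example.edu", 2,
                         "https://library.example.edu/eds/search",
                         "EDS search")
print(url)
```

No network call is made here; an actual deployment would send this URL as an HTTP GET to the Matomo server.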
To mirror the EDS and Google Analytics integration, the authors worked with their EBSCO Library Service Engineer to create a Matomo widget. Luckily, another engineer had previously worked on an integration when the software was still known as Piwik. Instead of building from the ground up, the engineer only needed to clean and update the Piwik widget’s code to match the Google Analytics widget, which would allow for the tracking of events and site searches. Adding a user outside of the organization to Matomo was necessary for the EBSCO engineer to fine-tune the widget. Matomo admins can set up users with specific permissions within the system, with access to only a specific site. Each Matomo user has their own email address and password (not domain-specific) and settings, and can even customize their dashboard. After testing proved successful, the new Matomo widget moved into the live profile of EDS, and data collection commenced. SECURITY Though the service is in a pilot stage with limited data collection, the authors wanted to ensure an SSL certificate was in place for login to Matomo. With EFF’s Certbot (https://certbot.eff.org/), the authors installed a Let’s Encrypt (https://letsencrypt.org/) SSL certificate. The SSL certificate is automatically renewed every three months via a cronjob on our server. Because of the power of the administration interface, caution should be used when assigning the “Super User” role to user accounts. It would also be wise to require two-factor authentication (2FA) on the service. Turning on 2FA is a very simple process, and Matomo works with multiple third-party authentication utilities including Authy, LastPass, and 1Password. While each user can choose to activate 2FA, an admin can require it for all users if desired.
CONCLUSION As the amount of research and rate of adoption testify, since 2005 GA has set the benchmark for assessment of library web asset success and has made possible a completely new understanding of the library user experience and overall assessment of library services. Matomo’s earliest iteration appeared shortly after, in 2007, and is a viable alternative to proprietary web analytics applications, with a few notable advantages over GA. From a long-term perspective, the two biggest advantages of Matomo are that it is licensed under a copyleft GPL free and open source software (FOSS) license and that it is designed with user privacy at heart. For libraries, using FOSS applications whenever possible allows them to practice what they preach. FOSS does not mean cost-free. In fact, free in the FOSS sense is more akin to freedom (freedom to download, modify, distribute, and change the code) rather than free of charge. Budgeting for a hosted subscription, support, or the costs of a library running and maintaining the application itself or through an Infrastructure as a Service (IaaS) provider like Amazon Web Services (AWS) or Microsoft’s Azure is necessary, but the freedom Matomo provides by ensuring the library is in control of its patron data, that the data is protected, and that it is not at risk of becoming a product in and of itself may well be worth the cost. Like other initiatives in the open-access movement or open educational resources, and as third-party data collection and privacy on the web become more mainstream concerns, opting to use Matomo to protect patron privacy allows libraries to be leaders on issues relating to privacy and intellectual freedom. As noted earlier, there are other feature-based advantages Matomo provides that impact the day-to-day aspects of monitoring web asset use and assessment, like export options and viewing the full log of visits.
Lastly, by focusing on EDS in this pilot, the authors were able to demonstrate and verify that Matomo rises to the challenge not just with traditional web asset analytics requirements, but also with library-specific applications like proprietary discovery layer services. ENDNOTES 1 Adam Chandler and Melissa Wallace, “Using Piwik Instead of Google Analytics at the Cornell University Library,” Serials Librarian 71, no. 3 (October 2016): 174, https://doi.org/10.1080/0361526X.2016.1245645. 2 Tabatha Farney and Nina McHale, “Introducing Google Analytics for Libraries,” Library Technology Reports 49, no. 4 (May 2013): 5, https://journals.ala.org/ltr/article/download/4269/4881. 3 Paul Betty, “Assessing Homegrown Library Collections: Using Google Analytics to Track Use of Screencasts and Flash-Based Learning Objects,” Journal of Electronic Resources Librarianship 21, no. 1 (2009): 75–92, https://doi.org/10.1080/19411260902858631; Jason D. Cooper and Alan May, “Library 2.0 at a Small Campus Library,” Technical Services Quarterly 26, no. 2 (2009): 89–95, https://doi.org/10.1080/07317130802260735; Stephan Spitzer, “Better Control of User Web Access of Electronic Resources,” Journal of Electronic Resources in Medical Libraries 6, no. 2 (2009): 91–100, https://doi.org/10.1080/15424060902931997; Julie Arendt and Cassie Wagner, “Beyond Description: Converting Web Site Usage Statistics into Concrete Site Improvement Ideas,” Journal of Web Librarianship 4, no. 1 (2010): 37–54, https://doi.org/10.1080/19322900903547414; Steven J. Turner, “Website Statistics 2.0: Using Google Analytics to Measure Library Website Effectiveness,” Technical Services Quarterly 27, no.
3 (2010): 261–78, https://doi.org/10.1080/07317131003765910; Gail Herrera, “Measuring Link-Resolver Success: Comparing 360 Link with a Local Implementation of WebBridge,” Journal of Electronic Resources Librarianship 23, no. 4 (2011): 379–88, https://doi.org/10.1080/1941126X.2011.627809; Wayne Loftus, “Demonstrating Success: Web Analytics and Continuous Improvement,” Journal of Web Librarianship 6, no. 1 (2012): 45–50, https://doi.org/10.1080/19322909.2012.651416; Tabatha A. Farney, “Click Analytics: Visualizing Website Use Data,” Information Technology & Libraries 30, no. 3 (2011): 141–48, https://doi.org/10.6017/ital.v30i3.1771. 4 Patrick O’Brien et al., “Protecting Privacy on the Web: A Study of HTTPS and Google Analytics Implementation in Academic Library Websites,” Online Information Review 42, no. 6 (2018): 734–51, https://doi.org/10.1108/OIR-02-2018-0056. 5 Junior Tidal, “Using Web Analytics for Mobile Interface Development,” Journal of Web Librarianship 7, no. 4 (2013): 451–64, http://doi.org/10.1080/19322909.2013.835218; Ramiro Federico Uviña, “Bibliotecas Y Analítica Web: Una Cuestión De Privacidad = Libraries and Web Analytics: A Privacy Matter,” Información, Cultura Y Sociedad no. 33 (2015): 105–12, http://revistascientificas.filo.uba.ar/index.php/ICS/article/view/1906; Sukumar Mandal, “Site Metrics Study of Koha OPAC through Open Web Analytics and Piwik Tools,” Library Philosophy and Practice (2019), https://digitalcommons.unl.edu/libphilprac/2835; Mohammad Azim and Nabi Hasan, “Web Analytics Tools Usage among Indian Library Professionals,” 2018 5th International Symposium on Emerging Trends and Technologies in Libraries and Information Services (2018): 31–35, https://doi.org/10.1109/ETTLIS.2018.8485212. 6 Ian Barba et al., “Web Analytics Reveal User Behavior: TTU Libraries’ Experience with Google Analytics,” Journal of Web Librarianship 7, no. 4 (2013): 389–400, https://doi.org/10.1080/19322909.2013.828991.
7 Betty, “Assessing Homegrown Library Collections.” 8 O’Brien et al., “Protecting Privacy on the Web,” 734. 9 O’Brien et al., “Protecting Privacy on the Web,” 741. 10 Shayna Pekala, “Privacy and User Experience in 21st Century Library Discovery,” Information Technology & Libraries 36, no. 2 (2017): 50, https://doi.org/10.6017/ital.v36i2.9817. 11 J. Clement, “Advertising Revenue of Google from 2001 to 2019,” Statista, February 5, 2020, https://www.statista.com/statistics/266249/advertising-revenue-of-google; Lily Hay Newman, “The Privacy Battle to Save Google From Itself,” Wired, November 1, 2018, https://www.wired.com/story/google-privacy-data/; Ben Popken, “Google Sells the Future, Powered by Your Personal Data,” NBC News, May 10, 2018, https://www.nbcnews.com/tech/tech-news/google-sells-future-powered-your-personal-data-n870501; Richard Graham, “Google and Advertising: Digital Capitalism in the Context of Post-Fordism, the Reification of Language, and the Rise of Fake News,” Palgrave Communications 3, no. 45 (2017): 2–4, https://doi.org/10.1057/s41599-017-0021-4.
12 “Google Advanced Protection Program,” Google, https://landing.google.com/advancedprotection/. 13 “Google Privacy and Terms, Advertising,” Google, https://policies.google.com/technologies/ads?hl=en-US. 14 Brooke Auxier et al., “Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information,” November 15, 2019, Pew Research, https://www.pewresearch.org/internet/wp-content/uploads/sites/9/2019/11/Pew-Research-Center_PI_2019.11.15_Privacy_FINAL.pdf. 15 “Consumer Privacy Survey,” November 2019, CISCO, https://www.cisco.com/c/dam/en/us/products/collateral/security/cybersecurity-series-2019-cps.pdf. 16 Andrew Perrin, “Half of Americans Have Decided Not to Use a Product or Service Because of Privacy Concerns,” Pew Research, April 14, 2020, https://www.pewresearch.org/fact-tank/2020/04/14/half-of-americans-have-decided-not-to-use-a-product-or-service-because-of-privacy-concerns/. 17 “Matomo vs. Google Analytics 360,” Matomo.org, https://matomo.org/matomo-vs-google-analytics comparison; Lemon, “A Comparison of Data: Piwik vs. Google Analytics,” The FPlus (blog), November 30, 2016, https://thefpl.us/wrote/about-piwik; Himanshu Sharman, “Best Google Analytics Alternatives in 2020—Matomo & Piwik Pro,” OptimizeSmart (blog), March 30, 2020, https://www.optimizesmart.com/introduction-to-piwik-best-google-analytics-alternative. 18 “Matomo vs. Google Analytics 360,” Matomo.org.
19 Ryan Singel, “Google Holds Out Against ‘Do Not Track’ Flag,” Wired, April 15, 2011, https://www.wired.com/2011/04/chrome-do-not-track; Kieren McCarthy, “Do Not Track Is Back in the US Senate,” The Register, May
20, 2019, https://www.theregister.co.uk/2019/05/20/do_not_track; “How Do I Turn on the Do Not Track Features?,” Mozilla, https://support.mozilla.org/en-US/kb/how-do-i-turn-do-not-track-feature. 20 “Google Analytics Opt-Out Browser Add-On,” Google, https://support.google.com/analytics/answer/181881. 21 “IP Anonymization,” Google, https://developers.google.com/analytics/devguides/collection/analyticsjs/ip-anonymization. 22 “Managing Your Database’s Size,” Matomo.org, https://matomo.org/docs/managing-your-databases-size/ - deleting-old-unprocessed-data. 23 “Data Retention,” Google, https://support.google.com/analytics/answer/7667196?hl=en&ref_topic=2919631. 24 “IP Anonymization,” Google. 25 Katherine Schwab, “It’s Time to Ditch Google Analytics,” Fast Company, February 1, 2019, https://www.fastcompany.com/90300072/its-time-to-ditch-google-analytics. 26 “Matomo and phpMyVisites,” Matomo.org, https://matomo.org/faq/general/faq_437. 27 “Licenses,” Matomo.org, https://matomo.org/licences. 28 “Matomo (software),” Wikipedia, https://en.wikipedia.org/wiki/Matomo_(software). 29 “Matomo Requirements,” Matomo.org, https://matomo.org/docs/requirements.
ARTICLES Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results Sam Grabus INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12235 Sam Grabus (smg383@Drexel.edu) is an Information Science PhD Candidate at Drexel University’s College of Computing and Informatics, and Research Assistant at Drexel’s Metadata Research Center. This article is the 2020 winner of the LITA/Ex Libris Student Writing Award. © 2020. ABSTRACT This research compares automatic subject metadata generation when the pre-1800s Long-S character is corrected to a standard < s >. The test environment includes entries from the third edition of the Encyclopedia Britannica, and the HIVE automatic subject indexing tool. A comparative study of metadata generated before and after correction of the Long-S demonstrated an average of 26.51 percent potentially relevant terms per entry omitted from results if the Long-S is not corrected. Results confirm that correcting the Long-S increases the availability of terms that can be used for creating quality metadata records. A relationship is also demonstrated between shorter entries and an increase in omitted terms when the Long-S is not corrected. INTRODUCTION The creation of subject metadata for individual documents is long known to support standardized resource discovery and analysis by identifying and connecting resources with similar aboutness.1 In order to address the challenges of scale, automatic or semi-automatic indexing is frequently employed for the generation of subject metadata, particularly for academic articles, where the abstract and title can be used as surrogates in place of indexing the full text.
When automatically generating subject metadata for historical humanities full texts that do not have an abstract, anachronistic typographical challenges may arise. One key challenge is that presented by the historical “Long-S” < ſ >. In order to account for these idiosyncrasies, there is a need to understand the impact that they have upon the automatic subject indexing output. Addressing this challenge will help librarians and information professionals to determine whether or not they will need to correct the Long-S when automatically generating subject metadata for full-text pre-1800s documents. The problem of the Long-S in Optical Character Recognition (OCR) for digital manuscript images has been discussed for decades.2 Many scholars have researched methods for correcting the Long-S through the use of rule-based algorithms or dictionaries.3 While the problem of the Long-S is well-known in the digital humanities community, automatic subject metadata generation for a large corpus of pre-1800s documents is rare, as is research about the application and evaluation of existing automatic subject metadata generation tools on 18th-century documents in real-world information environments. The impact of the Long-S upon automatic subject metadata generation results for pre-1800s texts has not been extensively explored. The research presented in this paper addresses this need. The paper reports results from basic statistical analysis and visualization using the Helping Interdisciplinary Vocabulary Engineering (HIVE) tool automatic subject indexing results, before and after the correction of the historical Long-S in the 3rd edition of the Encyclopedia Britannica. Background work was conducted over the Summer and Fall of 2019, and the research presented was conducted during Winter 2020.
The work was motivated by current work on the “Developing the Data Set of Nineteenth-Century Knowledge” project, a National Endowment for the Humanities collaborative project between Temple University’s Digital Scholarship Center and Drexel University’s Metadata Research Center. The grant is part of a larger project, Temple University’s “19th-Century Knowledge Project,” which is digitizing four historical editions of the Encyclopedia Britannica.4 The next section of this paper presents background covering the historical Encyclopedia Britannica data, the automatic subject metadata generation tool used for this project, a brief background of “the Long-S Problem,” and the distribution of encyclopedia entry lengths in the 3rd edition. The background section will be followed by research objectives and method supporting the analysis. Next, the results are presented, demonstrating prevalence of terms omitted from the automatic subject metadata generation results if the Long-S is not corrected to a standard small < s > character, as well as the impact of encyclopedia entry length upon these results. The results are followed by a contextual discussion, and a conclusion that highlights key findings and identifies future research. BACKGROUND Indexing for the 19th-Century Knowledge Project The 19th-Century Knowledge Project, an NEH-funded initiative at Temple University, is fully digitizing four historical editions of the Encyclopedia Britannica (the 3rd, 7th, 9th, and 11th). The long-term goal of the project is to analyze the evolving conceptualization of knowledge across the 19th century.5 The 3rd edition of the Encyclopedia Britannica (1797) is the earliest edition being digitized for this project. The 3rd edition consists of 18 volumes, with a total of 14,579 pages, and individual entries ranging from four to over 150,000 words. For each individual entry, researchers at Temple have created individual TEI-XML files from the OCR output. 
In order to enrich accessibility and analysis across this digital collection, The Knowledge Project will be adding controlled vocabulary subject headings into the TEI headers of each encyclopedia entry XML file. Considering the size of this corpus, both in terms of entry length and number of entries, automatic subject metadata generation will be required for the creation of this metadata. The Knowledge Project will employ controlled vocabularies to replace or complement naturally extracted keywords for this process. Using controlled vocabularies adheres to metadata semantic interoperability best practices, ensures representation consistency, and helps to bypass linguistic idiosyncrasies of these 18th and 19th century primary source materials.6 We selected two versions of the Library of Congress Subject Headings (LCSH) as the controlled vocabularies for this project. LCSH was selected due to its relational thesaurus structure, multidisciplinary nature, and continued prevalence in digital collections due to its expressiveness and status as the largest general indexing vocabulary.7 In addition to the headings from the 2018 edition of LCSH, headings from the 1910 LCSH are also implemented in order to provide a more multi-faceted representation, using temporally relevant terms that may have been removed from the contemporary LCSH. The tool applied for this process is HIVE, a vocabulary server and automatic indexing application.8 HIVE allows the user to upload a digital text or URL, select one or more controlled vocabularies, and perform automatic subject indexing through the mapping of naturally extracted keywords to the available controlled vocabulary terms. HIVE was initially launched as an IMLS linked open vocabulary and indexing demonstration project in 2009.
Since that time, HIVE has been further developed, with the addition of more controlled vocabularies, user interface options, and the RAKE keyword extraction algorithm. The RAKE keyword extraction algorithm was selected for this project after a comparison of topic relevance precision scores for three keyword extraction algorithms.9 The Long-S Problem Early in our metadata generation efforts, we discovered that the 3rd edition of the Encyclopedia Britannica employs the historical Long-S. Originating in early Roman cursive script, the Long-S was used in typesetting up through the 18th century, both with and without a left crossbar. By the end of the 18th century, the Long-S fell out of use with printers.10 As outlined by lexicographers of the 17th and 18th centuries, the rules for using the Long-S were frequently vague, complicated, inconsistent over time, and varied according to language (English, French, Spanish, or Italian).11 These rules specified where in a word the Long-S should be used instead of a short < s >, whether it is capitalized, where it may be used in proximity to apostrophes, hyphens, and the letters < f >, < b >, < h >, and < k >; and whether it is used as part of a compound word or abbreviation.12 This is further complicated by the inclusion of the half-crossbar, which occasionally results in two consequences: (a) the Long-S may be interpreted by OCR as an < f >, and (b) < b > and < f > may be interpreted by OCR as a Long-S. Figure 1 shows an example from the 3rd edition entry on Russia, in which the original text specifies “of” (line 1 in top figure), yet the OCR output has interpreted the character as a Long-S. The Long-S may also occasionally be interpreted by the OCR as a lowercase < l >, such as the “univerlity of Dublin” in the 3rd edition entry on Robinson (The most Rev Sir Richard).
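Where the OCR output preserves the Unicode Long-S character (U+017F) rather than misreading it, a correction rule amounts to a one-character substitution. The Python sketch below is a minimal illustration of that unambiguous case, not the Knowledge Project's actual script; the misreadings described above (ſ captured as < f > or < l >) would still require dictionary- or context-based rules.

```python
# Minimal first pass at Long-S normalization: map the Unicode Long-S
# (U+017F, ſ) to a standard 's'. This does NOT repair cases where the
# OCR already misread ſ as 'f' or 'l' (e.g., "Ruffians" for "Russians");
# those require dictionary- or context-based rules.
LONG_S = "\u017f"  # ſ

def normalize_long_s(text: str) -> str:
    return text.replace(LONG_S, "s")

print(normalize_long_s("ſugar diſſolved in yeaſt"))  # sugar dissolved in yeast
print(normalize_long_s("Ruffians"))  # unchanged: already misread by the OCR
```

In practice a corpus-specific script would layer wordlist lookups on top of this substitution to catch the f/s and l/s confusions.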
These complications and inconsistencies are challenges when developing Python rules for correcting the Long-S in an automated way, and even preexisting scripts will need to be adapted for individual use with a particular corpus. Figure 1. Example from the 3rd edition entry on Russia, comparing the original use of a letter < f > in “of” to the OCR output of the same passage, which mistakenly interprets the character as a Long-S. Despite the transition away from the Long-S towards the end of the 18th century, the 3rd edition of the Encyclopedia Britannica (published in 1797) implements the Long-S throughout, with approximately 100,594 instances of the Long-S in the OCR output. When performing metadata generation with the HIVE tool on the OCR output for an entry, the Long-S is most often interpreted by the automatic metadata generation tool as an < f >, which can result in (a) inaccurate keyword extraction (e.g., Russians → Ruffians), and (b) essential topics becoming unidentifiable when mapping extracted keywords to controlled vocabulary terms, in which case HIVE will omit them from the results because they cannot be mapped to controlled vocabulary terms. Figure 2 provides a truncated view of Long-S words in the 3rd edition entry on Rum, which are subsequently removed from the pool of automatically extracted keywords when performing the automatic subject indexing sequence in HIVE. Using keyword extraction algorithms that are largely dependent upon term frequencies, automatic subject indexing for an entry on Rum may be substantially hindered when meaningful and frequently occurring words such as sugar and yeast are removed. Figure 2. Examples of the Long-S in the 3rd edition Encyclopedia Britannica entry on Rum.
Using this example entry, the automatic subject indexing results were compared using Python to determine which terms only appear when the Long-S has been corrected to the standard < s >. The comparison showed that 16 total terms no longer appeared in the results when the Long-S was not corrected to a standard < s >: ten terms using the 2018 LCSH, and six terms using the 1910 LCSH. These omitted results included the terms sugar and yeast. The next section will discuss the encyclopedia entry word count for this corpus, and the possible impact that this may have upon automatic subject indexing between corrected and uncorrected Long-S instances. Encyclopedia Entry Lengths Consistent with other Encyclopedia Britannica editions in the 18th and 19th centuries, the encyclopedia entries in the 3rd edition vary substantially in length. A convenience sample of 3,849 3rd edition entries ranging in length from 2 to 202,848 words demonstrated an arithmetic mean of 826.60 and a median word count of 71. As shown in figure 3, this indicates a significant skew towards shorter entry lengths. For the vast majority of encyclopedia entries in this corpus, a low total word count may affect the degree of Long-S impact on automatic subject indexing results, given the importance of term availability and frequency for keyword extraction algorithms. Figure 3. Scatterplot of word count for a convenience sample of 3,849 3rd edition Encyclopedia Britannica entries. Large-scale metadata generation requires time, labor, and resources, and it becomes more costly when accounting for the complications of correcting the Long-S for a particular corpus.
Library and information professionals working with digital humanities resources will need to understand the impact of correcting or not correcting the Long-S in the corpus before allocating resources and developing a protocol for generating automatic or semi-automatic metadata for full-text resources. This includes understanding whether or not the length of each individual document will affect the degree of Long-S impact upon the results. This challenge, and the issues reviewed above, are addressed in the research presented below.

OBJECTIVES

The overriding goal of this work is to determine the prevalence of omitted terms in automatic subject indexing results when the Long-S is not corrected in the 3rd edition entries of the Encyclopedia Britannica. Research questions:

1. What is the average number of terms that are omitted from automatic subject indexing results when the Long-S is not corrected to a standard < s >?
2. How does the encyclopedia entry length affect the number of terms that are omitted when the Long-S is not corrected to a standard < s >?

This analysis approaches these goals by performing a comparative analysis of automatic subject indexing results to determine the number of terms that are omitted from the results when the Long-S is not corrected to a standard letter < s >. Basic descriptive statistics are generated to determine central tendency. The quantity of terms omitted is then compared with encyclopedia entry word counts. These objectives were shaped by collaboration between Drexel University's Metadata Research Center and Temple University's Digital Scholarship Center. The next section of this paper reports on the methods and steps taken to address these objectives.
METHODS

We approached this research by performing a comparative analysis of subject metadata generated both before and after the correction of the historical Long-S in the 3rd edition of the Encyclopedia Britannica. The HIVE tool was used to automatically generate the subject metadata. Descriptive statistics were applied, and visualizations produced from the results were also examined to identify trends.

Figure 4. The 30 Encyclopedia Britannica 3rd edition entries randomly selected for this study, sorted in ascending order by their word counts.

The protocol for performing this research involved the following steps:

1. Compile a sample for testing:
1.1. A random sample of 30 encyclopedia entries was identified from a convenience sample of entries that comprise the letter S volumes of the 3rd edition. The entries range in length from 6 to 6,114 words. The median word count for entries in this sample is 99 words.
1.2. The sample of entries selected for this study and their respective word counts are visualized in figure 4.
1.3. For each entry, the Long-S terms in the original XML file were extracted to a list.
2. Perform the automatic subject indexing sequence upon the entries to generate lists of terms:
2.1. Using the 2018 and 1910 versions of the LCSH.
2.2. With fixed maximum subject heading results set to 40: 20 maximum terms returned with the 2018 LCSH, and 20 maximum terms returned with the 1910 LCSH.
2.3. Before Long-S correction and after Long-S correction, using the Oxygen XML Editor TEI to TXT transformation.
3. Perform an outer join on Python DataFrames between the terms generated when the Long-S has been corrected and the terms generated when it has not been corrected. The resulting left outer join list displays the terms that are omitted from the automatic indexing results if the Long-S is not corrected to a standard small < s >.
The quantity of terms omitted is recorded for comparison.
4. Analysis: Descriptive statistics were generated to determine central tendency for the number and percentage of words omitted when the Long-S is not corrected. The quantity of terms omitted is also visualized in a continuous scatterplot against the corresponding word counts, to demonstrate that the quantity of terms omitted when the Long-S is not corrected seems to relate to the length of the document being automatically classified.

RESULTS

The results report the prevalence of omitted terms when the Long-S is not corrected to a standard < s >, as well as a visualization of the number of terms omitted as it relates to encyclopedia entry length. For each of the 30 sample entries automatically indexed with HIVE, a fixed maximum of 40 terms was returned: a maximum of 20 terms using the 2018 LCSH, and a maximum of 20 terms using the 1910 LCSH. As seen in table 1, central tendency is measured using the arithmetic mean and median, along with the standard deviation and range. The average number of terms omitted from an entry's results is 6.73, and the average percentage of terms omitted from an entry's results is 26.51 percent, with the 2018 and 1910 editions of LCSH performing at similar rates. The full results are displayed in appendix A.

Table 1. Measures of centrality, standard deviation, range, and percentage for the quantity of terms omitted when the Long-S is not corrected to a standard < s >, rounded to the hundredth. For each entry, a maximum of 40 terms was returned: 20 using the 2018 LCSH and 20 using the 1910 LCSH. The total number of results returned varies according to entry length. These totals are reported in appendix B. (N = 30 entries.)

For each entry in the sample, the results in appendix A display the total words omitted when the Long-S is not corrected, the number of 2018 LCSH terms omitted, the number of 1910 LCSH terms omitted, and the encyclopedia entry word count.
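The outer-join step of the protocol can be sketched with pandas. The column name, variable names, and sample terms below are assumptions for illustration, not the project's actual code; the "left_only" rows of the merge indicator are the terms omitted when the Long-S is not corrected.

```python
# Hypothetical sketch of step 3 of the protocol: an outer join between
# the corrected and uncorrected term lists; sample terms are illustrative.
import pandas as pd

corrected = pd.DataFrame({"term": ["sugar", "yeast", "distillation", "rum"]})
uncorrected = pd.DataFrame({"term": ["distillation", "rum"]})

merged = corrected.merge(uncorrected, on="term", how="outer", indicator=True)

# Rows flagged "left_only" appear only after Long-S correction, i.e.,
# they are omitted from the results when the Long-S is not corrected.
omitted = merged.loc[merged["_merge"] == "left_only", "term"].tolist()
quantity_omitted = len(omitted)
```

The `indicator=True` flag is what exposes, per row, whether a term came from the corrected list, the uncorrected list, or both.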
Figure 5 visualizes the total number of terms omitted for each entry when the Long-S is not corrected, demonstrating an increase in terms omitted for entries with lower word counts. These results are broken down by vocabulary in figure 6, demonstrating that both vocabularies used to generate these results indicate a significant increase in omitted terms for shorter entries.

Measure | Both Vocabularies | 2018 LCSH | 1910 LCSH
Average, Terms Omitted | 6.73 | 3.67 | 3.07
Median, Terms Omitted | 5 | 3 | 2
Standard Deviation | 6.53 | 3.84 | 3.17
Range, Terms Omitted | 0-24 | 0-13 | 0-11
Average Percentage, Omitted Terms | 26.51% | 27.51% | 24.28%
Median Percentage, Omitted Terms | 22.36% | 20.00% | 19.09%

Figure 5. Number of automatic subject indexing terms that are omitted when the Long-S is not corrected to a standard < s >, as compared by encyclopedia entry word count.

Figure 6. Number of automatic subject indexing terms that are omitted when the Long-S is not corrected to a standard < s >, as compared by encyclopedia entry word count, separated by controlled vocabulary version.

DISCUSSION

The analysis above presents measures of centrality for the quantity of terms omitted if the Long-S is not corrected to a standard < s > prior to automatic subject indexing using HIVE, as well as a visualization to represent the relationship between encyclopedia entry word count and number of terms omitted. Although researchers have identified challenges with the Long-S and have focused a great deal on the technologies and methods used to correct it, there is still limited work examining the results of not correcting the Long-S character when performing an automatic subject indexing sequence.
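As a concreteness check, the "Both Vocabularies" measures in table 1 can be recomputed directly from the "Total Words Omitted" column of appendix A with Python's standard library:

```python
# Recomputing the "Both Vocabularies" column of table 1 from the
# appendix A "Total Words Omitted" counts (N = 30 entries).
import statistics

terms_omitted = [24, 24, 19, 14, 13, 11, 9, 9, 8, 8, 7, 7, 6, 6, 5,
                 5, 4, 4, 4, 4, 3, 3, 2, 1, 1, 1, 0, 0, 0, 0]

mean = statistics.mean(terms_omitted)      # 6.73 when rounded
median = statistics.median(terms_omitted)  # 5
stdev = statistics.stdev(terms_omitted)    # sample standard deviation, ~6.53
value_range = (min(terms_omitted), max(terms_omitted))  # (0, 24)
```

Note that `statistics.stdev` computes the sample (n-1) standard deviation, which matches the 6.53 reported in table 1.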
This research demonstrated an average of 6.73 potentially relevant terms omitted from automatic indexing results when the Long-S is not corrected, accounting for an average of 26.51 percent of the total results, with an approximately equal distribution of omitted terms across the two controlled vocabulary versions used. When the quantity of terms omitted is visualized using a continuous scatterplot, the results also demonstrate a significant increase in omitted terms for shorter entries, with longer entries less affected. These results reflect the impact of term frequency and total word count in keyword extraction and automatic subject indexing, with longer documents having a greater pool of total terms from which to identify key terms. Considering the complexities and similarities of the typographical characters in the original manuscript, the OCR process for this corpus occasionally confuses the letters < s >, < f >, < r >, and < l >. As a result, an occasional Long-S word in this study did not originally contain an < s > (e.g., sor instead of for). Correction of these Long-S OCR errors requires the development of a dictionary-based script. An additional complication of this research is that the corrected OCR output for the encyclopedia entries still contains a few errors not related to the Long-S, which will prevent the mapping of a term to any controlled vocabulary term (e.g., in the entry on Sepulchre, the OCR output for the term Palestine was Palestinc). These results are specific to this particular corpus of 3rd edition Encyclopedia Britannica entries, but it is very likely that testing another set of pre-1800s documents containing the Long-S would also illustrate that, for best results with any algorithm or tool, the Long-S needs to be corrected. The results are also specific to the two versions of the LCSH used, the 1910 LCSH and the 2018 LCSH, both of which are available in the HIVE tool.
The 1910 version is key for the time period being studied, while the more contemporary 2018 version has supported additional analysis of the impact of the Long-S. Both of these vocabularies are important to the larger 19th-Century Knowledge Project. It should be noted that while the LCSH is updated weekly, we were limited to the versions available via the HIVE tool, and any discrepancies that may be found with the 2020 LCSH will very likely have a minimal effect upon metadata generation results. The 2020 LCSH will be incorporated into HIVE soon and can be explored in future research.

CONCLUSION AND NEXT STEPS

The objective of this research was to determine the impact of correcting the Long-S in pre-1800s documents when performing an automatic metadata generation sequence using keyword extraction and controlled vocabulary mapping. This was accomplished by performing an automatic subject indexing sequence using the HIVE tool, followed by a basic statistical analysis to determine the quantity of terms omitted from the results when the Long-S is not corrected to a standard < s >. The number of omitted terms was also compared with the encyclopedia entry word count and visualized to demonstrate a significant increase in omitted terms for shorter encyclopedia entries. The study was conclusive in confirming that the correction of the Long-S is a critical part of our workflow. The significance of this research is that it demonstrates the necessity of correcting the Long-S prior to performing automatic subject indexing on historical documents. Beyond the correction of the Long-S, the larger next steps for this project are to continue to explore automatic metadata generation for this corpus. These next steps include the comparison of results using contemporary vs.
historical vocabularies and streamlining a protocol for bulk classification procedures and integration of terms into the TEI-XML headers. The research presented here can inform other digital humanities and even science-oriented projects, where researchers may not be aware of the impact of the Long-S on automatic metadata generation not only for subjects but also for named entities, particularly when automatic approaches with controlled vocabularies are desired.

ACKNOWLEDGEMENTS

The author thanks Dr. Jane Greenberg and Dr. Peter Logan for their guidance. The author acknowledges the support of NEH grant #HAA-261228-18.

INFORMATION TECHNOLOGY AND LIBRARIES SEPTEMBER 2020 EVALUATING THE IMPACT OF THE LONG-S | GRABUS 11

APPENDIX A

Entry Term | Total Words Omitted | 2018 LCSH Terms Omitted | 1910 LCSH Terms Omitted | Encyclopedia Entry Word Count
SARDIS | 24 | 13 | 11 | 381
SUCTION | 24 | 13 | 11 | 38
STYLITES, PILLAR SAINTS | 19 | 13 | 6 | 199
SHADWELL | 14 | 10 | 4 | 211
SALICORNIA | 13 | 6 | 7 | 254
SEPULCHRE | 11 | 3 | 8 | 348
SITTA NUTHATCH | 9 | 5 | 4 | 620
SPRAT | 9 | 3 | 6 | 475
SERAPIS | 8 | 5 | 3 | 587
STRADA | 8 | 1 | 7 | 189
SHOAD | 7 | 4 | 3 | 463
SIGN | 7 | 5 | 2 | 68
SHOOTING | 6 | 3 | 3 | 6114
STRATA | 6 | 3 | 3 | 2920
STEWARTIA | 5 | 4 | 1 | 72
SUBCLAVIAN | 5 | 3 | 2 | 20
SCHWEINFURT | 4 | 2 | 2 | 84
SCROLL | 4 | 2 | 2 | 45
SPALATRO | 4 | 3 | 1 | 99
SPECIAL | 4 | 3 | 1 | 24
SAMOGITIA | 3 | 2 | 1 | 112
SHAKESPEARE | 3 | 0 | 3 | 3855
SINAPISM | 2 | 1 | 1 | 25
SECT | 1 | 1 | 0 | 20
SEVERINO | 1 | 1 | 0 | 38
SHADDOCK | 1 | 1 | 0 | 6
SCARLET | 0 | 0 | 0 | 65
SHALLOP, SHALLOOP | 0 | 0 | 0 | 42
SOLDANELLA | 0 | 0 | 0 | 56
SPOLETTO | 0 | 0 | 0 | 99

APPENDIX B

*N = 30 entries

Condition | Average Terms Returned | Median Terms Returned
Corrected | 24.77 / 40 possible | 28 / 40 possible
Uncorrected | 26.47 / 40 possible | 29 / 40 possible
2018 LCSH Corrected | 14.10 / 20 possible | 19 / 20 possible
2018 LCSH Uncorrected | 13.47 / 20 possible | 18.5 / 20 possible
1910 LCSH Corrected | 11.27 / 20 possible | 11 / 20 possible
1910 LCSH Uncorrected | 10.13 / 20 possible | 9 / 20 possible
ENDNOTES

1 Liz Woolcott, "Understanding Metadata: What is Metadata, and What is it For?," Routledge (November 17, 2017), https://doi.org/10.1080/01639374.2017.1358232; Koraljka Golub et al., "A framework for evaluating automatic indexing or classification in the context of retrieval," Journal of the Association for Information Science and Technology 67, no. 1 (2016), https://doi.org/10.1002/asi.23600; Lynne C. Howarth, "Metadata and Bibliographic Control: Soul-Mates or Two Solitudes?," Cataloging & Classification Quarterly 40, no. 3-4 (2005), https://doi.org/10.1300/J104v40n03_03.

2 A. Belaid et al., "Automatic indexing and reformulation of ancient dictionaries" (paper presented at the First International Workshop on Document Image Analysis for Libraries, Palo Alto, CA, 2004), https://doi.org/10.1109/DIAL.2004.1263264.

3 Beatrice Alex et al., "Digitised Historical Text: Does it have to be mediOCRe?" (paper presented at KONVENS 2012 (LThist 2012 workshop), Vienna, September 21, 2012); Ted Underwood, "A half-decent OCR normalizer for English texts after 1700," The Stone and the Shell, December 10, 2013, https://tedunderwood.com/2013/12/10/a-half-decent-ocr-normalizer-for-english-texts-after-1700/.

4 "Nineteenth-century knowledge project" (GitHub repository), 2020, https://tu-plogan.github.io/.

5 "Nineteenth-century Knowledge Project."

6 Marcia Lei Zeng and Lois Mai Chan, "Metadata Interoperability and Standardization - A Study of Methodology, Part II," D-Lib Magazine 12, no. 6 (2006); G. Bueno-de-la-Fuente, D. Rodríguez Mateos, and J. Greenberg, "Chapter 10 - Automatic Text Indexing with SKOS Vocabularies in HIVE" (Elsevier Ltd, 2016); Sheila Bair and Sharon Carlson, "Where Keywords Fail: Using Metadata to Facilitate Digital Humanities Scholarship," Journal of Library Metadata 8, no. 3 (2008), https://doi.org/10.1080/19386380802398503.
7 John Walsh, "The use of Library of Congress Subject Headings in digital collections," Library Review 60, no. 4 (2011), https://doi.org/10.1108/00242531111127875.

8 Jane Greenberg et al., "HIVE: Helping interdisciplinary vocabulary engineering," Bulletin of the American Society for Information Science and Technology 37, no. 4 (2011), https://doi.org/10.1002/bult.2011.1720370407.

9 Sam Grabus et al., "Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries," NASKO 7 (2019), pp. 138-48, https://doi.org/10.7152/nasko.v7i1.15635.

10 Karen Attar, "S and Long S," in Oxford Companion to the Book, eds. Michael Felix Suarez and H. R. II Woudhuysen (Oxford: Oxford University Press, 2010); Ingrid Tieken-Boon van Ostade, "Spelling systems," in An Introduction to Late Modern English (Edinburgh University Press, 2009).

11 Andrew West, "The Rules for Long-S," TUGboat 32, no. 1 (2011).

12 Attar, "S and Long S."
EDITORIAL BOARD THOUGHTS

Seeing through Vocabularies

Kevin Ford

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12367

Kevin Ford (kevinford@loc.gov) is Librarian, Linked Data Specialist in the Library of Congress's Network Development and MARC Standards Office. He works on the Library's Bibframe Initiative and similar projects, such as MADS/RDF, and is a member of the ITAL Editorial Board. The ideas and opinions expressed here are those of the author and do not necessarily reflect those of his employer.

"Ontologies" are popular in library land. "Vocabularies" are popular too, but it seems that the library profession prefers "ontologies" over "vocabularies" when it comes to defining classes and properties that attempt to encapsulate some realm of knowledge. Bibframe, MADS/RDF, BIBO, PREMIS, and FRBR are well-known "ontologies" in use in the library community.1 They were defined either by librarians or to be used mainly in the library space, or both. SKOS, FOAF, Dublin Core, and Schema are well-known "vocabularies."2 They are used widely by libraries, though none were created by librarians or specifically for library use. In all cases, those ontologies and vocabularies were created for the very purpose of publication for broader use, which is one of the primary objectives behind creating one: to define a common set of metadata elements to facilitate the description and sharing of data within a group or groups of users.

Ontologies and vocabularies are common when working with RDF (Resource Description Framework), a very simple data model in which information is expressed as a series of triple statements, each consisting of three parts: a subject, a predicate, and an object. The types of ontologies and vocabularies referred to here are in fact defined using RDF: Thing A is a Class and Thing Z is a Property.
Those using any given ontology or vocabulary employ the defined classes and properties to further describe their Things, for lack of a better word. It is useful to provide an example. The first block of triples below represents Class and Property definitions in RDF Schema (RDFS), which provides some very basic means to define classes and properties and some relationships between them, such as the domains and ranges for properties. The second block is instance data.

ontovoc:Book rdf:type rdfs:Class
ontovoc:authoredBy rdf:type rdf:Property
ontovoc:authorOf rdf:type rdf:Property

ex:12345 rdf:type ontovoc:Book
ex:12345 ontovoc:authoredBy ex:abcde

ontovoc:Book is defined as a Class and ontovoc:authoredBy is defined as a Property. Using those declarations, it is possible to then assert that ex:12345, which is an identifier, is of type ontovoc:Book and was authored by ex:abcde, an identifier for the author. Is the first block (the definitions) an "ontology" or a "vocabulary"? Putting aside the question for now, air quotes (in this case literal quotes) have been employed around "ontologies" and "vocabularies" to suggest that these are more terms of art than technical distinctions, though it must also be acknowledged that there is a technical distinction to be made.

Ontologies in the RDF space frequently, if not always, use classes and properties from the Web Ontology Language (known as OWL) to define a specific realm's classes and properties and how they relate to each other within that realm of knowledge. This is because OWL is a more expressive definition language than basic RDFS. Using OWL, and considering the example above, ontovoc:authoredBy could be defined as an inverse of ontovoc:authorOf.
ontovoc:authoredBy owl:inverseOf ontovoc:authorOf

In this way, and given the little instance data above (the two triples that begin ex:12345), it is then possible to infer the following bit of knowledge:

ex:abcde ontovoc:authorOf ex:12345

Now that the owl:inverseOf triple/declaration has been added to the definitions, it's worth re-asking: Do the definitions represent an "ontology" or a "vocabulary"? A purist might answer "not an ontology," but only because those statements have not been combined in a document, which itself has been given a URI and declared to be an owl:Ontology. That's the actual OWL Class that says, "This is an OWL Ontology." But let's say those statements had been added to a document published at a URI and declared to be an owl:Ontology. Is it an ontology now? Perhaps in a strict sense the answer is "yes." But in a practical sense few would view those four declarations, wrapped neatly in a document that has been given a URI and called an Ontology, as an "ontology." It doesn't quite rise to the occasion: "ontologies" almost always have a broader scope and employ more formal semantics, making the label, often, a term of art rather than a real technical distinction. Yet, based on the same narrow definition (a published document declaring itself to be an owl:Ontology) combined with a far more extensive set of class and property definitions with defined relationships between them, it is possible to describe FOAF as an ontology.3 But it is widely known as, and understood as, a "vocabulary." (There is also an experimental version of Schema as OWL.4) And that gets to the crux of the issue in many ways.
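The inference itself is mechanical, and can be demonstrated with a few lines of Python over triples stored as 3-tuples. This is a toy illustration of the owl:inverseOf entailment rule, not a full reasoner; the prefixed names mirror the example above.

```python
# Toy illustration of owl:inverseOf entailment over a set of triples.
graph = {
    ("ontovoc:authoredBy", "owl:inverseOf", "ontovoc:authorOf"),
    ("ex:12345", "rdf:type", "ontovoc:Book"),
    ("ex:12345", "ontovoc:authoredBy", "ex:abcde"),
}

def expand_inverses(triples):
    """For every (s, p, o) where p has a declared inverse q, add (o, q, s)."""
    inverses = {}
    for s, p, o in triples:
        if p == "owl:inverseOf":
            inverses[s] = o
            inverses[o] = s          # owl:inverseOf holds in both directions
    inferred = set(triples)
    for s, p, o in triples:
        if p in inverses:
            inferred.add((o, inverses[p], s))
    return inferred

entailed = expand_inverses(graph)
# entailed now also contains ("ex:abcde", "ontovoc:authorOf", "ex:12345")
```

Nothing about the instance data changed; the new triple exists only because the definitions license it.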
Putting aside the technical distinction that can be argued to identify something as an "ontology" versus a "vocabulary," there are non-technical semantics at work here (what was earlier described as a "term of art") about when, how, and why something is deemed an "ontology" versus a "vocabulary." The library community appears to think of its creations as "ontologies" and not "vocabularies," even when the documentation tends to avoid the word "ontology." For example, the opening sentence of the Bibframe and MADS/RDF documentation very clearly introduces each as a "vocabulary," as does FRBR in RDF.5 On the surface they may be presented as "vocabularies," which they are of course, but despite this prominent self-declaration they are not seen in the same light as FOAF or Schema but instead as something more exacting, which they also are. It is worth contemplating why they are viewed principally as "ontologies" and to examine whether this has been beneficial. Perhaps the ideas behind designating something a "vocabulary" are, in fact, more in line with the way libraries operate, whereas "ontologies" represent an ideal (and who doesn't set their sights on the ideal?), striving toward which only exposes shortcomings and sows confusion.

The answer to "why" is historical and probably derives from a combination of lofty thinking, traditional standards practices, and good ol' misunderstanding. Traditional standards practices favor more formal approaches. Libraries' decades-long experience with XML and XML Schema contributed significantly to this mindset. XML Schema provides a way to describe the precise construction of an XML document, and it can then be used to validate the XML document. XML Schema defines what elements and attributes are permitted in the XML document and frequently dictates their order. It can further constrain the values of an element or attribute to a select list of options.
In many ways, XML Schema was the very expression of metadata quality control. Librarians swooned. With the right controls and technology in place, it was impossible to produce poor, variable metadata.

In the case of semantic modelling, OWL is certainly a more formal approach. It's founded in description logics, whose expressions take the form of occult-like mathematics, at least as viewed by a librarian with a humanities background. OWL can be used to declare domains and ranges for properties. One can also designate a property as a Datatype Property, meaning it takes a literal such as a string or a date as its value, or an Object Property, which means it will reference another RDF resource as its object. But these declarations are actually more about inferencing (deriving information by applying the ontology against some instance data) and not about restrictions, constraints, or validation. To be clear, there are ways to apply restrictions in OWL ("wine can be either red or white"), but this is a form of advanced OWL modelling that is not well understood and not often implemented, and virtually never in ontologies designed by librarians. Conversely, indicating a domain for a property, for example, is easy, relatively straightforward, and seductive because it gives the appearance that the property can only be used with resources of a specific class. Consider: the domain of ontovoc:authoredBy is ontovoc:Book. That does not mean that ontovoc:authoredBy can only be used with an ontovoc:Book resource. It means that whatever resource uses ontovoc:authoredBy must therefore be an ontovoc:Book. Defining that domain for that property is not restricting its use only to books; it allows one to derive the additional knowledge that the thing it is used with must be a book even if it doesn't identify itself as one.
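The point can be made concrete with another small sketch: an RDFS domain declaration never rejects a statement; it adds a type. The resource name ex:99999 below is hypothetical, introduced only to show a subject with no declared type.

```python
# Toy illustration of the rdfs:domain entailment rule (RDFS rule rdfs2):
# a domain declaration infers a type; it does not restrict usage.
graph = {
    ("ontovoc:authoredBy", "rdfs:domain", "ontovoc:Book"),
    ("ex:99999", "ontovoc:authoredBy", "ex:abcde"),  # subject has no type
}

def expand_domains(triples):
    """For every (s, p, o) where p has a declared domain C, add (s, rdf:type, C)."""
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    inferred = set(triples)
    for s, p, o in triples:
        if p in domains:
            # The subject is inferred to be an instance of the domain class.
            inferred.add((s, "rdf:type", domains[p]))
    return inferred

entailed = expand_domains(graph)
# Nothing was rejected; instead we learn:
# ("ex:99999", "rdf:type", "ontovoc:Book")
```

Note what did not happen: no error was raised when ex:99999, a thing of unknown type, used the property. The domain worked as an inference license, not a validation rule.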
This may seem like a subtle distinction, or it may seem like tortured logic, but if it does, it may suggest that one's point of view, one's mindset, favors constraints, restrictions, and validations. And that's OK. That's library training and conditioning, completely reinforced in our daily work. It's what has been taught in library schools for decades and practiced by library professionals even longer. Names should be entered "last name, first name" and any middle initial, if known, included. The data in this field should only be a three-character language code from this approved list of language codes. These rules and the consistency resulting from them are what make library data so often very high quality. Google loves MARC records from our community for this very reason. Wishing to exert strong control at the definition level when creating a model or metadata scheme with an eye to data quality, it is a natural inclination for librarians to gravitate to a more formal means of defining a model, especially one that seems to promise constraints. So, despite these models self-describing at a high level as vocabularies, the models themselves employ a considerable amount of OWL at the technical level, which becomes the focus of any users wishing to implement the model. Users comprehend these models as something more than a vocabulary and therefore view the model through this more complex lens. Unfortunately, because OWL is poorly understood (sometimes by creators, sometimes by users, and sometimes by both), this leads to various problems. On the one hand, creators and users believe there are technical restrictions or constraints where there are, in fact, none. When this happens, the "constraint" is
Even when it is recognized that the “constraint” is not a real restriction (just a means to infer knowledge), forging ahead can generate new issues. When faced with a domain and range declaration, for example, forging ahead can result in inaccurate, imprecise, or simply undesirable inferences. Most of the currently open “issues” (about 50 at the time of writing) about Bibframe follow a basic pattern: 1) there is a declaration about this Property or this Class that makes it difficult to use because of how it has been defined with OWL; 2) we cannot really use it presently because it would cause potential inferencing issues; 3) consider altering the OWL definitions.6 Pursuing an (OWL) ontology, while formal and seemingly comforting because it feels a little like constraining the metadata schema, can result in confusion and a lack of adoption. Given that vocabularies and ontologies are developed and published to encourage users to describe their data in a way that fosters wide consumption by others, this is unfortunate to say the least. It is notable that SKOS, FOAF, Dublin Core, and Schema have very different scopes and potentially much wider user bases than the more library-specific ontologies (Bibframe, MADS/RDF, BIBO, etc.). There is something to be learned here: the smaller the domain, the more effective an ontology might be; the larger the universe, a more general approach may be better. It is further true that FOAF, Dublin Core, and Schema define specific domains and ranges for many of their properties, but they have strived for clarity and simplicity. The creators of Schema, for example, eschewed the formal semantics behind RDFS and OWL and redefine domain and range to better match their needs and (perhaps unexpectedly) most users’ automatic understanding.7 What is generally true is that each of the “vocabularies” approached the creation and defining of their models so as to minimize the use of formal semantics, and promoted this as a feature. 
In this way, they limited or removed altogether the actual or psychological barriers to adoption. Their offering was more accessible, less fussy. Bearing in mind the differences in scale and scope, they have been rewarded with a wider adopter base and passionate advocates.

The decision to create a "vocabulary" or an "ontology" is a technical one and a political one, both of which must be in alignment. It's a mindset and it is a statement. It is entirely possible to define the model at a technical level using OWL, making it by definition an ontology, but to have it be perceived, and used, as a vocabulary because it is flexible and not strictly defined. Likewise, it is not enough to call something a vocabulary when it is in reality a model burdened with formal semantics, and then expect it to be adopted and used widely. If the objective is to fashion a (pseudo?) restrictive metadata set with rules that inform its use, and which is strongly bonded with a specific community, develop an "ontology," but recognize that this may result in confusion and lack of uptake. If, however, the desire is to cultivate a metadata element set that is flexible, readily useable, and positioned to grow in the future because it employs fewer rules and formal semantics, create a "vocabulary." That's really what is being communicated when we encounter ontologies and vocabularies. Interestingly, the political difference between "vocabulary" and "ontology" appears, in fact, to be understood by librarians: library models self-identify as "vocabularies." But once past those introductory remarks, the truth is exposed quickly in the widespread use of OWL, revealing beyond doubt that these are not flexible, accommodating vocabularies but strictly defined models. To dispense with the air quotes: as librarians we're creating ontologies and calling them vocabularies. We really want to be creating vocabularies that are ontologies in name only.
INFORMATION TECHNOLOGY AND LIBRARIES JUNE 2020 SEEING THROUGH VOCABULARIES | FORD 5 ENDNOTES 1 “Bibframe Ontology,” Library of Congress, accessed May 21, 2020, http://id.loc.gov/ontologies/bibframe.html; “MADS/RDF (Metadata Authority Description Schema in RDF),” Library of Congress, accessed May 21, 2020, http://id.loc.gov/ontologies/madsrdf/v1.html; “Bibliographic Ontology Specification,” The Bibliographic Ontology, accessed May 21, 2020, http://bibliontology.com/; “PREMIS 3 Ontology,” Premis Editorial Committee, accessed May 21, 2020, http://id.loc.gov/ontologies/premis3.html; Ian Davis and Richard Newman, “Expression of Core FRBR Concepts in RDF,” accessed May 21, 2020, https://vocab.org/frbr/. 2 Alistair Miles and Sean Bechhofer, editors, “SKOS Simple Knowledge Organization System Reference,” W3C, accessed May 21, 2020, https://www.w3.org/TR/skos-reference/; Dan Brickley and Libby Miller, “FOAF Vocabulary Specification 0.99,” accessed May 21, 2020, http://xmlns.com/foaf/spec/; “DCMI Metadata expressed in RDF Schema Language,” Dublin Core™ Metadata Initiative, accessed May 21, 2020, https://www.dublincore.org/schemas/rdfs/; “Welcome to Schema.org,” Schema.org, accessed May 21, 2020, http://schema.org/. 3 “FOAF Ontology,” xmlns.com, accessed May 21, 2020, http://xmlns.com/foaf/spec/index.rdf. 4 See “OWL” at “Developers,” Schema.org, accessed May 21, 2020, https://schema.org/docs/developers.html. 5 See “Bibframe Ontology” and “MADS/RDF (Metadata Authority Description Schema in RDF)” above. 6 “Issues,” Bibframe Ontology at GitHub, accessed May 21, 2020, https://github.com/lcnetdev/bibframe-ontology/issues. 7 R.V. Guha, Dan Brickley, and Steve Macbeth, “Schema.org: Evolution of Structured Data on the Web,” acmqueue 15, no. 9 (December 15, 2015): 14, https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1.
LITA PRESIDENT’S MESSAGE Facing What’s Next, Together Emily Morton-Owens INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12383 Emily Morton-Owens (egmowens.lita@gmail.com) is LITA President 2019-20 and the Acting Associate University Librarian for Library Technology Services at the University of Pennsylvania Libraries. When I wrote my March editorial, I was optimistically picturing some of the changes that we are now seeing for LITA—while being scarcely able to imagine how the world and our profession would need to adapt quickly to the impacts on library services as a result of COVID-19. It is a momentous and exciting change for us to turn the page on LITA and become Core, yet this suddenly pales in comparison to the challenges we face as professionals and community members. Libraries’ rapid operational changes show how important the ingenuity and dedication of technology staff are to our libraries. Since states began to shut down, our listserv, lita-l, has hosted discussions on topics like how to provide person-to-person reference and computer assistance remotely, how to make computer labs safe for re-occupancy, how to create virtual reading lists to share with patrons, and how to support students with limited internet access. There has been an explosion in practical problem-solving (ILS experts reconfiguring our systems with new user account settings and due dates), ingenuity (repurposing 3D printers and conservation materials to make masks), and advocacy (for controlled digital lending). Sometimes the expense of library technologies feels heavy, but these tools have the ability to scale services in crucial ways—making them available to more people at the same time, available to people who can only take advantage after hours, available across distances. Technologists are focused on risk, resilience, and sustainability, which makes us adaptable when the ground rules change.
Our websites communicate about our new service models and community resources; ILL systems regenerate around increased digital delivery; reservation systems for laptops now allocate the use of study seating. Our library technology tools bridge past practices, what we can do now, and what we’ll do next. One of our values as ALA members is sustainability. (We even chose this as the theme for LITA’s 2020 team of Emerging Leaders.) Sustainability isn’t about predicting the future and making firm plans for it; it’s about planning for an uncertain future, getting into a resilient mindset, and including the community in decision-making. Although the current crisis isn’t climate-related per se, this way of thinking is relevant to helping libraries serve their communities. We will need this agile mindset as we confront new financial realities. Our libraries and ALA itself are facing difficult budget challenges, layoffs, reorganizations, and fundamental conversations about the vitalness of the services we provide. My favorite example from my own library of a COVID-19 response is one where management, technical services, and IT innovated together. Our leadership negotiated an opportunity for us to gain access to digitized, copyrighted material from HathiTrust that corresponds to print materials currently locked away in our library building. Thanks to decades of careful effort by our technical services team, we had accurate data to match our print records with records for the digital versions. Our IT team had processes for loading the new links into our catalog almost instantaneously. The result was a swift and massive bolstering of our digital access precisely when our users needed it most. This collaboration perfectly illustrates how natural our merger with ALCTS and LLAMA is.
As threats to our profession and the ways we’ve done things in the past gather around us, I am heartened by the strengths and opportunities of Core. It is energizing to be surrounded by the talent of our three organizations working together. I hope more of our members experience that over the summer and fall, as we convene working groups and hold events together, including a unique social hour at ALA Virtual and an online fall Forum. I close out my year serving as the penultimate LITA president in a world with more sadness and uncertainty than we could have foreseen. We are facing new expectations and new pressures, especially financial ones. As professionals and community members, we are animated by our sense of purpose. While LITA has been transformed by our vote to continue as Core, the support and inspiration we provide each other in our association will carry on.
PUBLIC LIBRARIES LEADING THE WAY LibraryVPN: A New Tool to Protect Patron Privacy Chuck McAndrew INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12391 Chuck McAndrew (chuck.mcandrew@leblibrary.com) is Information Technology Librarian, Lebanon (NH) Public Libraries. Due to increased public awareness of online surveillance, a rise in massive data breaches, and spikes in identity theft, there is high demand for privacy-enhancing services. VPN (Virtual Private Network) services are a proven way to protect online security and privacy. VPNs’ effectiveness and ease of use have led to a boom in VPN service providers globally. VPNs protect privacy and security by offering an encrypted tunnel from the user’s device to the VPN provider. VPNs ensure that no one on the same network as the user can learn anything about their traffic except that they are connecting to a VPN. This prevents surveillance of data from any source, including commercial snooping such as your ISP trying to monetize your browsing habits by selling your data, malicious snooping such as a fake wifi hotspot in an airport hoping to steal your data, or government-level surveillance that can target political activists and reporters in repressive countries. Some people might ask why we need a VPN as HTTPS becomes more ubiquitous and provides end-to-end encryption for web traffic. HTTPS encrypts the content that goes over the network, but metadata such as the site you are connecting to, how long you are there, and where you go next are all unprotected. Additionally, some very important network protocols, such as DNS, are unencrypted, and anyone can see them. A VPN eliminates all of those issues. However, there are two major problems with current VPN offerings. First, all reliable VPN solutions require a paid subscription.
This puts them out of reach of economically vulnerable populations who often have no access to the internet in their homes. In order to access online services, they may rely on public internet connections such as those provided by restaurants, coffee shops, and libraries. Using publicly accessible networks without the security benefits of a VPN puts people’s security and privacy at great risk. This risk could be eliminated by providing free access to a high-quality VPN service. The second problem is that using a VPN requires people to place their trust in whatever VPN company they use. Some (especially free solutions) have proven not to be worthy of that trust by containing malware or leaking and even outright selling customer data. Companies that abuse customer data are taking advantage of vulnerable populations who are unable to afford more expensive solutions or who do not have the knowledge to protect themselves. Together, these two problems create a situation where security and privacy are only available to those who can afford them and have the knowledge to protect themselves. Libraries are ideally positioned to help with this situation. Libraries work to provide privacy and security to people every day. This can mean teaching classes, making privacy resources available, and even advocating for privacy-friendly laws. Libraries are also located in almost every community in the United States and enjoy a high level of trust from the public. Librarians can be thought of as a physical VPN. People who come into libraries know that what they read and the information they seek out will be protected by the library.
In fact, libraries have helped to pass laws protecting the library records of patrons in all 50 US states. People know that when a library offers a service to their community, it isn’t because it wants to sell their information or show them advertisements. With libraries, our patrons are not the product. Libraries also already provide many online services to all members of their community, regardless of financial circumstances. Examples include access to online databases, language-learning software, and online access to periodicals such as the New York Times or Consumer Reports. Many of these services would cost too much for patrons to access individually. By pooling their resources, communities are able to make more services available to all of their citizens. To help address the above issues, the Lebanon Public Libraries, in partnership with the Westchester (New York) Library System, the LEAP Encryption Access Project (https://leap.se/), and TJ Lamanna (Emerging Technology Librarian at the Cherry Hill Public Library and Library Freedom Institute graduate), started the LibraryVPN project. This project will allow libraries to offer a VPN to their patrons. Patrons will be able to download the LibraryVPN application on a device of their choosing and connect to their library’s VPN server from wherever they are. LibraryVPN was first conceived a number of years ago, but the real start of the project came when it received an IMLS National Leadership Grant (LG-36-19-0071-19) in 2019. This grant was to develop integrations between LEAP’s existing VPN solution and integrated library systems using SIP2, which will allow library patrons to sign in to LibraryVPN using their library cards. The grant also included development of a Windows client (Mac and Linux clients already existed) and alpha testing at the Lebanon Public Libraries and Westchester Library System.
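Since SIP2 is the glue between the VPN and the ILS, a small sketch may help show the kind of message such an integration layer exchanges. The sketch below is pure Python with made-up institution, patron, and password values; it builds a SIP2 Patron Status Request (message 23) and appends the protocol’s error-detection checksum, which, per the common 3M SIP2 convention, is the two’s complement of the 16-bit sum of the message’s ASCII values, written as four hex digits. This is an illustration of the wire format, not code from the LibraryVPN project itself.

```python
# Sketch of a SIP2 Patron Status Request with error-detection checksum
# (3M SIP2 convention). All field values below are hypothetical.

def sip2_checksum(msg: str) -> str:
    """Two's complement of the low 16 bits of the ASCII sum, as 4 hex digits."""
    total = sum(ord(c) for c in msg)
    return format((-total) & 0xFFFF, "04X")

# "23" = Patron Status Request, then language code, timestamp, and
# pipe-delimited fields: AO institution, AA patron barcode, AC terminal
# password, AD patron PIN, AY sequence number, AZ checksum marker.
body = "2300120200521    101212AOMAINLIB|AA21000012345678|AC|AD9999|AY1AZ"
message = body + sip2_checksum(body)

# Self-check: the message's ASCII sum plus its checksum is 0 mod 2**16,
# which is exactly what a receiving ILS verifies.
total = sum(ord(c) for c in body) + int(message[-4:], 16)
print(total & 0xFFFF == 0)  # True
```

Because the checksum is a simple modular sum, the receiving side can validate any message without knowing its field layout, which is one reason SIP2 works across so many different ILS products.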
We are currently working on moving into the testing phase of the software and planning phase two of this project. Phase two of LibraryVPN will involve expanding our testing to up to 12 libraries and conducting end-user testing with patrons and library staff. We have submitted an application for IMLS funding for phase two and are actively looking for libraries that are excited about protecting patron privacy and would like to help us beta test this software. If you work for a library that would be interested in participating, you can reach us via email at libraryvpn@riseup.net or @libraryvpn on Twitter. If you would like to help out with this project in another way, we would love to have more help. Please reach out. We are currently considering three deployment models for libraries in phase two. First would be an on-premises deployment. This would be for larger library systems with their own servers and IT staff. LibraryVPN is free and open-source software and can be deployed by anyone. Since it uses SIP2 to connect to your ILS, it should work with any ILS that supports the SIP2 protocol. This deployment model has the advantage of not requiring any hosting fees but does require the library system to have staff who can deploy and manage public-facing services. Drawbacks to this approach would include higher bandwidth use and dealing with abuse complaints. Phase 2 testing should give us better data about how much of an issue this will be, but our experience hosting a Tor exit node at the Lebanon Public Libraries suggests that it won’t be too bad to deal with. Our second deployment model would be cloud hosting. If a library has IT staff who can deploy services to the cloud, they could host their own LibraryVPN service without needing their own hardware.
However, when deploying to the cloud, there will be ongoing costs for running the servers and for bandwidth used. Figuring out how much bandwidth an average user will consume is part of the data we are hoping to get from our phase 2 testing so we can offer guidelines to libraries that choose to deploy their own LibraryVPN service. Finally, we are looking at a hosted version of LibraryVPN. We anticipate that smaller systems that do not have dedicated servers or IT staff will be interested in this option. In this case, there would be ongoing hosting and support costs, but managing the service would not be any more complicated than subscribing to any other service the library hosts for its patrons. LibraryVPN is a new project that is pushing library services outside of the library to wherever our patrons are. We want to make sure that all of our patrons are protected, not just those with the financial ability and technical know-how to get their own VPN service. As librarians, we understand that privacy and intellectual freedom are joined, and we want to maximize both. As the American Library Association’s Code of Ethics says, “We protect each library user's right to privacy and confidentiality” (http://www.ala.org/tools/ethics).
LETTER FROM THE EDITOR A Blank Page Kenneth J. Varnum INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12405 Nothing is as daunting as a blank page, particularly now. As I sat down to write this issue’s letter, I was struck by how much fundamental uncertainty is in our lives, so much trauma. A blank page can emphasize our concerns about whether the old familiar will return at all, or whether a new, better normal will emerge. At the same time, a blank page can be liberating at a time when so much of our social, professional, and personal lives needs to be reconceptualized and reactivated in new, healthier, more respectful and inclusive ways. We are collectively faced with two important societal ailments. The first is the literal disease of the COVID-19 pandemic that has been with us for only months. The other is the centuries-long festering disease of racial injustice, discrimination, and inequality that typifies (particularly, but not uniquely) American society. While some of us may be in better positions to help heal one or the other of these two ailments, we can all do something about both, as different as they are. Lend emotional support to those in need of it, take part in rallies if your personal health and circumstances allow, and advocate for change to government officials at all levels from local to national. Learn about the issues and explore ways you can make a difference on either or both fronts. I hope I am not being foolish or naive when I say I believe the blank page before us as a society will be liberating: an opportunity to shift ourselves toward a better, more equitable, more just path.
* * * * * * To rephrase Humphrey Bogart’s Rick Blaine in Casablanca, “it doesn’t take much to see that the problems of three little library association divisions don’t amount to a hill of beans in this crazy world.” But despite the small global impact of our collective decision, I am glad our ALCTS, LLAMA, and LITA colleagues chose a united future as Core: Leadership, Infrastructure, Futures. Watch for more information about what the merged division means for our three divisions and this journal in the months to come. Sincerely, Kenneth J. Varnum, Editor varnum@umich.edu June 2020
ARTICLE The Role of the Library in the Digital Economy Serhii Zharinov INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12457 Serhii Zharinov (serhii.zharinov@gmail.com) is Researcher, State Scientific and Technical Library of Ukraine. © 2020. ABSTRACT The gradual transition to a digital economy requires all business entities to adapt to new environmental conditions, which they do through their own digital transformation. These tasks are especially relevant for scientific libraries, as digital technologies are changing the main subject field of their activities: the processes of creating, storing, and disseminating information. In order to find directions for the transformation of scientific libraries and determine their role in the digital economy, this study examined the features of digital transformation and the digital transformation experience of foreign libraries. Management of research data, implemented through the creation of Current Research Information Systems (CRIS), was found to be one of the most promising areas of the digital transformation of libraries. The problem areas of this direction and ways of engaging libraries in it are also analyzed. INTRODUCTION The transition to a digital economy contributes to the even greater penetration of digital technologies into our lives and to the emergence of new conditions of competition and trends in organizations’ development. Big Data, machine learning, and artificial intelligence are becoming common tools implemented by the pioneers of digital transformation in their activities.1 Significant changes in the main functions of libraries (the storage and dissemination of information), caused by the development of digital technologies, affect the operational activities of libraries, user and partner requests to the library, and the ways to meet them.
In the process of adapting to these changes, the role of libraries in the digital economy is changing. This study is designed to find current areas of library development and to determine the role of the library in the digital economy. Achieving this goal requires study of the “digital economy” concept and the peculiarities of the digital transformation of organizations in order to better understand the role of the library in it; research on the development of libraries to determine what best fits the new role of the library in the digital economy; and identification of obstacles to the development of this area and ways to engage libraries in it. THE CONCEPT OF THE “DIGITAL ECONOMY” The transition to an information society and digital economy will gradually change all industries, and all companies must change accordingly.2 Taking advantage of the digital economy is the main driving force of innovation, competitiveness, and economic development of a country.3 The transition to a digital economy is not instant but occurs over many years. The topic emerged at the end of the twentieth century but has experienced rapid growth in recent years. In the Web of Science (WoS) citation database, publications with this term in the title began to appear in 1996 (figure 1). Figure 1. The number of publications in the WoS citation database for the query “digital economy.” One of the first books devoted entirely to the study of the digital economy concept is the work of Don Tapscott, published in 1996.
In this book, the author understands the digital economy as an economy in which the use of digital computing technologies in economic activity becomes its dominant component.4 Thomas Mesenbourg, an American statistician and economist, identified in 2000 the three main components of the digital economy: e-business, e-commerce, and e-business infrastructure.5 A number of works on the development of indicators to assess the state of the digital economy, in particular the work of Philip Barbet and Nathalie Coutinet, are based on the analysis of these components.6 Alnoor Bhimani, in his 2003 paper, “Digitization and Accounting Change,” defined the digital economy as “the digital interrelationships and dependencies between emerging communication and information technologies, data transfers along predefined channels and emerging platforms, and related contingencies within and across institutional and organizational entities.”7 Bo Carlsson’s 2004 article described the digital economy as a dynamic state of the economy characterized by the constant emergence of new activities based on the use of the Internet and new forms of communication between different authors of ideas, which allow them to generate new activities.8 In 2009, John Hand defined the digital economy as the new design or use of information and communication technologies that help transform the lives of people, society, or business.9 Ciocoiu Carmen Nadia, in her 2011 article, explained the digital economy as a state of the economy where, due to technology, knowledge and networking begin to play a more important role than capital in a post-industrial society.10 In a 2014 article, Kit Lesya defined the digital economy as an element of the network economy, characterized by the transformation of all spheres of the
economy by transferring information resources and knowledge to a computer platform for further use.11 Ukrainian scientists Mykhailo Voinarenko and Larysa Skorobohata, in a 2015 study of network tools, gave the following definition of the digital economy: “The digital economy, unlike the Internet economy, assumes that all economic processes (except for the production of goods) take place independently of the real world. Goods and services do not have a physical medium but are ‘electronic.’”12 Yurii Pivovarov, director of the Ukrainian Association for Innovation Development (UAID), gives the following definition: “Digital economy is any activity related to information technology. And in this case, it is important to separate the terms: digital economy and IT sphere. After all, it is not about the development of IT companies, but about the consumption of services or goods they provide—online commerce, e-government, etc.—using digital information technology.”13 Taking into account the above, in this study the digital economy is defined as a digital infrastructure that encompasses all business entities and their activities. The transition to the digital economy is the process of creating conditions for the digital transformation of organizations, creating digital infrastructure, and gradually involving various economic entities and sectors of the economy in that digital infrastructure. One of the first practical and political manifestations of the transition to the digital economy was the European Commission’s Digital Economy and Society Index (DESI), first published in 2014. The main components of the index are communications, human capital, Internet use, digital integration, and digital public services.
Among European countries in 2019, there is significant progress in the digitalization of business and in the interaction of society with the state.14 For Ukraine, the first step towards the digital economy was the Digital Economy and Development Concept of Ukraine, which defines the understanding of the digital economy and the direction and principles of the transition to it.15 Thus, for active representatives of the public sector, this concept is a signal that the development of structures and organizations should be based not on improving operational efficiency, but on transformation in accordance with the requirements of Industry 4.0. Confirmation of the seriousness of the Ukrainian government’s intentions in this direction is the creation of the Ministry of Digital Transformation in 2019 and the digitization of the latest public services through online services.16 One of the priority challenges that needs to be solved at the stage of transition to the digital economy is the development of skills in working with digital technologies across the entire population. This is relevant not only for Ukraine but also for the European Union. In Europe, a third of the active workforce does not have basic skills in working with digital technologies; in Ukraine, 15.1 percent of Ukrainians have no digital skills, and the share of the working population with below-average digital skills is 37.9 percent.17 Part of the solution to this challenge in Ukraine is entrusted to the “Digital Education” project, implemented by the Ministry of Digital Transformation (osvita.diia.gov.ua), which, through mini-series created for different target audiences, should build digital literacy among the population of Ukraine.
FEATURES OF DIGITAL TRANSFORMATION Developed digital skills in the population make the digital transformation of organizations not just a competitive advantage but a prerequisite for their survival. The larger the share of the target audience accustomed to the benefits of the digital economy, the more actively an organization must adapt to new requirements and customer needs and to the new competitive environment. Digital transformation of an organization is a complex process that is not limited to implementing software in the company’s activities or automating certain components of production. It includes changes to all elements of the company, including methods of manufacturing and customer service, the organization’s strategy and business model, and approaches and methods of management. According to a study by McKinsey, the integration of new technologies into a company’s operations can reduce profits in 45 percent of cases.18 Therefore, it is extremely important to take a comprehensive approach to digital transformation: understanding the changes being implemented, choosing the method of their implementation, and gradually involving all structural units and business processes in the transformation. The Boston Consulting Group study identified six factors necessary for the effective use of the benefits of modern technologies:19
• connectivity of analytical data;
• integration of technologies and automation;
• analysis of results and application of conclusions;
• strategic partnership;
• competent specialists in all departments; and
• flexible structure and culture.
McKinsey consultants draw attention to the low percentage of successful digital transformation practices and, based on the successful experience of 83 companies, formed five categories of recommendations that can contribute to successful digitalization:20
• involvement of leaders experienced in digitalization;
• development of digital staff skills;
• creating conditions for the use of digital skills by staff;
• digitization of tools and working procedures of the company; and
• establishing digital communication and ensuring the availability of information.
Experts at the Institute of Digital Transformation identify four main stages of digital transformation in a company:21
1. Research, analysis, and understanding of customer experience.
2. Involvement of the team in the process of digital transformation and implementation of a corporate culture that contributes to this process.
3. Building an effective operating model based on modern systems.
4. Transformation of the business model of the organization.
The “Integrated Model of Digital Transformation” study identifies, as one of the key factors of successful digital transformation, a focus on priority digital projects whose development and implementation should be assigned to dedicated organizational teams.
The authors identify three main functional activities for digital transformation teams, the implementation of which provides a gradual, comprehensive renewal of the company, namely: the creation and implementation of digital strategy, digital activity management, and digitization of operational activities.22 In their study, Ukrainian scientists Natalia Kraus, Oleksandr Holoborodko, and Kateryna Kraus determine that the general pattern for all digital economy projects is their focus on a specific consumer, comprehensive use of available information about that consumer, and attention to the conditions of project effectiveness.23 Initially, a project is pre-tested on a small scale, and only after obtaining satisfactory results from testing the new principles of activity on a narrow target audience is the project scaled to a wider range of potential users. All this reduces the risks associated with digital transformation. Eliminating unnecessary changes and false hypotheses on a small scale makes it possible to avoid overspending at the stage of a comprehensive transformation of the entire enterprise. Therefore, the process of effective digital transformation should begin with the involvement of experienced leaders in the field of digital transformation, analysis of the weaknesses of the organization, and the building of a plan for its comprehensive transformation, divided into individual projects implemented by qualified teams, with a gradual increase in the volume of these projects as their effectiveness is confirmed on a small scale. The process of digital transformation should be accompanied by constant training of employees in digital skills. The goal of digital transformation is to build an efficient, high-profile company that can quickly adapt to new environmental conditions, which is achieved through the introduction of digital technologies and new methods and tools of organization management.
DIRECTIONS OF LIBRARY DEVELOPMENT IN THE DIGITAL ECONOMY

Based on the study of the digital economy concept and the peculiarities of digital transformation, a review of library development in the digital economy was conducted to find the library's place in digital infrastructure and to identify potential projects that an individual library could implement as part of its comprehensive transformation plan. The main task is to determine the new role of the library in the digital economy and the areas of activity that best correspond to it. The search for directions of library development in response to the spread of digital technology began at the end of the last century. One of the first concepts to reflect the impact of the internet on the library sector was the concept of the digital library, published in 1999.24 In 2006, the concept of "library 2.0" emerged, based on Web 2.0 technologies: dynamic sites, users who become authors of data, open-source software, API interfaces, and data added to one database being immediately fed to partner databases.25 The spread of social networks and mobile technologies, and their successful use in library practice, led to the formation of the concept of "library 3.0."26 The development of open-source, cloud-service, big-data, augmented-reality, context-aware, and other technologies influenced library activities, which is reflected in "library 4.0."27 Researchers, scholars, and the professional community continued to develop concepts of the modern library, drawing on the experience of implementing changes in library activities and taking into account the development of other areas, and in 2020 articles began to appear describing the concept of "library 5.0," based on a personalized approach to students, support of each student during the whole period of study, development of skills necessary for learning, and
a set of other supporting actions integrated into the educational process.28 In determining the current role of the library in the digital economy, it is necessary to pay attention to a study by Denis Solovianenko, who identifies research and educational infrastructure as one of the key elements of scientific libraries of the twenty-first century.29 Olga Stepanenko considers libraries part of the information and communication infrastructure, the development of which is one of the main tasks of transforming the socioeconomic environment in accordance with the needs of the digital economy; this infrastructure ensures high efficiency for stakeholders and supports the pace of digitalization of the state economy, which occurs through the development of its constituent elements.30 The possibility of digital infrastructure replacing traditional library services, based on the example of the Moravian Library, is demonstrated in a study by Michal Indrak and Lenka Pokorna, published in April 2020.31 Projects that contribute to the library's adaptation to the conditions of the digital economy, implemented in public libraries, include: digitization of library collections (including historical heritage) and the creation of databases of full-text documents; provision of free access to the internet via library computers and Wi-Fi; organization of online customer service and development of services that do not require a physical presence in the library; and organization of events to develop users' digital and information skills.32 Under such conditions, the role of the librarian as an information specialist changes from that of a custodian to that of an intermediary and distributor.33 One of the main objectives of library activity in the digital economy becomes overcoming the digital divide: disseminating knowledge about modern technologies and innovations, assisting the community in their use, and developing digital skills in all users of the
library.34 An example of the digital public library is the Digital North Library project in Canada, which resulted in the creation of the Inuvialuit Digital Library (https://inuvialuitdigitallibrary.ca). The project lasted four years and brought together researchers from different universities and the community in the region, who together digitized cultural heritage documents and created metadata. The library now holds more than 5,200 digital resources collected in 49 catalogues. The implementation of this project provides access to library services and information for a significant number of people who live in remote areas of Northern Canada and are unable to visit libraries (https://sites.google.com/ualberta.ca/dln/home?authuser=0, https://inuvialuitdigitallibrary.ca).35 Other representatives of modern digital libraries, among whose main tasks are the preservation of cultural heritage and the spread of national culture, are the British Library (https://www.bl.uk), the Hispanic Digital Library—Biblioteca Nacional de España (http://www.bne.es), the Gallica Digital Library in France (https://gallica.bnf.fr), the German Digital Library—Deutsche Digitale Bibliothek (https://www.deutsche-digitale-bibliothek.de), and the European Library (https://www.europeana.eu). Another direction has been the development of analytical skills in information retrieval.
Academic libraries, applying their competencies in information retrieval and information technology and refining the results of their analyses, have been able to identify trends in academia more effectively and to expand cooperation with teachers in updating their curricula.36 Libraries become active participants in the process of teaching, learning, and assessment of acquired knowledge in educational institutions. T. O. Kolesnikova, in her research on models of library development, substantiates the expediency of creating information intelligence centers for introducing the latest scientific advances into training and production processes, involving libraries in the educational activities of higher educational establishments, and creating centralized repositories as directions of development for the university libraries of Ukraine.37 One of the advantages of the development and dissemination of digital technologies is the possibility of forming individual curricula for students. Involvement of university libraries in this area is one of the new directions of their activity in the digital economy.38 One of the important areas of operation for departmental and scientific-technical libraries that contributes to increasing the innovative potential of the country is activity in the area of intellectual property.
Consulting services in the field of intellectual property, information support for scientists, creation of electronic patent-information databases in the public domain, and other related services are important components of library activity in many countries.39 Another important component of libraries' transformation is the deepening of their role in scientific communication: expanding the boundaries of the use of information technology in order to integrate scientific information into a single network, and creating and managing the information technology infrastructure of science.40 The presence of libraries on social networks has become an important component of their digital transformation. On the one hand, libraries have thus created another source of information dissemination and expanded the number of service delivery channels, for which they have developed online training videos and interactive help services.41 On the other hand, social networks have become a marketing tool for engaging the audience with the library's digital collections and online services.
An additional important component of the presence of libraries on social networks has been the establishment of contacts and exchange of ideas with other professional organizations, which has contributed to the further expansion of the network of library partners.42 Another area of activity that libraries take on in the digital economy is the management of research data, as confirmed by the significant number of publications on this topic in professional scientific and research journals for 2017–18.43 Joining this area allows libraries to become part of the scientific digital information and communication infrastructure, the creation of which is one of the main tasks of digital transformation on the way to the digital economy.44 The development of this area contributes to the digitalization of the scientific and information sphere, while the systematization and structuring of all scientific research data has a positive effect on the effectiveness of research and on the scientific novelty of the results of intellectual activity. The Ukrainian Institute of the Future, together with the Digital Agency of Ukraine, considers digital transformation to be the integration of modern digital technologies into all spheres of business. The introduction of modern technologies (artificial intelligence, blockchain, cobots, digital twins, IIoT platforms, and others) into the production process will lead to the transition to Industry 4.0. According to their forecasts, the key competence in Industry 4.0 will be data processing and analytics.45 Research information is an integral part of this competence, so its development is one of the most promising directions for the library in the digital economy. The tools used in the management of research data are called Current Research Information Systems, abbreviated as CRIS. In Ukraine, there is no such system connected to the international community.46 The change of the library's role from a repository of research data to its manager, the alignment of the functions and tasks of a CRIS with the key requirements of the digital economy, and the advantages of such systems, together with the fact that they are still not used in Ukraine, make this area extremely relevant for research and a promising direction of work for scientific libraries, so we will consider it more thoroughly.

PROBLEMS IN RESEARCH DATA MANAGEMENT

The global experience of research information management reveals several problems in the process of research data management. Some of them are related to the processes of workflow organization, control, and reporting. This is due to the use of several poorly coordinated systems to organize the work of scientists. Data sets from different systems, lacking shared metadata, are very difficult to combine into a single system, and it is almost impossible to automate the process. All this manifests itself as a lack of informational support for decision-making in the field of science, both at the state level and at the level of individual institutions. This situation can lead to wrong management decisions, to overspending on similar, duplicate projects, and to increased costs of recruiting and finding scientists with relevant experience for research and of finding the equipment needed for research. CRIS, which began to appear in Europe in the 1990s, are designed to overcome these shortcomings and promote the effective organization of scientific work. Such systems are now widespread throughout the world, with a total of about five hundred, concentrated mainly in Europe and India. However, there is currently no research information management system in Ukraine that meets international standards and integrates with international scientific databases. This omission slows down Ukraine's integration into the international scientific community.
The solution to this problem may be the creation of the national electronic scientific information system URIS (Ukrainian Research Information System).47 The development of this system is an initiative of the Ministry of Education and Science of Ukraine. It is based on combining data from Ukrainian scientific institutions with data from CrossRef and other organizations, as well as on ensuring integration with other international CRIS systems through the use of the CERIF standard. The future developers of the system face a number of challenges, both specific to Ukraine and already studied by foreign scientists. A significant number of studies in this area are designed to overcome the problem of lack of access to research data, as well as to solve problems of data standardization and openness. In the global experience, the directions of managing collection processes and developing structured data sets are investigated, along with their distribution on a commercial basis and the ways of gaining advantage by providing them in open access. The mechanisms of financing these processes are studied; in particular, effective ways of attracting patronage funds are analyzed. The possibilities of licensing the resulting data sets and their distribution are determined, along with the approaches and tools that can be most effective for the library. In particular, Alice Wise describes the experience of settling some legal aspects by clarifying the use of the site in the license agreement, which covers the conditions of access to information and search within it while maintaining a certain level of anonymity.48 The problem of data consistency is related to the lack of uniform standards for retaining information that would govern the format of the data, the metadata itself, and the methods of their generation and use.
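A system such as URIS would have to merge institutional records with metadata drawn from CrossRef. As a hedged illustration of that normalization step, the sketch below flattens a CrossRef-style JSON work record (CrossRef's public REST API returns work metadata under a "message" key in roughly this shape) into a simple, mergeable dict. The sample record, author name, and target field names are illustrative, not the CERIF standard.

```python
import json

# Illustrative CrossRef-style record (not a real API response).
SAMPLE_CROSSREF_RECORD = json.loads("""
{
  "message": {
    "DOI": "10.1234/example.doi",
    "title": ["Illustrative Article Title"],
    "author": [{"given": "Iryna", "family": "Kovalenko"}],
    "issued": {"date-parts": [[2020, 12]]}
  }
}
""")

def normalize_work(record):
    """Flatten a CrossRef-style record into a simple dict that a CRIS
    could merge with locally produced institutional data."""
    msg = record["message"]
    return {
        "doi": msg["DOI"].lower(),  # DOIs are case-insensitive; normalize once
        "title": msg["title"][0] if msg.get("title") else None,
        "authors": [f'{a.get("given", "")} {a.get("family", "")}'.strip()
                    for a in msg.get("author", [])],
        "year": msg["issued"]["date-parts"][0][0],
    }

work = normalize_work(SAMPLE_CROSSREF_RECORD)
print(work["doi"], work["year"])
```

Normalizing every incoming record to one shape is what makes automated deduplication and merging across sources possible at all; without it, the "poorly coordinated systems" problem described above reappears inside the aggregator.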
Thus, the use of different standards and formats in repositories and archives leads to problems with data consistency for researchers, which, in turn, affects the quality of service delivery and makes it impossible to use multiple data sets together.49 Another important obstacle to the dissemination of research data is the lack of tools and infrastructure components in the libraries and repositories of higher educational establishments and scientific institutions. It is worthwhile to develop the infrastructure so that at the end of their projects, in addition to the research results, scientists publish the research data they used and generated. This approach will be convenient both for authors (in case they need to reuse the research data) and for other scientists (because they will have access to data that can be used in their own research).50 The development of the necessary tools is quite relevant, especially because, according to international surveys, practicing researchers are in favor of sharing the data they create with other researchers and of the licensed use of other people's datasets in conducting their own research.51 Another reason for the low prevalence of research data is that datasets have less impact on a researcher's reputation and rating than publications.52 This is partly due to the lack of citation-tracking infrastructure for datasets, in contrast to the publication of research results, and to the lack of standards for storing and publishing data. Prestigious scientific journals have been struggling with this problem for several years.
For example, the American Economic Review requires authors whose articles contain empirical, modelling, or experimental work to provide information about their research data in sufficient volume for replication.53 Nature and Science require authors to preserve research data and provide them at the request of the journals' editors.54 One of the reasons for the underdeveloped infrastructure in research data management is the weak policy of free access to these data, as a result of which much usable scientific data remains closed by license agreements and cannot be used by other scientists.55 Open science initiatives related to publications have been operating in the scientific field for a long time, but their extension to research data remains insufficient. The development of the URIS system will provide management of scientific information; will solve the problems highlighted in the scientific works cited above; will promote the efficient use of funds; will simplify the process of finding data for research; and will discipline research, and therefore will have a positive impact on the entire economy of Ukraine.

LIBRARY AND RESEARCH INFORMATION MANAGEMENT

Library involvement in the development of scientific information management systems will be an important future direction of their work. Such systems, which could include all the necessary information about scientific research, will contribute to the renewal and development of the library sphere of Ukraine and will promote the transition of the state to a digital economy. The creation of the URIS system is designed to provide access to research data generated by both Ukrainian and foreign scientists.
Such a system can ensure the development of cooperation in the field of research, the intensification of knowledge exchange, and interaction through the open exchange of scientific data and the integration of the Ukrainian scientific infrastructure into the world scientific and information space. According to surveys conducted by the international organizations euroCRIS and OCLC, of the 172 respondents working in the field of research information management, 83 percent said that libraries play an important role in the development of open science, copyright, and the deposit of research results; the share reporting that libraries play a major role in this direction was 90 percent. Almost 68 percent of respondents noted the significant contribution of libraries to filling in the metadata needed to correctly identify the work of researchers in various databases; 60 percent noted the important role of libraries in verifying the correctness of metadata entered by researchers; and almost 49 percent of respondents assessed the role of libraries as the main one in the management of research data (figure 4).

Figure 4. The proportion of organizations among 172 users of CRIS systems that assess the role of libraries in the management of research information as basic or supporting.56

At the same time, the activity of libraries in assisting with the information management of scientific research can take various forms, which should be adopted by the scientific libraries of Ukraine; some of these forms will also be useful to public libraries, which can become science ambassadors in their communities. Based on the experience of foreign libraries, we have identified areas of activity in which the library can join the management of research information.
(Figure 4 lists the following research information management activities: financial support for RIM; project management; maintaining or servicing technical operations; impact assessment and reporting; strategic development, management and planning; creating internal reports for departments; system configuration; outreach and communication; initiating RIM adoption; research data management; metadata validation workflows; metadata entry; training and support; and open access, copyright and deposit.)

One of the main directions for libraries that cooperate with CRIS users, or are themselves the organizers of such systems, is the introduction and support of open science. Historically, libraries have supported open science by providing access to scientific papers, but they can expand their activities further. Using open data resources and promoting them among the scientific community, involving scientific users in disseminating their own research results on the principles of open science, supporting users in disseminating their publications, creating conditions for increasing the citation of scientific papers, tracking information about users' publications, and creating and supporting public profiles of scientists in scientific and professional resources and scientific social networks—all this will help engage researchers in open science and let them take advantage of this area. Analysis of world experience shows, in the activity of scientific libraries, a significant intensification of support for the strategic goals of the structures that finance their activities and to which they are subordinated. Libraries are moving away from routine customer service and expanding their activities through the use of their own assets and the introduction of new, modern tools.
Such libraries try to promote the development of their parent structures and build modern competencies in order to better meet the needs and goals of these institutions. By introducing and implementing various management tools, libraries synchronize their strategy with the strategy of the parent structure to achieve a synergistic effect. The next important direction of library development is socialization. Wanting to shed the antiquated understanding of the word library, many of them conduct campaigns aimed at changing the image of the library in the minds of users, communities, and society. An important component of this systematic step is building relationships with the target audience and creating user communities around the library whose members are not only its users but also supporters, friends, and promoters. Building relationships with members of the scientific community allows libraries to reduce resistance to the changes that result from the introduction of scientific information management systems, and to influence users positively so that they introduce new tools into their usual activities, receive the benefits, and become an active part of the process of structuring the scientific space. Recently, work with metadata has undergone some changes. The need for identification and structuring of data in the world scientific space means that metadata are now filled in not only by libraries but also by other organizations that produce and publish scientific results and scientific literature. Scientists are beginning to make more active use of modern information standards in order to promote their own work. Libraries, in turn, take on the role of consultant or contractor with many years of experience working with metadata and sufficient knowledge in this area.
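The metadata-verification role described above (and reflected in the survey figures, where roughly 60 percent of respondents saw libraries validating researcher-entered metadata) can be sketched as a simple check that runs before a record enters the CRIS. The required fields, the DOI pattern, and the record shape below are illustrative assumptions, not a published standard.

```python
import re

# Illustrative rules a library might enforce before accepting a record.
REQUIRED_FIELDS = ("title", "authors", "year", "doi")
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # common shape of a DOI

def validate_record(record):
    """Return a list of human-readable problems; an empty list means
    the record can be loaded into the CRIS as-is."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    doi = record.get("doi", "")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"malformed DOI: {doi!r}")
    year = record.get("year")
    if isinstance(year, int) and not 1800 <= year <= 2100:
        problems.append(f"implausible year: {year}")
    return problems

# A researcher-entered record with one defect: the DOI is not a DOI.
record = {"title": "Example study", "authors": ["A. Researcher"],
          "year": 2020, "doi": "not-a-doi"}
print(validate_record(record))
```

Returning a list of problems, rather than a yes/no verdict, suits the consultancy role: the librarian can hand the whole list back to the researcher in one round of correction.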
On the other hand, the filling in of metadata by users frees up librarians' time and creates conditions for them to perform other functions, such as information management and the creation of automated data collection and management systems integrated with scientific databases, both Ukrainian and international. Another area of research information management is the direct management of this process. Thus, CRIS are developed and implemented with the contribution of scientific libraries in different countries of the world. This allows libraries to combine disparate data obtained from different sources, compile scientific reports, evaluate the effectiveness of an institution's scientific activities, create profiles of scientific institutions and scientists, develop research networks, etc. Scientists and students can find the results of scientific research and look for partners and sources of funding for research. Research managers have access to up-to-date scientific information, which makes it possible to assess more accurately the productivity and influence of individual scientists, research groups, and institutions. Business representatives gain access to up-to-date information on promising scientific developments, and the public gains a way to oversee the conduct of research effectively.

CONCLUSIONS

Ukraine is on the path to a digital economy, characterized by the penetration of new technologies into all areas of human activity, simplification of access to information, goods, and services, the blurring of companies' geographical boundaries, an increasing share of automated and robotic production units, and a strengthening role for the creation and use of databases. These changes affect all sectors of the economy, and all organizations, without exception, need to adapt accordingly.
Rapid response to such changes helps to increase competitiveness both at the level of individual organizations and at the level of the state economy. Adaptation to the conditions of the digital economy occurs through digital transformation—a complex process that requires a review of all of an organization's business processes and radically changes its business model. The digital transformation of an organization takes place through the involvement of management that is competent in digitization, the updating of management methods, the development of digital skills, the establishment of efficient production and services, the implementation of digital tools and the building of digital communication, the implementation of individual development projects, and adaptation to new user needs. The digital transformation of the economy occurs through the transformation of its individual sectors, creating conditions for the transformation of their representatives. One of the first steps in the process of transition to the digital economy is the establishment of a digital information and communication infrastructure. Libraries are representatives of the information sphere and were the main operators of information in the analogue era. Significant changes in the subject area of their activities require the search for a new role for libraries. Modern projects and directions of library development are integral elements of transformation to the conditions of the digital economy. Completing this complex transformation will allow libraries to update their management methods, the range of their services, and the channels of their provision; to change their fixed assets through digitization, structuring of data, and creation of metadata; to change their approaches to communication with users and cooperation with both domestic and international partners; to change the functions and positioning of the library; and to become effective information operator-managers.
In the digital economy, the role of the library is changing from passively collecting and storing information to actively managing it. One of the areas of development that most comprehensively meets this role is the management of research data, which is implemented through the creation of CRIS systems. Under this model, the main asset of libraries is a digital, structured database, automatically and regularly updated, whose main purpose is to support the decision-making process. The library becomes an assistant in conducting research and in finding funding, partners, fixed assets, and information, and a partner in the strategic management both of scientific organizations and of the state at the level of committees and ministries. The development of this area in Ukraine requires solving a number of technical, administrative, and managerial questions that are relevant not only in Ukraine but also around the world. In particular, libraries need to address the issues of data integration and consistency, accessibility and openness, copyright, and personal data. Solving the problems of the creation and operation of CRIS systems in Ukraine is a promising area for future research.

ENDNOTES

1 Andriy Dobrynin, Konstantin Chernykh, Vasyl Kupriyanovsky, Pavlo Kupriyanovsky, and Serhiy Sinyagov, “Tsifrovaya ekonomika—razlichnyie puti k effektivnomu primeneniyu tehnologiy (BIM, PLM, CAD, IOT, Smart City, BIG DATA i drugie),” International Journal of Open Information Technologies 4, no. 1 (2016): 4–10, https://cyberleninka.ru/article/n/tsifrovaya-ekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smart-city-big-data-i-drugie.

2 Jurgen Meffert, Volodymyr Kulagin, and Alexander Suharevskiy, Digital @ Scale: nastolnaya kniga po tsifrovizatsii biznesa (Moscow: Alpina, 2019).
3 Victoria Apalkova, “Kontseptsiia rozvytku tsyfrovoi ekonomiky v Yevrosoiuzi ta perspektyvy Ukrainy,” Visnyk Dnipropetrovskoho universytetu. Seriia «Menedzhment innovatsii» 23, no. 4 (2015): 9–18, http://nbuv.gov.ua/UJRN/vdumi_2015_23_4_4.

4 Don Tapscott, The Digital Economy: Promise and Peril in the Age of Networked Intelligence (New York: McGraw-Hill, 1996).

5 Thomas L. Mesenbourg, Measuring the Digital Economy (Washington, DC: Bureau of the Census, 2001).

6 Philippe Barbet and Nathalie Coutinet, “Measuring the Digital Economy: State-of-the-Art Developments and Future Prospects,” Communications & Strategies, no. 42 (2001): 153, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.576.1856&rep=rep1&type=pdf.

7 Alnoor Bhimani, “Digitization and Accounting Change,” in Management Accounting in the Digital Economy, ed. Alnoor Bhimani, 1–12 (London: Oxford University Press, 2003), https://doi.org/10.1093/0199260389.003.0001.

8 Bo Carlsson, “The Digital Economy: What is New and What is Not?,” Structural Change and Economic Dynamics 15, no. 3 (September 2004): 245–64, https://doi.org/10.1016/j.strueco.2004.02.001.

9 John Hand, “Building Digital Economy—The Research Councils Programme and the Vision,” Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 16 (2009): 3, https://doi.org/10.1007/978-3-642-11284-3_1.

10 Carmen Nadia Ciocoiu, “Integrating Digital Economy and Green Economy: Opportunities for Sustainable Development,” Theoretical and Empirical Researches in Urban Management 6, no. 1 (2011): 33–43, https://www.researchgate.net/publication/227346561.
11 Lesya Zenoviivna Kit, “Evoliutsiia Merezhevoi Ekonomiky,” Visnyk Khmelnytskoho Natsionalnoho Universytetu. Ekonomichni nauky, no. 3 (2014): 187–94, http://nbuv.gov.ua/UJRN/Vchnu_ekon_2014_3%282%29__42.

12 Mykhailo Voinarenko and Larissa Skorobohata, “Merezhevi Instrumenty Kapitalizatsii Informatsiino-intelektualnoho Potentsialu ta Innovatsii,” Visnyk Khmelnytskoho Natsionalnoho Universytetu. Ekonomichni nauky, no. 3 (2015): 18–24, http://elar.khnu.km.ua/jspui/handle/123456789/4259.

13 Yurii Pivovarov, “Ukraina Perekhodyt na ‘Tsyfrovu Ekonomiku’: Shcho tse Oznachaie,” ed. Miroslav Liskovuch, Ukrinform (January 21, 2020), https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html.

14 European Commission, “Digital Economy and Society Index,” Brussels, Belgium, https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en.
15 Kabinet Ministriv Ukrainu, “Pro Skhvalennia Kontseptsii Rozvytku Tsyfrovoi Ekonomiky ta Suspilstva Ukrainy na 2018–2020 Roky ta Zatverdzhennia Planu Zakhodiv Shchodo yii Realizatsii,” (Kyiv: 2018), https://zakon.rada.gov.ua/laws/show/67-2018-%D1%80. 16 Kabinet Ministriv Ukrainu, “Pytannia Ministerstva Tsyfrovoi Transformatsii,” (Kyiv: 2019), https://zakon.rada.gov.ua/laws/show/856-2019-%D0%BF. 17 Piatuy, “Biblioteky Stanut Pershymy Oflain-khabamy: Mintsyfry Zapustyt Kursy z Tsyfrovoi Osvity,” https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry- zapustyt-kursy-z-tsyfrovoi-osvity-206206.html. 18 Jacques Bughin, Jonathan Deaki, and Barbara O’Beirne, “Digital Transformation: Improving the Odds of Success,” McKinsey & Company, https://www.mckinsey.com/business- functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of- success. 19 Domynyk Fyld, Shylpa Patel, and Henry Leon, “Kak Dostich Tsifrovoy Zrelosti,” The Boston Consulting Group Inc. (2018), https://www.thinkwithgoogle.com/_qs/documents/5685/ru_AdWords_Marketing___Sales_89 1609_Mastering_Digital_Marketing_Maturity.pdf. 20 Hortense de la Boutetière, Alberto Montagner, and Angelika Reich, “Unlocking Success in Digital Transformations,” McKinsey & Company, https://www.mckinsey.com/business- functions/organization/our-insights/unlocking-success-in-digital-transformations. 21 Top Lea, “Tsyfrova Transformatsiia Biznesu: Navishcho vona Potribna i Shche 14 Pytan,” BusinessViews, https://businessviews.com.ua/ru/business/id/cifrova-transformacija- biznesu-navischo-vona-potribna-i-sche-14-pitan-2046. 22 Vasily Kupriyanovsky, Andrey Dobrynin, Sergey Sinyagov, and Dmitry Namiot, “Tselostnaya Model Transformatsii v Tsifrovoy Ekonomike—Kak Stat Tsifrovyimi Liderami,” International Journal of Open Information Technologies 5, no. 
1 (2017): 26–33, http://nbuv.gov.ua/UJRN/Vchnu_ekon_2014_3%282%29__42 http://elar.khnu.km.ua/jspui/handle/123456789/4259 https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en https://zakon.rada.gov.ua/laws/show/67-2018-%D1%80 https://zakon.rada.gov.ua/laws/show/856-2019-%D0%BF https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.thinkwithgoogle.com/_qs/documents/5685/ru_AdWords_Marketing___Sales_891609_Mastering_Digital_Marketing_Maturity.pdf https://www.thinkwithgoogle.com/_qs/documents/5685/ru_AdWords_Marketing___Sales_891609_Mastering_Digital_Marketing_Maturity.pdf https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046 https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046 INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 THE ROLE OF THE LIBRARY IN THE DIGITAL ECONOMY | ZHARINOV 15 
https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike- kak-stat-tsifrovymi-liderami. 23 Nataliia Kraus, Alexander Holoborodko, and Kateryna Kraus, “Tsyfrova Ekonomika: Trendy ta Perspektyvy Avanhardnoho Kharakteru Rozvytku,” Efektyvna Ekonomika no. 1 (2018): 1–7, http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf. 24 David Bawden and Ian Rowlands, “Digital Libraries: Assumptions and Concepts,” International Journal of Libraries and Information Studies (Libri), no. 49 (1999): 181–91, https://doi.org/10.1515/libr.1999.49.4.181. 25 Jack M. Maness, “Library 2.0: The Next Generation of Web-based Library Services,” LOGOS 13, no. 3 (2006): 139–45, https://doi.org/10.2959/logo.2006.17.3.139. 26 Woody Evans, Building Library 3.0: Issues in Creating a Culture of Participation (Oxford: Chandos Publishing, 2009). 27 Younghee Noh, “Imagining Library 4.0: Creating a Model for Future Libraries,” The Journal of Academic Librarianship 41, no. 6 (November 2015): 786–97, https://doi.org/10.1016/j.acalib.2015.08.020. 28 Helle Guldberg et al., “Library 5.0,” Septentrio Conference Series, UiT The Arctic University of Norway, no. 3 (2020), https://doi.org/10.7557/5.5378. 29 Denys Solovianenko, “Akademichni Biblioteky u Novomu Sotsiotekhnichnomu Vymiri. Chastyna Chetverta. Suchasnyi Riven Dyskursu Akademichnoho Bibliotekoznavstva ta Postup E-nauky,” Bibliotechnyi visnyk no.1 (2011): 8–24, http://journals.uran.ua/bv/article/view/2011.1.02. 30 Olga Petrivna Stepanenko, “Perspektyvni Napriamy Tsyfrovoi Transformatsii v Konteksti Rozbudovy Tsyfrovoi Ekonomiky,” in Modeliuvannia ta informatsiini systemy v ekonomitsi : zb. nauk. pr., edited by V. K. Halitsyn, (Kyiv: KNEU, 2017), 120–31, https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120- 131.pdf?sequence=1&isAllowed=y. 
31 Michal Indrák and Lenka Pokorná, “Analysis of Digital Transformation of Services in a Research Library,” Global Knowledge, Memory and Communication (2020), https://doi.org/10.1108/GKMC-09-2019-0118. 32 Irina Sergeevna Koroleva, “Biblioteka—Optimalnaya Model Vzaimodeystviya s Polzovatelyami v Usloviyah Tsifrovoy Ekonomiki,” Informatsionno-bibliotechnyie sistemyi, resursyi i tehnologii no. 1 (2020): 57–64, https://doi.org/10.20913/2618-7515-2020-1-57-64. 33 James Currall and Michael Moss, “We are Archivists, But are We OK?”, Records Management Journal 18, no. 1 (2008): 69–91, https://doi.org/10.1108/09565690810858532. 34 Kirralie Houghton, Marcus Foth and Evonne Miller, “The Local Library across the Digital and Physical City: Opportunities for Economic Development,” Commonwealth Journal of Local Governance no. 15 (2014): 39–60, https://doi.org/10.5130/cjlg.v0i0.4062. https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf https://doi.org/10.1515/libr.1999.49.4.181 https://doi.org/10.2959/logo.2006.17.3.139 https://doi.org/10.1016/j.acalib.2015.08.020 https://doi.org/10.7557/5.5378 http://journals.uran.ua/bv/article/view/2011.1.02 https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isAllowed=y https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isAllowed=y https://doi.org/10.1108/GKMC-09-2019-0118 https://doi.org/10.20913/2618-7515-2020-1-57-64 https://doi.org/10.1108/09565690810858532 https://doi.org/10.5130/cjlg.v0i0.4062 INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 THE ROLE OF THE LIBRARY IN THE DIGITAL ECONOMY | ZHARINOV 16 35 Sharon Farnel and Ali Shiri, “Community-Driven Knowledge Organization for Cultural Heritage Digital Libraries: The Case of the Inuvialuit Settlement Region,” 
Advances in Classification Research Online no. 1 (2019): 9–12, https://doi.org/10.7152/acro.v29i1.15453. 36 Elizabeth Tait, Konstantina Martzoukou, and Peter Reid, “Libraries for the Future: The Role of IT Utilities in the Transformation of Academic Libraries,” Palgrave Communications no. 2 (2016): 1–9, https://doi.org/10.1057/palcomms.2016.70. 37 Tatiana Alexandrovna Kolesnykova, “Suchasna Biblioteka VNZ: Modeli Rozvytku v Umovakh Informatyzatsii,” Bibliotekoznavstvo. Dokumentoznavstvo. Informolohiia no. 4 (2009): 57–62, http://nbuv.gov.ua/UJRN/bdi_2009_4_10. 38 Ekaterina Kudrina and Karina Ivina, “Digital Environment as a New Challenge for the University Library,”Bulletin of Kemerovo State University. Series: humanities and social sciences 2, no. 10 (2019): 126–34, https://doi.org/10.21603/2542-1840-2019-3-2-126-134. 39 Anna Kochetkova, “Tsyfrovi Biblioteky yak Oznaka XXI Stolittia,” Svitohliad no. 6 (2009): 68–73, https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68- kochetkova.pdf. 40 Victoria Alexandrovna Kopanieva, “Naukova Biblioteka: Vid E-katalohu do E-nauky,” Bibliotekoznavstvo. Dokumentoznavstvo. Informolohiia no. 6 (2016): 4–10, http://nbuv.gov.ua/UJRN/bdi_2016_3_3. 41 Christy R. Stevens, “Reference Reviewed and Re-Envisioned: Revamping Librarian and Desk- Centric Services with LibStARs and LibAnswers,” The Journal of Academic Librarianship 39, no. 2 (March 2013): 202–14, https://doi.org/10.1016/j.acalib.2012.11.006. 42 Samuel Kai-Wah Chu and Helen S Du, “Social Networking Tools for Academic Libraries,” Journal of Librarianship and Information Science 45, no. 1 (February 17, 2012): 64–75, https://doi.org/10.1177/0961000611434361. 43 ACRL Research Planning and Review Committee, “2018 Top Trends in Academic Libraries A Review of the Trends and Issues Affecting Academic Libraries in Higher Education,” C&RL News 79, no.6 (2018): 286–300. https://doi.org/10.5860/crln.79.6.286. 
44 Currall and Moss, “We are Archivists, but are We OK?”, 69–91, https://doi.org/10.1108/09565690810858532. 45 Valerii Fishchuk et al., “Ukraina 2030E— Kraina z Rozvynutoiu Tsyfrovoiu Ekonomikoiu,” Ukrainskyi instytut maibutnoho, 2018, https://strategy.uifuture.org/kraina-z-rozvinutoyu- cifrovoyu-ekonomikoyu.html. 46 EuroCRIS, “Search the Directory of Research Information System (DRIS),” https://dspacecris.eurocris.org/cris/explore/dris. 47 MON, “MON Zapustylo Novyi Poshukovyi Servis dlia Naukovtsiv—Vin Bezkoshtovnyi ta Bazuietsia na Vidkrytykh Danykh z Usoho Svituю,” https://mon.gov.ua/ua/news/mon- https://doi.org/10.7152/acro.v29i1.15453 https://doi.org/10.1057/palcomms.2016.70 http://nbuv.gov.ua/UJRN/bdi_2009_4_10 https://doi.org/10.21603/2542-1840-2019-3-2-126-134 https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf http://nbuv.gov.ua/UJRN/bdi_2016_3_3 https://doi.org/10.1016/j.acalib.2012.11.006 https://doi.org/10.1177/0961000611434361 https://doi.org/10.5860/crln.79.6.286 https://doi.org/10.1108/09565690810858532 https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html https://dspacecris.eurocris.org/cris/explore/dris https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 THE ROLE OF THE LIBRARY IN THE DIGITAL ECONOMY | ZHARINOV 17 zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na- vidkritih-danih-z-usogo-svitu. 48 Nancy Herther et al., “Text and Data Mining Contracts: The Issues and Needs,” Proceedings of the Charleston Library Conference, 2016, https://doi.org/10.5703/1288284316233. 
49 Karen Hogenboom and Michele Hayslett, “Pioneers in the Wild West: Managing Data Collections.” Portal: Libraries and the Academy 17, no. 2 (2017): 295–319, https://doi.org/10.1353/pla.2017.0018. 50 Philip Young et al., “Library Support for Text and Data Mining,” A Report for the University Libraries at Virginia Tech, 2017, http://bit.ly/2FccOwu. 51 Carol Tenopir et al., “Data Sharing by Scientists: Practices and Perceptions,” PloS One 6 (2011), no. 6, https://doi.org/10.1371/journal.pone.0021101. 52 Filip Kruse and Jesper Boserup Thestrup, “Research Libraries’ New Role in Research Data Management, Current Trends and Visions in Denmark,” Liber Quarterly 23, no.4 (2014): 310– 35, https://doi.org/10.18352/lq.9173. 53 American Economic Review, “Data and Code.” AER Guidelines for Accepted Articles. Instructions for Preparation of Accepted Manuscripts, 2020, https://www.aeaweb.org/journals/aer/submissions/accepted-articles/styleguide#IIC. 54 “Data Access and Retention.” The Publication Ethics and Malpractice Statement, (New York: Marsland Press, 2019), http://www.sciencepub.net/marslandfile/ethics.pdf. 55 Patricia Cleary et al., “Text Mining 101: What You Should Know,” The Serials Librarian 72, no.1-4 (May 2017): 156–59, https://doi.org/10.1080/0361526X.2017.1320876. 56 Rebecca Bryant et al., Practices and Patterns in Research Information Management Findings from a Global Survey (Dublin: OCLC Research, 2018), https://doi.org/10.25333/BGFG-D241. 
https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu https://doi.org/10.5703/1288284316233 https://doi.org/10.1353/pla.2017.0018 http://bit.ly/2FccOwu https://doi.org/10.1371/journal.pone.0021101 https://doi.org/10.18352/lq.9173 https://www.aeaweb.org/journals/aer/submissions/accepted-articles/styleguide#IIC http://www.sciencepub.net/marslandfile/ethics.pdf https://doi.org/10.1080/0361526X.2017.1320876 https://doi.org/10.25333/BGFG-D241 ABSTRACT INTRODUCTION THE CONCEPT OF THE “DIGITAL ECONOMY” FEATURES OF DIGITAL TRANSFORMATION DIRECTIONS OF LIBRARY DEVELOPMENT IN THE DIGITAL ECONOMY PROBLEMS IN RESEARCH DATA MANAGEMENT LIBRARY AND RESEARCH INFORMATION MANAGEMENT CONCLUSIONS ENDNOTES
ARTICLE

Automated Fake News Detection in the Age of Digital Libraries

Uğur Mertoğlu and Burkay Genç

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12483

Uğur Mertoğlu (umertoglu@hacettepe.edu.tr) is a PhD Candidate, Hacettepe University. Burkay Genç (bgenc@cs.hacettepe.edu.tr) is Assistant Professor, Hacettepe University. © 2020.

ABSTRACT

The transformation of printed media into the digital environment and the extensive use of social media have changed the concept of media literacy and people's news consumption habits. While online news is faster, easier, comparatively cheaper, and more convenient in terms of access to information, it also speeds up the dissemination of fake news. Because large amounts of data are freely produced and consumed, fact-checking systems powered by human effort are not enough to question the credibility of the information provided or to prevent its virus-like spread. Libraries, long known as sources of trusted information, are facing challenges caused by misinformation, as mentioned in studies about fake news and libraries.1 Considering that libraries all over the world are undergoing digitization and providing digital media to their users, it is very likely that unverified digital content will be served by the world's libraries. The solution is to develop automated mechanisms that can check the credibility of digital content served in libraries without manual validation. For this purpose, we developed an automated fake news detection system based on Turkish digital news content. Our approach can be adapted to any other language for which labelled training material exists. This model can be integrated into libraries' digital systems to label served news content as potentially fake whenever necessary, preventing uncontrolled dissemination of falsehoods via libraries.
INTRODUCTION

Collins Dictionary, which chose the term "fake news" as its Word of the Year for 2017, describes news as the factual and objective presentation of a current event, information, or situation that is published in newspapers and broadcast on radio, television, or online.2 We are in an era where everything goes online, and news is no exception. Many people today prefer to read their daily news online, because it is a cost-effective and convenient way to stay up to date. Although this convenience has tangible benefits for society, it can also have harmful side effects. Having access to news from multiple sources, anytime, anywhere has become an irresistible part of our daily routines. However, some of these sources may provide unverified content that can easily be delivered right to your mobile device. Most importantly, fake news delivered by these sources may mislead society and cause social disturbances, such as triggering violence against ethnic minorities and refugees, causing unnecessary fear about health issues, or even resulting in crises, devastating riots, and strikes. Unlike news, fake news has no settled definition and is often defined according to the data used or the limited perspective of a given study. For example, DiFranzo and Gloria-Garcia defined fake news as "false news stories that are packaged and published as if they were genuine."3 On the other hand, Guess et al.
see the term as "a new form of political misinformation" within the domain of politics, whereas Mustafaraj is more direct and defines it as "lies presented as news."4 A comprehensive list of 12 definitions can be found in Egelhofer and Lecheler.5 In simplified terms, news created to deceive or mislead readers can be called fake news. However, the concept of fake news is quite broad and needs to be specified meticulously. Fake news is created for many purposes and emerges in many different types. Most of these types, which have an interwoven structure, are shown in figure 1. Although it is not easy to cluster these types into separate groups, they can be categorized according to information quality or according to intention, i.e., whether or not they are created to deceive deliberately, as Rashkin et al. did.6 We propose the following classification, in which the two dimensions represent the potential impact and the speed of propagation.

Figure 1. The volatile distribution of fake news types (clustered in four regions: sr, sR, Sr, SR) with respect to two dimensions: speed of propagation and potential impact.

The four regions visualized are clustered according to their dangerousness. First of all, it should be noted that ordering types of fake news with stable precision is quite a challenging task. The variations within the field depend highly on dynamic factors such as timespan, actors, and the echo-chamber effect. Hence, this figure should be considered a clustering effort. There are possible intersecting areas of types within the regions. We will now give examples for two regions, "sr" and "SR." For example, the SR grouping is characterized by high risk levels and fast dissemination.
This includes varieties of fake news such as propaganda, manipulation, misinformation, hate news, provocative news, etc. We usually encounter this in the domain of politics. This kind of news may cause critical and nonrecoverable results in politics, the economy, etc., in a short period of time. The rise of the term fake news itself can also be attributed to this kind of news. On the other hand, the relatively less severe group (sr) of fake news, comprising satire, hoaxes, click-bait, etc., has low risk levels and a slow speed of dissemination. A frequently used type in this group, click-bait, is a sensational headline or link that urges the reader to click on a post, link, article, image, or video. This kind of news has a repetitive style, and it can be said that readers become aware of the falsehood after encountering it a few times. So, the risk level is lower, and dissemination is slower. Vosoughi et al. found that "Falsehood diffuses significantly farther, faster, deeper, and more broadly than the truth."7 Indeed, a single piece of fake news may affect many more people than thousands of true news items do because of the dramatic circulation of fake news. In their recent survey about fake news, Zhou and Zafarani highlighted that fake news is a major concern for many different research disciplines, especially information technologies.8 Having been trusted sources of information for a long time, libraries will play an important role in the fight against the fake news problem. Kattimani et al.
claim that the modern librarian must be equipped with the necessary digital skills and tools to handle both printed collections and newly emerging digital resources.9 Similarly, we foresee that digital libraries, which can be defined as collections of digital content licensed and maintained by libraries, can be part of the solution as an authority service with a collective effort. Connaway et al. point to the key role of information professionals such as librarians, archivists, journalists, and information architects in helping society use news-related products and services in a convenient way.10 As libraries all over the world transition into digital content delivery services, they should implement mechanisms, under the guidance of information professionals, to prevent fake and misleading content from being disseminated through them. To lay out proper future directions for a solution strategy, the interaction between the library and information science (LIS) community and fake news must be clearly addressed. Sullivan states that the LIS community was affected deeply in the aftermath of the 2016 US presidential elections.11 Moreover, he quotes many other scholars emphasizing libraries' and librarians' role in the fight against fake news: for example, Finley et al. say that libraries are the direct antithesis of fake news; the American Library Association (ALA) called fake news an anathema to the ethics of librarianship in 2017; Rochlin emphasizes the role of librarians in this fight and the need to adopt fake news as a central concern in librarianship; and many other researchers place librarians on the front lines of the fight against fake news.12 Today, the struggle to detect fake news and prevent its spread is so prominent that competitions are being organized (e.g., http://www.fakenewschallenge.org/) and conferences are being held (e.g., Bobcatsss 2020).
The struggle against fake news can be classified under three main venues:

• Reader awareness
• Fact-checking organizations and websites
• Automated detection systems

The first item requires awareness of individuals against fake news and a collective conscience within society against spreading it. To this end, visual and textual checklists, frameworks, and guidance lists are being published by official organizations, such as the infographic by IFLA (International Federation of Library Associations), which contains eight steps to spot fake news.13 The RADAR framework and the Currency, Relevance, Authority, Accuracy, and Purpose (CRAAP) test are some of the efforts trying to increase reader awareness of fake news.14 Unfortunately, due to the nature of fake news and the clever way it is created to trigger people's hunger to spread sensational information, it is very difficult to achieve full control via this strategy. Some studies have explicitly shown that humans are prone to confusion when it comes to spotting lies or deciding whether a news item is fake.15 Furthermore, people often overlook facts that conflict with their current beliefs, especially in politics and controversial social issues.16 The second strategy relies on third-party, manually driven systems for checking and labelling content as fake or valid. Recently, we have seen many examples of offline and online organizations working according to this strategy, such as a growing body of fact-checking organizations, start-ups (Storyzy, Factmata, etc.), and other projects with similar purposes.17 Unfortunately, these manually powered systems cannot cope with the huge amounts of digital content being steadily produced. Therefore, they focus only on a subset of digital content that they classify as having higher priority.
Even for this subset of content, their reaction speed is much slower than the spread of fake information. Therefore, automated and verified systems emerge as an inevitable last option. The third strategy offers automated fact-checking systems which, once trained, can deliver content labelling at unprecedented speeds. Today, many researchers are investigating automated solutions and building models with different methodologies.18 Notwithstanding the latest studies, there is still a lot to do in the realm of automated fake news detection. Automated fact-checking systems are detailed in the rest of the paper. Thanks to the internet, the collections of digital content served by digital libraries can be accessed by a great number of users without distance or time limits. Therefore, we propose a solution that positions digital libraries as automated fact-checking services, which label digital news content as fake or valid as soon as, or before, it is served through library systems. The main reason we associate this approach with digital libraries is their access to a wide variety of digital content, which can be used to train the proposed mathematical models, as well as their role in society as publishers of trusted information. To this end, we develop a mathematical model that is trained using existing news content served by digital libraries and is capable of labelling news content as fake or valid with unprecedented accuracy. The proposed solution uses machine learning techniques with an optimized set of extracted features and annotated labels of existing digital news content. Our study mainly contributes (a) a new set of features highly applicable to agglutinative languages, (b) the first hybrid model combining a lexicon/dictionary-based approach with machine learning methods to detect fake news, and (c) a benchmark dataset prepared in Turkish for fake news detection.
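The supervised setup described above can be illustrated with a minimal bag-of-words classifier. The following sketch is not the authors' actual model or feature set: it uses a toy Naive Bayes over raw word counts and an invented four-item corpus, purely to show how annotated labels and extracted features combine in training and prediction.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real NLP preprocessing."""
    return text.lower().split()

def train(samples):
    """Build a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = {"fake": Counter(), "valid": Counter()}
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label maximizing the log posterior, with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled corpus, invented for illustration only.
corpus = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you will not believe this shocking secret", "fake"),
    ("parliament passed the budget bill today", "valid"),
    ("the ministry announced new education funding", "valid"),
]
model = train(corpus)
print(predict(model, "shocking secret cure"))
print(predict(model, "ministry announced budget"))
```

A production system would replace `tokenize` with language-aware preprocessing (critical for agglutinative Turkish) and the raw word counts with the optimized feature set the article describes.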
LITERATURE REVIEW

Contemporary studies have indicated that social, economic, and political events in recent years, especially after the 2016 US presidential elections, are increasingly associated with the concept of fake news.19 Since then, fake news has begun to be used as a tool in many domains. In response, researchers motivated to find automated solutions have started to make use of machine learning, deep learning, hybrid models, and other methodologies. Although computational deception detection studies applying NLP (natural language processing) operations are not new, textual deception in the context of text-based news is a new topic for the field of journalism.20 Accordingly, we believe that there is a hidden body language of news text, with linguistic clues indicating whether the news is fake or not. Thus, lexical, syntactic, semantic, and rhetorical analysis, when used with machine learning and deep learning techniques, offers encouraging directions. Textual deception spans a wide spectrum, and studies have utilized many different techniques. Some prominent studies have treated the problem as a binary classification problem utilizing linguistic clues.21 Although it is still too early to say that the linguistic characteristics of fake news are fully understood, research into fake news detection in English-language texts is relatively advanced compared to that in other languages. In contrast, agglutinative languages such as Turkish have received little research attention when it comes to fake news detection. Agglutinative languages enable the construction of words by adding various morphemes, which means that words that are not in practical use may exist theoretically.
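This morpheme chaining can be sketched with a naive suffix stripper that peels suffixes off the end of a word until none match. The suffix inventory and minimum root length below are hand-picked assumptions sufficient only for this illustration; a real Turkish morphological analyzer would need far richer rules (vowel harmony, consonant alternation, suffix ordering).

```python
# Tiny, hand-picked suffix inventory chosen only to segment the example word;
# not a general Turkish suffix list.
SUFFIXES = ["dir", "den", "miz", "leri", "ecek", "ebil", "tir", "leş", "siz"]

def strip_suffixes(word, min_root=4):
    """Repeatedly remove a matching suffix until none applies, keeping the
    root at least min_root characters long. Returns (root, suffixes in
    surface order)."""
    morphemes = []
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= min_root:
                morphemes.append(s)
                word = word[: -len(s)]
                changed = True
                break
    return word, list(reversed(morphemes))

root, suffixes = strip_suffixes("gereksizleştirebileceklerimizdendir")
print(root)      # candidate root
print(suffixes)  # stripped morphemes, in surface order
```

Splitting such words into morphemes matters for feature extraction, since surface-form vocabularies in agglutinative languages grow explosively compared to English.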
For example, "gerek-siz-leş-tir-ebil-ecek-leri-miz-den-dir" is a theoretically possible word that means "it is one of the things that we will be able to make redundant," but it is not a practical one. Shu et al. classified the models for the detection of fake news in their study.22 According to this study, automated approaches can focus on four types of attributes to detect fake news: knowledge-based, style-based, stance-based, or propagation-based. Among these, it can be said that the most useful approaches are the ones that focus on the textual news content. The textual content can be studied by an automated process to extract features that can be very helpful in classifying content as fake or valid. Many scholars have tried to build models for the automatic detection and prediction of fake news using machine learning algorithms, deep learning algorithms, and other techniques. These scholars approach the detection of fake news from many different perspectives and domains. For example, in one of the studies, scientific news and conspiracy news were used.23 In Shu et al.'s study based on the credibility of news, headlines were used to determine whether an article was clickbait or not. In another study, Reis et al. worked on Buzzfeed articles linked to the 2016 US election using machine learning techniques with a supervised learning approach.24 Studies that try to detect satire and sarcasm can be attributed to subcategories of fake news detection.25 Our observation, in line with the general view, is that satire is not always recognizable and can be mistaken for real news.26 For this reason, we included satirical news in our dataset. It should be noted that although satire or sarcasm can be classified by automated detection systems, experts should still evaluate the results of the classification. While some scholars used specific models focusing on unique characteristics, others such as Ruchansky et al.
proposed hybrid deep models for fake news detection, making use of multiple kinds of features, such as temporal engagement between users and news articles over time, and generated a labelling methodology based on those features.27 In related studies, researchers have applied many features, such as automatically extracted features, hand-crafted features, social features, network information, visual features, and others, including psycholinguistic features.28 In this work, we focused on news content features; however, social context features can also be adopted to extract features, using different tiers such as user activity patterns, analysis of user interaction, profile metadata, social network/graph analysis, etc. Some of these features are present in our data, but lacking quantitative ground truth for them, we avoided their use.

METHODOLOGY

In this section, we present our motivation for this work, which we visualize in a framework named Global Library and Information Science (GLIS_1.0). Subsequently, we discuss the construction of the automated detection system, the key element of the GLIS_1.0 framework. We explain the framework, model, dataset, features, and techniques used.

Framework

The main structure of the proposed framework is shown in figure 2. This framework consists of highly cohesive but flexible layers.

Figure 2. The GLIS_1.0 framework main structure.

In the presentation layer one can find the different sources of news that are publicly available. These sources can be accessed directly using their websites or can be searched for via search engines. The news is received by fact-checking organizations, which classify it manually; digital libraries, which archive and serve it; and automated detection systems (ADS), which classify it automatically.
Digital libraries work together with fact-checking organizations and ADSs to present clean and valid news to the public. Moreover, search engines use digital library systems to label their results as fake or valid. Fact-checking organizations should also benefit from the output of ADSs: instead of manually checking heaps of news content, they could focus on news labeled as potentially fake by an ADS. Through GLIS, ADSs make the work of fact-checking organizations and digital libraries much easier, all the while increasing the quality of news served to the public. Because figure 2 is a high-level overview of the structure, there may be many other components, mechanisms, or layers, but the key elements are the automated detection systems and the digital libraries.

A critical question about this framework is why such an authority mechanism is needed. The answer is quite simple: technological progress is not the only solution. On the contrary, tech giants have already been subject to regulatory scrutiny for how they handle personal information.29 Their policies related to political ads have also been questioned. Furthermore, they are often blamed for failing to fight fake news. Indeed, there is an urgent need for global action more than ever. Digital libraries are much more than a technological advancement. Hence, they should be considered institutions or services that can act as a great authority for providing news to society as printed media disappears day by day. The threats caused by fake news are real and dangerous, but only recently have researchers from different disciplines been trying to find possible solutions, whether educational, technological, regulatory, or political.
Digital librarianship can be the intersection of all these solutions for promoting information/media literacy. Hence, digital librarianship will make use of many automated detection systems (ADS) to serve qualified news. In the following section, we discuss ADSs in detail.

Model

An overview of our automated detection system model, which is critical for the framework, is shown in figure 3. Our fake news detection model consists of two phases: the first is Language Model/Lexicon Generation, and the second is Machine Learning Integration. In this work, we used machine learning algorithms via supervised learning techniques, which learn from labeled news data (training) and help us predict outcomes for unseen news data (test).

Dataset

We collected our data from three sources:

• The primary source is the GDELT (Global Database of Events, Language and Tone) Project (https://www.gdeltproject.org/), a massive global news media archive offering free access to news text metadata for researchers worldwide. It can almost be considered a digital library of news in its own right. However, GDELT does not provide the actual news text; it only serves processed metadata along with the URL of the news item. GDELT normally does not check the validity of any news item, so to maximize the validity of the news we automatically obtained through GDELT, we used only news from approved news agencies and completely ignored news from local and lesser-known sources. Moreover, we post-processed the obtained texts by cross-validating with teyit.org data to clean any potential fake news obtained through GDELT links.

Figure 3. Integrated fake news detection model with main phases combining the language-model-based approach with the machine learning approach.
• The second source is teyit.org, a fact-checking organization based in Turkey, compliant with the principles of the IFCN (International Fact-Checking Network) and aiming to prevent the spread of false information through online channels. Manually analyzing each news item, they tag it as fake, true, or uncertain. We used their results to automatically download and label each news text.

• Lastly, our team collected manually curated and verified fake and valid news from various online sources, naming this set MVN (Manually Verified News). It includes fake and valid news that we manually accumulated over time during our studies and that did not overlap with the news obtained from the GDELT and teyit.org sources.

We named our dataset TRFN. In Phase 2, the data is very similar to that used in Phase 1; however, to test the effectiveness of the model, we excluded news older than 2017 and added new items from 2019. The news in our dataset spans 2017–2019 and is uniformly distributed. Table 1 outlines the dataset statistics: where the news text comes from, its class (fake or valid), the number of distinct texts, and the corresponding data collection method. The table shows that most of our valid news comes from the GDELT source, whereas teyit.org, a fact-checking organization, contributes only fake news.

Table 1. TRFN Dataset Summary after cleaning and duplicate removal.

Dataset | Class | Size of Processed Data | Collection Method
GDELT | NON-FAKE | 82,708 | Automated
Teyit.org | FAKE | 1,026 | Automated
MVN | NON-FAKE | 1,049 | Manual
MVN | FAKE | 400 | Manual

All news items were processed through Zemberek (http://code.google.com/p/zemberek), the Turkish NLP engine, to extract different morphological properties of words within texts.
After this processing phase, all obtained features were converted into tabular format and made available for future studies. This dataset is now available for scholarly studies upon request.

In a study of this nature, the verifiability of the data used is important. As already mentioned, most of the data comes from verified sources: mainstream news agencies accessed through GDELT, and the teyit.org archives, which are verified by teyit.org staff. All data used in training the mathematical models explained in the rest of the paper are either directly or indirectly verified.

Another important issue was the generalizability of the dataset, which determines whether the results of the study apply only to specific domains or to all available domains. Although focusing on a specific news domain would clearly improve our accuracies, we preferred to work in the general domain and included news from all specific domains. The distribution of domains in our dataset is visualized in figure 4. This distribution closely matches the distribution one would experience reading daily news in Turkey; hence, we have no domain-specific bias in our training dataset.

Figure 4. The distribution of domains in the dataset. (SciTechEnvWetNatLife = Science, Technology, Environment, Weather, Nature, Life. EduCultureArtTourism = Education, Culture, Art, Tourism.)

Moreover, during the exploratory data analysis we obtained highly correlated evidence showing syntactic similarities with other NLP studies in Turkish. For example, the results of a study by the Zemberek developers (http://zembereknlp.blogspot.com/2006/11/kelime-istatistikleri.html) to find the most common words in Turkish, based on over five million words, are compatible with the most common words in our corpus.
This evidence attests to the representativeness of our dataset.

The last issue worth discussing is the imbalanced nature of the dataset. An imbalanced dataset occurs in a binary classification study when the frequency of one class dominates the frequency of the other. In our dataset, the number of fake news items is far smaller than the number of valid ones. This generally causes difficulties in applying conventional machine learning methods, but it is a frequently observed phenomenon, due to the disparity of variable classes in these kinds of real-world problems. To avoid potential problems due to the imbalanced nature of the dataset, we used SMOTE (Synthetic Minority Over-sampling Technique), an over-sampling method.30 It creates synthetic samples of the minority class that are relatively close in the feature space to the existing observations of the minority class.

Features

In this study, we discarded some features because of their relatively low impact on overall performance during the exploratory data analysis and subsequently in the training phase. The most effective features we decided on are shown in table 2.

Table 2.
Main Features

Feature | Group | Definition
nRootScore | Language Model Features | The news score calculated according to the Root Model
nRawScore | Language Model Features | The news score calculated according to the Raw Model
SpellErrorScore | Extracted Features | Spell errors per sentence
ComplexityScore | Extracted Features | The score of the complexity/readability of the news
Source | Labels | The URL or identifier of the news
MainCategory | Labels | The category of the news
NewsSite | Labels | The unique address of the news

The language model features nRootScore and nRawScore are features that we borrowed from our earlier study on fake news detection.31 In that study, we focused on constructing a fake news dictionary/lexicon based on different morphological segments of the words used in news texts. These two scores were found to be the most successful in determining the fakeness/validity of a news text, one considering the raw form of the words, the other considering the root form.

The extracted features are ComplexityScore and SpellErrorScore. ComplexityScore represents the readability of the text. Studies on determining a good readability metric exist for the Turkish language.32 We used a modified version of the Gunning-Fog metric, which is based on word length and sentence length.33 Since Turkish is an agglutinative language, we used word length instead of the syllable count. We also made some modifications to normalize the scores. The average number of syllables per word in Turkish is 2.6, so we defined a word as a long word if it has more than 9 letters.34 For a given news text T, the Complexity Score (CS) can be computed by equation 1.

(1)   T_{CS} = \frac{\frac{Word_{count}}{Sentences_{count}} + \frac{LongWord_{count} \times 100}{Word_{count}}}{10}

The second extracted feature is SpellErrorScore. We foresee that there may be many more errors in fake news than in valid news. We calculated the spell error counts making use of the Turkish Spellchecker class of Zemberek.
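Both extracted features reduce to simple per-text ratios. A minimal sketch of how they could be computed follows; note that the tokenizer below is a naive stand-in for Zemberek's morphological analysis, and the sample text is hypothetical:

```python
import re

def complexity_score(word_count, sentence_count, long_word_count):
    # Modified Gunning-Fog readability: a word with more than 9 letters
    # counts as a "long word" for agglutinative Turkish.
    return (word_count / sentence_count
            + long_word_count * 100 / word_count) / 10

def spell_error_score(spell_error_count, sentence_count):
    # Spell errors per sentence (text length varies, so we normalize).
    return spell_error_count / sentence_count

def text_counts(text, long_letters=9):
    # Naive sentence/word splitter standing in for Zemberek's analysis.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    long_words = [w for w in words if len(w) > long_letters]
    return len(words), len(sentences), len(long_words)

text = "Bu bir haber metnidir. Gereksizlestirebileceklerimizdendir kelimesi uzundur."
w, s, lw = text_counts(text)          # 7 words, 2 sentences, 1 long word
cs = complexity_score(w, s, lw)       # (7/2 + 1*100/7) / 10
```

In a real pipeline the word, sentence, and spell-error counts would come from Zemberek rather than this regex-based splitter.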
Because the text length of news varies, we calculate the ratio per sentence. For a given news text T, the Spell Error Score (SE) is calculated as shown in equation 2.

(2)   T_{SE} = \frac{SpellError_{count}}{Sentences_{count}}

Finally, we included the metadata categories Source, MainCategory, and NewsSite as additional identifiers for the learning process. Then, we combined features extracted from text representation techniques with the features shown in table 2 and trained the model with different classifiers. For text representation, we followed two directions in the experiments. First, we converted text into structured features with the Bag of Words (BOW) approach, in which text data is represented as the multiset of its words. Second, we experimented with N-grams, which represent sequences of n words, in other words splitting text into chunks of N words. In the BOW model, documents in TRFN are represented as a collection of words, ignoring grammar and even word order but preserving multiplicity. In a classic BOW approach, each document can be represented as a fixed-length vector with length equal to the vocabulary size; each dimension of this vector corresponds to the occurrence of a word in a news item. We customized the generic approach by reducing variable-length documents to fixed-length vectors so that documents of varying lengths can be used with many machine learning models.

Figure 5. An overview of the BOW (Bag of Words) approach.

Because we ignore word order, we reduced the counts to fixed-length histograms, as seen in figure 5.
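The fixed-length histogram idea can be sketched in a few lines. This is a simplified illustration with a whitespace tokenizer and toy English documents, not the exact preprocessing used in the study:

```python
from collections import Counter

def bow_vectors(docs):
    # Build a shared vocabulary, then turn each document into a
    # fixed-length vector of word counts: grammar and word order are
    # ignored, but multiplicity is preserved.
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for w, c in Counter(d.split()).items():
            v[index[w]] = c
        vectors.append(v)
    return vocab, vectors

docs = ["fake news spreads fake claims", "valid news informs readers"]
vocab, X = bow_vectors(docs)
# Every row of X has the same length (the vocabulary size),
# whatever the original document length was.
```

Each dimension of a row counts one vocabulary word, which is exactly the histogram reduction described above.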
Assuming N is the number of news documents and W is the number of possible words in the corpus, it should be noted that the N×W count matrix is generally large but sparse: we have many news documents, but most words do not occur in any given document. This rareness of terms is a drawback of the approach. Therefore, we modified the model to compensate for the rarity problem by weighting the terms with the TF-IDF measure, which evaluates how important a word is to a document in a collection. The other technique we used, the N-gram model, is the generic term for a string of words in computational linguistics, and it is extensively used in text mining and NLP tasks. The prefix that replaces the n-part indicates the number of consecutive words in the string: a unigram is one word, a bigram is two words, and an n-gram is n words.

EXPERIMENTAL RESULTS AND DISCUSSION

In this section, the experimental process and the results are presented. All experiments were performed using the Scikit-learn library. To evaluate the performance of the model and the proposed features, we employed the precision, recall, F1 score (the harmonic mean of precision and recall), and accuracy metrics. We ran many experiments using different combinations of features. Several classification models were trained: K-Nearest Neighbor, Decision Trees, Gaussian Naive Bayes, Random Forest, Support Vector Machine, ExtraTrees Classifier, and Logistic Regression. To be effective, a classifier should be able to correctly classify previously unseen data. To this end, we tuned the parameter values for all the classification models used. Then, the models were trained and evaluated on the TRFN dataset using 10-fold cross-validation. In table 3, we present the ultimate best scores of the proposed model.
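The evaluation metrics named above follow directly from binary confusion-matrix counts. A small sketch, using hypothetical counts rather than the study's actual results:

```python
def metrics(tp, fp, fn, tn):
    # Precision, recall, F1 (harmonic mean of precision and recall),
    # and accuracy from binary confusion-matrix counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts for the "fake" class on one held-out fold.
p, r, f1, acc = metrics(tp=90, fp=10, fn=5, tn=95)
```

Under 10-fold cross-validation these quantities would be computed on each held-out fold and averaged.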
The results are highly motivating and exemplify how useful automated detection systems can be as a key component of the integrated solution framework in figure 2. We compared the algorithms on three final feature sets, which gave the most consistent results among the feature-set combinations: Set1 stands for bigram+FOpt (Optimized Features), Set2 stands for BOWModified+FOpt, and Set3 stands for unigram+bigram+FOpt. The results show a relative consistency in performance across the models. In almost all models, the combination of unigram+bigram and the optimized feature set (FOpt) gives better results than the other combinations. The ExtraTrees Classifier model was chosen as the best due to its higher performance. This model, also known as the Extremely Randomized Trees Classifier, is an ensemble learning technique that aggregates the results of multiple decision trees collected in a “forest” to output its classification result. It is very similar to the Random Forest Classifier and differs only in the manner of construction of the decision trees, so the results of these two classifiers are close.

Table 3. Evaluation results of all combinations of features and classification models.
Model | Feature Set | Precision % (0, 1) | Recall % (0, 1) | Accuracy | F1 Score
Gaussian Naive Bayes | Set1 | 93.32, 93.96 | 93.92, 93.36 | 93.64 | 93.62
Gaussian Naive Bayes | Set2 | 93.37, 94.02 | 93.98, 93.42 | 93.70 | 93.68
Gaussian Naive Bayes | Set3 | 93.95, 94.21 | 94.19, 93.97 | 94.08 | 94.07
K-Nearest Neighbour | Set1 | 93.70, 93.50 | 93.52, 93.69 | 93.60 | 93.61
K-Nearest Neighbour | Set2 | 93.66, 94.05 | 94.03, 93.68 | 93.85 | 93.84
K-Nearest Neighbour | Set3 | 94.42, 94.21 | 94.22, 94.41 | 94.31 | 94.32
ExtraTrees Classifier | Set1 | 94.15, 94.92 | 94.88, 94.19 | 94.53 | 94.51
ExtraTrees Classifier | Set2 | 94.09, 94.94 | 94.90, 94.14 | 94.51 | 94.49
ExtraTrees Classifier | Set3 | 97.90, 95.72 | 95.81, 97.86 | 96.81 | 96.85
Support Vector Machine | Set1 | 89.61, 88.92 | 88.99, 89.54 | 89.26 | 89.30
Support Vector Machine | Set2 | 89.70, 88.96 | 89.04, 89.62 | 89.33 | 89.37
Support Vector Machine | Set3 | 90.85, 91.26 | 91.22, 90.89 | 91.05 | 91.03
Logistic Regression | Set1 | 91.56, 92.28 | 92.23, 91.62 | 91.92 | 91.89
Logistic Regression | Set2 | 91.50, 92.28 | 92.22, 91.56 | 91.89 | 91.86
Logistic Regression | Set3 | 92.25, 92.90 | 92.86, 92.30 | 92.57 | 92.55
Random Forest | Set1 | 93.71, 94.44 | 94.40, 93.75 | 94.07 | 94.05
Random Forest | Set2 | 93.87, 95.00 | 94.94, 93.94 | 94.44 | 94.41
Random Forest | Set3 | 94.77, 95.14 | 95.12, 94.79 | 94.96 | 94.95
Decision Trees | Set1 | 93.95, 94.59 | 94.56, 93.99 | 94.27 | 94.25
Decision Trees | Set2 | 94.05, 95.08 | 95.03, 94.11 | 94.57 | 94.54
Decision Trees | Set3 | 94.94, 95.24 | 95.23, 94.95 | 95.09 | 95.08

Every ADS in the GLIS_1.0 framework may use its own method to detect fake news, and an open-source ADS may improve through feedback. Hybrid models and other techniques, such as neural networks with deep learning methodology, can also be used depending on the data, the language of the news, and the news features related to both social context and news content.

CONCLUSION AND FUTURE WORK

In this study we presented a novel framework that offers a practical architecture of an integrated system for identifying fake news. We have tried to illustrate how digital libraries can be a service authority to promote media literacy and fight against fake news. Because librarians are trained to critically analyze information sources, their contributions to our proposed model are critical.
Accordingly, we see this work as an encouraging effort toward future collaborative studies between the LIS and CS (computer science) communities. We think there is an immediate need for LIS professionals to participate in and contribute to automated solutions that can help detect inaccurate and unverified information. In the same manner, we believe the collaboration of LIS professionals, computer scientists, fact-checking organizations, and pioneering technology platforms is the key to providing qualified news within a real-time framework that promotes information literacy. Moreover, we put the reader, in the feed-reader position while consuming news, at the core of the framework. In terms of automated detection systems, we proposed a fake news detection model integrating a dictionary-based approach and machine learning techniques, offering optimized feature sets applicable to agglutinative languages. We comparatively analyzed the findings with several classification models and demonstrated that machine learning algorithms, when used together with dictionary-based findings, yield high scores for both precision and recall. Consequently, we believe that once operational in the field, the proposed workflow can be extended in the future to support other news elements such as photographs and videos. With the help of Social Network Analysis (SNA), it may be possible to stop or slow down the spread of fake news as it emerges. This work also highlighted several tasks as future research directions:

• The studies can be deepened to mathematically categorize the fake news types, and the dissemination characteristics of each type can be analyzed.

• The workflow has the potential to provide an automated verification platform for all news content existing in digital libraries to promote media literacy.

ENDNOTES

1 M. Connor Sullivan, “Why Librarians Can’t Fight Fake News,” Journal of Librarianship and Information Science 51, no.
4 (December 2019): 1146–56, https://doi.org/10.1177/0961000618764258.

2 “Definition of ‘News’,” available at https://www.collinsdictionary.com/dictionary/english/news.

3 Dominic DiFranzo and Kristine Gloria-Garcia, “Filter Bubbles and Fake News,” XRDS: Crossroads, The ACM Magazine for Students 23, no. 3 (April 2017): 32–35, https://doi.org/10.1145/3055153.

4 Andrew Guess, Brendan Nyhan, and Jason Reifler, “Selective Exposure to Misinformation: Evidence from the Consumption of Fake News during the 2016 US Presidential Campaign,” European Research Council 9, no. 3 (2018): 4; Eni Mustafaraj and P. Takis Metaxas, “The Fake News Spreading Plague: Was It Preventable?,” Proceedings of the 2017 ACM on Web Science Conference (June 2017): 235–39, https://doi.org/10.1145/3091478.3091523.

5 Jana Laura Egelhofer and Sophie Lecheler, “Fake News as a Two-Dimensional Phenomenon: A Framework and Research Agenda,” Annals of the International Communication Association 43, no. 2 (2019): 97–116, https://doi.org/10.1080/23808985.2019.1602782.

6 Hannah Rashkin et al., “Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking,” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017): 2931–37.

7 Soroush Vosoughi, Deb Roy, and Sinan Aral, “The Spread of True and False News Online,” Science 359, no. 6380 (2018): 1146–51, https://doi.org/10.1126/science.aap9559.

8 Xinyi Zhou and Reza Zafarani, “A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities,” ACM Computing Surveys (CSUR) 53, no. 5 (2020): 1–40, https://doi.org/10.1145/3395046.

9 S. F. Kattimani, Praveenkumar Kumbargoudar, and D. S.
Gobbur, “Training of the Library Professionals in Digital Era: Key Issues” (2006), https://ir.inflibnet.ac.in:8443/ir/handle/1944/1234.

10 Lynn Silipigni Connaway et al., “Digital Literacy in the Era of Fake News: Key Roles for Information Professionals,” Proceedings of the Association for Information Science and Technology 54, no. 1 (2017): 554–55, https://doi.org/10.1002/pra2.2017.14505401070.

11 Matthew C. Sullivan, “Libraries and Fake News: What’s the Problem? What’s the Plan?,” Communications in Information Literacy 13, no. 1 (2019): 91–113, https://doi.org/10.15760/comminfolit.2019.13.1.7.

12 Wayne Finley, Beth McGowan, and Joanna Kluever, “Fake News: An Opportunity for Real Librarianship,” ILA Reporter 35, no. 3 (2017): 8–12; American Library Association, “Resolution on Access to Accurate Information,” 2018; Nick Rochlin, “Fake News: Belief in Post-Truth,” Library Hi Tech 35, no. 3 (2017): 386–92, https://doi.org/10.1108/LHT-03-2017-0062; Linda Jacobson, “The Smell Test: In the Era of Fake News, Librarians Are Our Best Hope,” School Library Journal 63, no. 1 (2017): 24–29; Angeleen Neely-Sardon and Mia Tignor, “Focus on the Facts: A News and Information Literacy Instructional Program,” The Reference Librarian 59, no. 3 (2018): 108–21, https://doi.org/10.1080/02763877.2018.1468849; Claire Wardle and Hossein Derakhshan, “Information Disorder: Toward an Interdisciplinary Framework for Research and Policy Making,” Council of Europe report 27 (2017).

13 IFLA, “How to Spot Fake News,” 2017.
14 Jane Mandalios, “Radar: An Approach for Helping Students Evaluate Internet Sources,” Journal of Information Science 39, no. 4 (2013): 470–78, https://doi.org/10.1177/0165551513478889; Sarah Blakeslee, “The CRAAP Test,” LOEX Quarterly 3, no. 3 (2004): 4.

15 Victoria L. Rubin and Niall Conroy, “Discerning Truth from Deception: Human Judgments and Automation Efforts,” First Monday 17, no. 5 (2012), https://doi.org/10.5210/fm.v17i3.3933; Verónica Pérez-Rosas et al., “Automatic Detection of Fake News,” arXiv preprint arXiv:1708.07104 (2017).

16 Justin P. Friesen, Troy H. Campbell, and Aaron C. Kay, “The Psychological Advantage of Unfalsifiability: The Appeal of Untestable Religious and Political Ideologies,” Journal of Personality and Social Psychology 108, no. 3 (2015): 515–29, https://doi.org/10.1037/pspp0000018.

17 Tanja Pavleska et al., “Performance Analysis of Fact-Checking Organizations and Initiatives in Europe: A Critical Overview of Online Platforms Fighting Fake News,” Social Media and Convergence 29 (2018).
18 Yasmine Lahlou, Sanaa El Fkihi, and Rdouan Faizi, “Automatic Detection of Fake News on Online Platforms: A Survey” (paper, 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, 2019), https://doi.org/10.1109/ICSSD47982.2019.9002823; Christian Janze and Marten Risius, “Automatic Detection of Fake News on Social Media Platforms” (paper, Pacific Asia Conference on Information Systems (PACIS), 2017); Torstein Granskogen, “Automatic Detection of Fake News in Social Media Using Contextual Information” (master’s thesis, Norwegian University of Science and Technology (NTNU), 2018).

19 Jacob L. Nelson and Harsh Taneja, “The Small, Disloyal Fake News Audience: The Role of Audience Availability in Fake News Consumption,” New Media & Society 20, no. 10 (2018): 3720–37, https://doi.org/10.1177/1461444818758715; Philip N. Howard et al., “Social Media, News and Political Information during the US Election: Was Polarizing Content Concentrated in Swing States?,” arXiv preprint arXiv:1802.03573 (2018); Alexandre Bovet and Hernán A. Makse, “Influence of Fake News in Twitter during the 2016 US Presidential Election,” Nature Communications 10, no. 7 (2019): 1–14, https://doi.org/10.1038/s41467-018-07761-2.

20 Lina Zhou et al., “Automating Linguistics-Based Cues for Detecting Deception in Text-Based Asynchronous Computer-Mediated Communications,” Group Decision and Negotiation 13, no. 1 (2004): 81–106, https://doi.org/10.1023/B:GRUP.0000011944.62889.6f; Myle Ott et al., “Finding Deceptive Opinion Spam by Any Stretch of the Imagination,” arXiv preprint arXiv:1107.4557 (2011); Rada Mihalcea and Carlo Strapparava, “The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language,” Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (2009): 309–12; Julia B. Hirschberg et al., “Distinguishing Deceptive from Non-Deceptive Speech” (2005), https://doi.org/10.7916/D8697C06.
21 Victoria L. Rubin, Yimin Chen, and Nadia K. Conroy, “Deception Detection for News: Three Types of Fakes,” Proceedings of the Association for Information Science and Technology 52, no. 1 (2015): 1–4, https://doi.org/10.1002/pra2.2015.145052010083; David M. Markowitz and Jeffrey T. Hancock, “Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel,” PLoS One 9, no. 8 (2014): e105937, https://doi.org/10.1371/journal.pone.0105937; Jing Ma et al., “Detecting Rumors from Microblogs with Recurrent Neural Networks,” Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016) (2016): 3818–24, https://ink.library.smu.edu.sg/sis_research/4630.

22 Kai Shu et al., “Fake News Detection on Social Media: A Data Mining Perspective,” ACM SIGKDD Explorations Newsletter 19, no. 1 (2017): 22–36, https://doi.org/10.1145/3137597.3137600.

23 Eugenio Tacchini et al., “Some Like It Hoax: Automated Fake News Detection in Social Networks,” arXiv preprint arXiv:1704.07506 (2017).

24 Julio C. S. Reis et al., “Supervised Learning for Fake News Detection,” IEEE Intelligent Systems 34, no. 2 (2019): 76–81, https://doi.org/10.1109/MIS.2019.2899143.

25 Victoria L. Rubin et al., “Fake News or Truth?
Using Satirical Cues to Detect Potentially Misleading News,” Proceedings of the Second Workshop on Computational Approaches to Deception Detection (2016): 7–17; Francesco Barbieri, Francesco Ronzano, and Horacio Saggion, “Is This Tweet Satirical? A Computational Approach for Satire Detection in Spanish,” Procesamiento del Lenguaje Natural, no. 55 (2015): 135–42; Soujanya Poria et al., “A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks,” arXiv preprint arXiv:1610.08815 (2016).

26 Lei Guo and Chris Vargo, “‘Fake News’ and Emerging Online Media Ecosystem: An Integrated Intermedia Agenda-Setting Analysis of the 2016 US Presidential Election,” Communication Research 47, no. 2 (2020): 178–200, https://doi.org/10.1177/0093650218777177.

27 Natali Ruchansky, Sungyong Seo, and Yan Liu, “CSI: A Hybrid Deep Model for Fake News Detection,” Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (November 2017): 797–806, https://doi.org/10.1145/3132847.3132877.

28 Yaqing Wang et al., “EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection,” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018): 849–57, https://doi.org/10.1145/3219819.3219903; James W. Pennebaker, Martha E. Francis, and Roger J. Booth, “Linguistic Inquiry and Word Count: LIWC 2001” (Mahwah, NJ: Lawrence Erlbaum Associates, 2001).

29 “Facebook, Twitter May Face More Scrutiny in 2019 to Check Fake News, Hate Speech,” accessed May 17, 2020, https://www.huffingtonpost.in/entry/facebook-twitter-may-face-more-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e.

30 Nitesh V. Chawla et al., “SMOTE: Synthetic Minority Over-Sampling Technique,” Journal of Artificial Intelligence Research 16 (2002): 321–57, https://doi.org/10.1613/jair.953.
31 Uğur Mertoğlu and Burkay Genç, “Lexicon Generation for Detecting Fake News,” arXiv preprint arXiv:2010.11089 (2020).

32 Burak Bezirci and Asım Egemen Yilmaz, “Metinlerin Okunabilirliğinin Ölçülmesi Üzerine Bir Yazilim Kütüphanesi Ve Türkçe Için Yeni Bir Okunabilirlik Ölçütü,” Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 12, no. 3 (2010): 49–62, https://dergipark.org.tr/en/pub/deumffmd/issue/40831/492667.

33 Robert Gunning, The Technique of Clear Writing, rev. ed. (New York: McGraw-Hill, 1968).

34 Ender Ateşman, “Türkçede Okunabilirliğin Ölçülmesi,” Dil Dergisi 58 (1997): 71–74.
PUBLIC LIBRARIES LEADING THE WAY

A Collaborative Approach to Newspaper Preservation

Ana Krahmer and Laura Douglas

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12596

Ana Krahmer (ana.krahmer@unt.edu) oversees the Digital Newspaper Unit at UNT. Through this work, she manages the Texas Digital Newspaper Program collection on The Portal to Texas History, which is a gateway to historic research materials freely available worldwide. Laura Douglas (laura.douglas@cityofdenton.com) is the librarian in charge of Special Collections at the Denton Public Library, which houses the genealogy, Texana, and local Denton history collections as well as the Denton municipal archives. In her work, she regularly assists patrons with newspaper research questions specifically related to Denton newspapers. © 2020.

INTRODUCTION

When we first proposed this column in January 2020, we had no idea how much the world would change between then and the July deadline. While we have collaborated for many years on a variety of projects, the value of our collaboration has never proven itself more than in this COVID-19 reality: collaboration leverages the strengths and resources of partners to form something stronger than each could build alone. In this world of COVID-19, the collaboration between the Denton Public Library (DPL) and the University of North Texas Libraries (UNT) has allowed us to build open, online access to the first 16 years of the Denton Record-Chronicle (DRC). This newspaper is the city's daily newspaper of record, and the collaboration between DPL and UNT resulted in free, worldwide research access via The Portal to Texas History.
The project was funded by a $24,820.00 grant through the IMLS Library Services and Technology Act (LSTA), awarded from September 2019 to August 2020 by the Texas State Library and Archives Commission (TSLAC) as part of its TexTreasures program, to digitize 24,000 newspaper pages. This project has also resulted in a follow-up collaboration to build open access to further years of this daily newspaper title, through a 2021 TexTreasures award to digitize an additional 24,000 newspaper pages. The real question, though, is what recipe made this a successful collaboration.

BACKGROUND

The DRC has been the community newspaper in Denton for over 100 years. Due to the sheer amount of material, digitizing a daily newspaper with such an extensive publication run is a long-term project that requires a lot of planning, time, and funding. Since the DPL's inception in 1937, the library has endeavored to collect items related to Denton and Texas history. With community support, the library has developed a well-rounded collection of local history, Texana, and genealogical materials, all of which are housed in the Special Collections Research Area at the Emily Fowler Central Library. These materials support research, projects, and exhibits.

One major research resource is the archival collection of local newspapers, mainly the DRC, maintained on 752 rolls of microfilm containing issues from 1908 to 2018. Before this project, access to these newspapers was available only in the Special Collections Research Area, through microfilm readers or paid subscription services. In addition, although steps had been taken to preserve the film, many of the rolls show wear from years of use, while others have developed vinegar syndrome and soon will no longer be a usable resource. In 2018, UNT obtained publisher permission to make the DRC run freely accessible on The Portal to Texas History.
Laura had been exploring different avenues to digitize this microfilm and make it freely available to the public when Ana contacted her with information about TSLAC's annual grant programs. LSTA funding is provided annually to all fifty states through the Institute of Museum and Library Services, and each state library determines how the funding is expended. In Texas, LSTA funding supports a number of grant programs, including TexTreasures, a competitive grant program open to any Texas library. As described by TSLAC, the "TexTreasures grant is designed to help libraries make their special collections more accessible for the people of Texas and beyond. Activities considered for possible funding include digitization, microfilming, and cataloging." Libraries can apply to fund the same type of project up to three years in a row, and the DRC project applied for $24,820.00 in 2019 to digitize 24,000 newspaper pages, representing the earliest years of microfilm available at the Denton Public Library. To create a viable grant application, DPL partnered with the Texas Digital Newspaper Program (TDNP), available through UNT's Portal to Texas History, and decided to start by digitizing as many early years of microfilm as grant funding could cover. TDNP is the largest single-state, open access, digital newspaper preservation repository in the U.S., hosting just under 8 million newspaper pages at the time of this writing.
In late 2018, UNT received permission from the owner of the DRC to include the newspaper run in the TDNP collection, which represented a very exciting opportunity for city and county researchers, as well as for the DPL. As thanks to the publisher for granting permission, UNT built access to the 2014 to 2018 PDF ePrint editions, which TDNP preserves as a service to Texas Press Association-member publishers. After this, UNT contacted the DPL to discuss applying for grant funding. Once Laura learned that the DPL had received the 2019 award, she began the local planning steps necessary to collaborate with the university.

THE PROJECT BECOMES REAL

The Denton Record-Chronicle Digitization Project Grant contract and resolution for adoption went before the Denton City Council on October 8, 2019. The City of Denton issued a press release that day, and the DRC also published an article announcing the project. Over the next few days, the DRC article appeared across social media, including the City of Denton's social media accounts, as well as through library-associated email newsletters. After the first newspapers became available on the Portal, both DPL and UNT prepared blog posts about the project, which have also appeared on social media. These blog posts fulfilled publicity requirements specified by the grant, even while offering training to researchers in how to work with the online newspaper collection.

One major convenience of this collaboration is that both organizations are in the same city. Transfer of materials was arranged by email and accomplished by a trip across town. We completed the digitization process in batches, with the first 10 microfilm rolls going to UNT on October 10, 2019, and UNT uploading the first 854 issues in December 2019. The newspapers from the first microfilm set covered 1908 to 1916.
DPL transferred the last set of microfilm in April 2020, with dates ranging from 1917 through September 1924, shortly after which UNT completed and uploaded the grant-funded count of 24,000 newspaper pages. The grant proposal estimated that the scans would reach 1938, but the page count of this newspaper proved to be much higher than originally estimated; as a result, the funding covered only up to September 1924. DPL and UNT will continue their partnership by digitizing further years of the DRC through a variety of methods. As we were in the midst of preparing this column, TSLAC contacted Laura to inform her that DPL had received a second grant award, in the amount of $24,820.00, to digitize 24,000 additional newspaper pages, which will move the newspapers through 1954.

As of July 23, 2020, the Denton Record-Chronicle Collection on the Portal to Texas History hosts 6,168 items and has been used 16,397 times. This includes 1,743 items that are PDF ePrint editions of the paper from 2014 to 2018, which UNT uploaded for long-term preservation and access. UNT uploads ePrint editions without charge and digitally preserves them through an agreement with the Texas Press Association; these PDFs were not part of the funded grant, but they do enhance access to the collection and helped build community interest in seeing earlier years available on the Portal. Usage of the collection skyrocketed after the early editions became available: January 2020 saw the highest monthly usage, with 3,105 uses. Once this project is complete, it will include over 200,000 newspaper pages. Neither DPL nor UNT has the ability to tackle this project alone, but through collaboration, it is possible.
RECIPE FOR YOUR OWN COLLABORATION SUCCESS

These are planning recommendations as you prepare for your own collaboration, drawn from what we learned as we worked on this project together.

1. Communicate Early and Often: Communicating needs enables partners to identify each other's strengths. Each partner will bring their strengths to the project, which in this case included actual archival materials from DPL and technological expertise on the UNT side. In addition, be prepared to communicate with local groups who need to endorse or sign off on the project, possibly including the city council, the historical commission, or the city manager.

2. Partner to Write the Grant: Partnering in preparing the grant achieves two goals: first, it enables partners to develop a communication flow that will continue throughout the collaboration; second, it ensures that partners know what each can realistically accomplish within the grant timeline. In this case, Laura wrote most of the grant application herself, but she had very specific questions that Ana had to answer, and she needed key elements from UNT, including the project budget, technological infrastructure, and a commitment letter. Communicating early and partnering on the grant application process ensured that there were no surprises that were within the control of either partner.

3. Work Together to Explain Your Partnership: With a grant of this size, we always spoke in advance to ensure we weren't over-promising when newspapers would appear online. This also gave both Laura and Ana lead time for promoting the project: Laura would share the years of the physical microfilm before sending them over, and Ana would walk Laura through the years that would be uploaded in a given month. This allowed them to plan publicity, training, and outreach efforts based on the dates of newspapers going online.
In addition, Laura regularly communicated with Ana prior to submitting grant reports, which was critical in preventing miscommunication with the funding agency.

4. Pad Enough Time for the Unexpected: Of course, we had no way of knowing a pandemic would occur when we began this project, and what saved us was that we had started planning as soon as we learned about receiving the grant, rather than when the grant period started in September 2019. Planning two months in advance put us two months ahead of schedule, and we were able to start exchanging materials as soon as the grant period started. This gave us a few weeks of lead time, so we successfully completed the project by the end of April 2020, at which point the grant-funded page count had been scanned and UNT staff could remote in to complete the digitization processes. Extra time is only a benefit. If the COVID-19 pandemic had not occurred, we still might have had to address technological or film deterioration problems, and we could have resolved them earlier rather than later because we had given ourselves a few extra weeks of lead time.

5. Don't Be Afraid to Explain Changes to Your Granting Agency: Your project may change due to unforeseen circumstances; in our project, the uploaded total of pages reached 24,000 before we digitized the entire planned date range. UNT charges a per-page digitization fee, and these newspaper issues proved to contain more pages than expected. Laura contacted the representative at TSLAC to explain the situation and offer an alternative approach to cover the digitization of the remaining years. The important thing is to keep the granting agency informed of any changes, delays, or hiccups in the project.
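The budget dynamic behind recommendation 5 can be sketched with some simple arithmetic. This is only an illustration: it assumes a flat per-page fee implied by the award amount, and the pre-project page estimate used below is a hypothetical figure, not one reported in this column.

```python
# Grant figures reported in this column
GRANT_AWARD = 24_820.00   # TexTreasures award, in USD
FUNDED_PAGES = 24_000     # pages the grant was written to cover

# Implied flat per-page fee (assumption: the fee is uniform across pages)
per_page_fee = GRANT_AWARD / FUNDED_PAGES
print(f"Implied fee: ${per_page_fee:.2f} per page")

# If issues run longer than estimated, the same dollars still buy the same
# page count but cover a shorter date range -- which is what happened here.
estimated_pages_needed = 40_000   # hypothetical pre-project estimate
unfunded_pages = max(0, estimated_pages_needed - FUNDED_PAGES)
print(f"Pages left unfunded under that estimate: {unfunded_pages}")
```

The point of the sketch is that a fixed award caps pages, not years; when page counts per issue rise, the covered date range shrinks, and the granting agency should hear about it early.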
We are both proud of having completed this project three months before the end of the grant period, but we know that without solid communication, planning, and flexibility, the COVID-19 pandemic would have made the situation extremely difficult, if not impossible. Leveraging the Portal's technical infrastructure and TDNP's newspaper expertise alongside the volume of material and collection expertise provided by the DPL has given us a model for success that we plan to capitalize on in future projects. Best of all, in the world of COVID-19, our patrons can access these newspapers from the comfort of their own couches, without even taking off their pajamas!
EDITORIAL BOARD THOUGHTS

What More Can We Do to Address Broadband Inequity and Digital Poverty?

Lori Ayre

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12619

Lori Ayre (lori.ayre@galecia.com) is Principal, The Galecia Group, and a member of the ITAL Editorial Board. © 2020.

We are now almost seven months into our new lives with the novel coronavirus, and over 190,000 Americans have died of COVID-19. Library administrators have been struggling to balance their commitment to provide services to their communities with keeping their staff safe. Initially, libraries relied on their online offerings, so more e-books and other online resources were acquired. Staff learned that they could do quite a bit of their work from home. They could still respond to email and phone messages. They could evaluate and order new material. They could deliver online programs like summer reading and story time. They could interact with people on social media. They could put together key resources for patrons and post them on the website.1

A lot of what the library was doing while the buildings were closed was not obvious. Most people associate the library with the building, and since the building was closed, it seemed like nothing was happening at the library. Yet library workers were busy.

Once it became possible for library staff to enter the building (per local health ordinances), the first thing that libraries started to do was accept returns. That was a little fraught considering how little we knew about the virus and how long contaminants might live on returned library material. Eventually, with the long-awaited testing results from the REALM Project and Battelle Labs (https://www.webjunction.org/explore-topics/COVID-19-research-project.html), people started standardizing on a three-day quarantine of returns.
Then more testing of stacked material was done, resulting in some people choosing to quarantine returns for four days. As of early September, we have learned that even five days isn't enough to quarantine delivery totes and some other plastic material.

Curbside pick-up was born in these early days of being allowed back in the buildings. If someone had mapped who was offering curbside pick-up, it would look like popcorn popping across the country. The number of libraries offering the service slowly increased, and pretty soon nearly everyone was doing it.2 Many library directors will say that curbside pick-up is here to stay. People love the convenience too much to take the service away.

Rolling out curbside pick-up has had some challenges: how to safely make the handoff between library staff and library patrons; whether to accept returns; whether to charge fines; modifying circulation policies to fit the current needs; and selecting books for people who want them but lack the skills to navigate the library catalog's requesting system. Some libraries started putting together grab bags of materials selected by staff for specific patrons—kind of like homebound services on-the-fly. Curbside helped get material in circulation again.

Importantly, also during this period, libraries started finding creative ways to get Wi-Fi hotspots out into communities. They began lending hotspots if they weren't already doing so. Those libraries already circulating hotspots increased their inventory. They took their bookmobiles into neighborhoods and created temporary hotspot connections around town.
Many libraries made sure Wi-Fi from their building was available in their own parking lots.3 But one thing everyone has learned during this pandemic is that libraries alone cannot be the solution to the digital divide. This isn't news to librarians, who have been arguing that Internet access should be as readily available as electricity and water. Librarians understand that information cannot be free and accessible unless everyone has Internet access and knows how to use it. Public access computers, Wi-Fi hotspots, and media literacy are all staple services in our libraries today.4

However, these services are not enough to bridge a digital divide that only seems to be getting worse. The coronavirus that closed libraries and schools has made it painfully clear that something much bigger has to happen to address the problem. As Gina Millsap stated in a recent Facebook post:

I think it's become obvious that the COVID-19 crisis is shining a spotlight on the flaws we have in our broadband infrastructure and on our failure to make the investments that should have been made for equitable access to what should be a basic utility, like water or electricity.5

According to BroadbandNow, the number of people who lack broadband Internet access could be as high as 42 million.6 The FCC reports that at least "18 million people lacked access to broadband Internet at the end of 2018."7 Even if all the libraries were open and circulating hundreds of Wi-Fi hotspots, we'd still have a very serious access problem.

THINKING DIFFERENTLY ABOUT ADDRESSING THE DIGITAL DIVIDE

In a paper published March 28, 2019, by the Urban Libraries Council (ULC), the author suggested three specific actions that libraries can take to address race and social equity and the digital divide. They are:

1. Assess and respond to the needs of your community through meaningful conversation (including considering different partners for your work)
2. Optimize funding opportunities to support your efforts (e.g.
E-rate), and
3. Think outside the box to create effective solutions that are informed by those in need (e.g., lending Wi-Fi hot spots).8

While we know libraries have been heeding this advice when it comes to Wi-Fi hotspots, let's look at what can be done when we take up ULC's suggestion to consider different partners for your work.

Community Partners

An excellent example of what can be done with a coalition of community partners comes from Detroit, where a mesh wireless network was put in place to provide permanent broadband in a low-income neighborhood.9 The project is called the Detroit Community Technology Project. With the community-based mesh network, only one Internet account is needed to provide access for multiple homes. The network also enables its members to share resources (calendar, files, bulletin board), and that data lives on their own network, not in the cloud. One of the sponsors of the Detroit Community Technology Project is the Allied Media Project (https://www.alliedmedia.org/), which also sponsors CassCoWifi and the Equitable Internet Initiative to bring broadband and digital literacy training to several underserved areas.

Community Networks (https://muninetworks.org/), a project of the Institute for Local Self-Reliance (https://ilsr.org/), describes several innovative projects in which communities partner with electric utilities. Surry County, Virginia, expects to extend broadband access to 6,700 households through a first-ever partnership between an electric cooperative and a utility (Dominion Energy Virginia). A similar project is underway with the Northern Neck Cooperative and Dominion Energy.10 These initiatives were made possible by regulatory changes in Virginia (SB 966).
According to Community Networks, there are 900 communities providing broadband connectivity locally (https://muninetworks.org/communitymap). But nineteen states still have barriers in place that discourage, if not outright prevent, local communities from investing in broadband. Libraries in states where community networks are a viable option should be at the table, or perhaps setting the table, for discussions about how to bring broadband to the entire community, not just into the library or dispatched one-at-a-time via Wi-Fi hotspots. This is an opportunity to convene community conversations focusing on the issue of broadband. Library staff have been doing more and more of this type of outreach into the community and acting as facilitators. The ALA has even produced a Community Conversation Workbook (http://www.ala.org/tools/sites/ala.org.tools/files/content/LTC_ConvoGuide_final_062414.pdf) to support libraries just getting started.

State Partners

In California, the Governor recently issued Executive Order N-73-20 (https://www.gov.ca.gov/wp-content/uploads/2020/08/8.14.20-EO-N-73-20.pdf) directing state agencies to pursue a goal of 100 Mbps download speed; it outlines actions across state agencies and departments to accelerate mapping and data collection, funding, deployment, and adoption of high-speed internet.11 This will undoubtedly create fertile ground for libraries to partner with other agencies and community organizations to advance this initiative. Libraries are specifically called out to raise awareness of low-cost broadband options in their local communities.
Every state has some kind of broadband task force, commission, or advisory council (https://www.ncsl.org/research/telecommunications-and-information-technology/state-broadband-task-forces-commissions.aspx). This is another instance where libraries should be at the table. In my state, our State Librarian is on the California Broadband Council. But many of these commissions do not have a representative from the library world, which means they probably are not hearing from us. Whether it is through your local library, your state library, or your state library association, it is important for librarians to build relationships with people on these commissions—if not get a seat on the commission themselves.

National Partners

Unless your community is blanketed with affordable broadband connectivity, it will be important that we continue to advocate nationally for the needs we see. In addition to helping the patron standing right in front of us checking out their hotspot, we also need to address the needs of the people who aren't able to get to the library but are equally in need of access. Our job is to make sure that any new initiatives undertaken by a new administration provide for free and equitable access to the Internet for every household. Extending E-rate (the Federal Communications Commission's program for making Internet access more affordable for schools and libraries) isn't enough. Free (or at least affordable) broadband needs to be brought to every home. The Electronic Frontier Foundation (EFF) argues that fiber-to-the-home is the best option for consumers today because it will be easily upgradeable without touching the underlying cables and will support the next generation of applications (see https://www.eff.org/wp/case-fiber-home-today-why-fiber-superior-medium-21st-century-broadband). Libraries have worked with the EFF on issues related to privacy and government transparency. Maybe it's time to team up with them on broadband.
Global Partners

Low Earth Orbit (LEO) satellites could potentially bring broadband to everyone on Earth.12 Starlink (https://www.starlink.com/) is Elon Musk's initiative, and Project Kuiper (https://blog.aboutamazon.com/company-news/amazon-receives-fcc-approval-for-project-kuiper-satellite-constellation) is Amazon's Jeff Bezos' project. A private beta of the Starlink service is due (or perhaps it is already happening). If it works as Musk has envisioned, it could be a game-changer. Or it might just make the digital divide worse if it isn't affordable to everyone who needs it. How might we lobby Musk to roll out this service in a way that is equitable and fair?

SPEAK UP, SPEAK OUT, AND GET IN THE WAY

These are just a few avenues that we, as professionals committed to free access to information, might pursue. I worry that we have not made enough noise about the problems we see in our communities that are a result of broadband inequity and digital poverty. And although virtually every library is doing something to address the problem, our efforts are no match for its magnitude.
In a blog post on the Brookings Institution's website, authors Lara Fishbane and Adie Tomer argue for a new agenda focused on comprehensive digital equity that includes (among other things) "building networks of local champions, ensuring community advocates, government officials, and private network providers share intelligence, debate priorities, and deploy new programming."13 There are no better local champions and advocates for communities than the City or County Librarians and their staffs. Let's treat this problem with the seriousness it deserves and at a scale that will be meaningful. To quote John Lewis (as so many of us have since his death on July 17, 2020), it's time for us to "speak up, speak out, and get in the way."14 We have to make it painfully clear to policymakers that libraries cannot bridge the digital divide with public access computers and hotspots. We need to tell our communities' stories, convene conversations, and agitate for equitable broadband that is as readily available as water and electricity.

ENDNOTES

1 "Libraries Respond: COVID-19 Survey," American Library Association, accessed August 25, 2020, http://www.ilovelibraries.org/sites/default/files/MAY-2020-COVID-Survey-PDF-Summary-of-Results-web-2.pdf.

2 Erica Freudenberger, "Reopening Libraries: Public Libraries Keep Their Options Open," Library Journal, June 25, 2020, https://www.libraryjournal.com/?detailStory=reopening-libraries-public-libraries-keep-their-options-open.

3 Lauren Kirchner, "Millions of Americans Depend on Libraries for Internet. Now They're Closed," The Markup, June 25, 2020, https://themarkup.org/coronavirus/2020/06/25/millions-of-americans-depend-on-libraries-for-internet-now-theyre-closed.

4 Jim Lynch, "The Gates Library Foundation Remembered: How Digital Inclusion Came to Libraries," TechSoup, accessed August 24, 2020, https://blog.techsoup.org/posts/gates-library-foundation-remembered-how-digital-inclusion-came-to-libraries.
5 Gina Millsap, "This was in April. Q. We're starting a new school year and what has changed? A. Not much. It's past time to get serious about universal broadband in the U.S.," Facebook, August 16, 2020, 5:37 a.m., https://www.facebook.com/gina.millsap.7/posts/10218986781485855, accessed September 14, 2020.

6 "Libraries are Filling the Homework Gap as Students Head Back to School," Broadband USA, last modified September 4, 2018, https://broadbandusa.ntia.doc.gov/ntia-blog/libraries-are-filling-homework-gap-students-head-back-school.

7 James K. Willcox, "Libraries and Schools Are Bridging the Digital Divide During the Coronavirus Pandemic," Consumer Reports, last modified April 29, 2020, https://www.consumerreports.org/technology-telecommunications/libraries-and-schools-bridging-the-digital-divide-during-the-coronavirus-pandemic/.
8 Sarah Chase Webber, "The Library's Role in Bridging the Digital Divide," Urban Libraries Council, last modified March 28, 2019, https://www.urbanlibraries.org/blog/the-librarys-role-in-bridging-the-digital-divide.

9 Cecilia Kang, "Parking Lots Have Become a Digital Lifeline," The New York Times, May 20, 2020, https://www.nytimes.com/2020/05/05/technology/parking-lots-wifi-coronavirus.html.

10 Ry Marcattilio-McCracken, "Electric Cooperatives Partner with Dominion Energy to Bring Broadband to Rural Virginia," last modified August 6, 2020, https://muninetworks.org/content/electric-cooperatives-partner-dominion-energy-bring-broadband-rural-virginia.

11 "Newsom Issues Executive Order on Digital Divide," CHEAC (Improving the Health of All Californians), last modified August 14, 2020, https://cheac.org/2020/08/14/newsom-issues-executive-order-on-digital-divide/.

12 Tyler Cooper, "Bezos and Musk's Satellite Internet Could Save Americans $30B a Year," Podium: Opinion, Advice, and Analysis by the TNW Community, last modified August 24, 2019, https://thenextweb.com/podium/2019/08/24/bezos-and-musks-satellite-internet-could-save-americans-30b-a-year/.

13 Lara Fishbane and Adie Tomer, "Neighborhood Broadband Data Makes It Clear: We Need an Agenda to Fight Digital Poverty," Brookings Institution, last modified February 6, 2020, https://www.brookings.edu/blog/the-avenue/2020/02/05/neighborhood-broadband-data-makes-it-clear-we-need-an-agenda-to-fight-digital-poverty/.

14 Rashawn Ray, "Five Things John Lewis Taught Us About Getting in 'Good Trouble,'" Brookings Institution, last modified July 23, 2020, https://www.brookings.edu/blog/how-we-rise/2020/07/23/five-things-john-lewis-taught-us-about-getting-in-good-trouble/.
PUBLIC LIBRARIES LEADING THE WAY

Harnessing the Power of OrCam

Mary Howard

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12637

Mary Howard (mhoward@sccl.lib.mi.us) is Reference Librarian, Library for Assistive Media and Talking Books (LAMTB), at the St. Clair County Library, Port Huron, Michigan. © 2020.

Library for Assistive Media and Talking Books (LAMTB) services are located at the main branch of the St. Clair County Library System. LAMTB facilitates resources and technologies for residents of all ages who have visual, physical, and/or reading limitations that prevent them from using traditional print materials. Operating out of Port Huron, Michigan, we encounter many instances where we need to provide assistance above and beyond what a basic library may offer. We host Talking Book services, which provide free players, cassettes, braille titles, and downloads to users who are vision or mobility impaired. We also have a large, stationary Kurzweil reading machine that converts print to speech, video-enhanced magnifiers, and large print books, and we provide home delivery service for patrons who are unable to travel to branches.

The library had been searching for a more technology-forward focus for our patrons. The state’s Talking Books center in Lansing set up an educational meeting at the Library of Michigan in 2018 to see a live demonstration of the OrCam MyEye reader. This was the innovation we were seeking, and I was thoroughly impressed with the compact and powerful design of the reader, the ease of use, and the stunningly accurate feedback provided by this AI-powered assistive reading device. Users are able to read with minimal setup and total control. OrCam readers are lightweight, easily maneuverable assistive technology devices for users who are blind, visually impaired, or have a reading disability, including children, adults, and the elderly.
The device automatically reads any printed text: newspapers, money, books, menus, labels on consumer products, text on screens and smartphones, etc. The OrCam reader repeats back any text immediately and is fit for all ages and abilities. OrCam works with English, Spanish, and French and can identify money and other business and household items. It can be attached to either the left or right temple of the user’s glasses using a magnetic docking device, placing the speaker near the corresponding ear, and users can easily adjust the volume and speed of the read text. Letting a diverse group of users with different needs use the reader as they like is one of its more impressive offerings. Changing most settings is normally done with just a finger swipe on the OrCam device.

The mission of OrCam is to develop a “portable, wearable visual system for blind and visually impaired persons, via the use of artificial computer intelligence and augmented reality.” By offering these devices to our sight, mobility, or otherwise impaired patrons, we open up the world of literacy, discovery, and education. Some of our users are not able to read in any other fashion, and the OrCam provides a much-needed boost to their learning profile.

We secured a grant from the Institute of Museum and Library Services (IMLS) for the purchase of the readers (CFDA 45.310). We also worked with OrCam to get lower pricing for these units: normally they retail for $3,500, but we were able to negotiate the lower price point of $3,000. We were awarded a $22,106 Improving Access to Information grant from the Library of Michigan to fund the entire purchase. Without this funding stream we would not have been able to secure the OrCam.
However, if you have veterans in your service area, please contact the company, since there is VA health coverage available for low-vision or legally blind veterans who may qualify to receive an OrCam device, fully paid for by the VA. Please visit https://orcam.com/en/veterans for more information.

Figure 1. Close-up of the OrCam device.

The grant was initially set to run from September 2019 to September 2020. We purchased six OrCam readers for our library users, which were planned to be rotated among our twelve branches throughout this grant cycle. However, due to the pandemic and out of safety concerns for staff and visitors, our library was closed from March 23 to June 15, and we were only able to offer the readers to the public at six branches. As of July 14, 2020, we are projecting that we may open to the public in September, but COVID-19 issues could halt that. We have had to make arrangements with the grantor to extend the period for the usage of the OrCam from September to December. This will make up for some of the lost time and open a path for the other six libraries to have their turn offering the OrCam to their patrons.

The interesting aspect of this is that we now have to take our technology profile even further by offering remote training to prospective OrCam users. Thankfully, the design and rugged housing of the reader make it easy to clean and maintain, but social distancing can prove to be intrusive for training. To set up a user, you need to be within a foot or two of them, and that closeness is needed to get them used to how the OrCam reads. There is a lot of directing involved and close contact between the user and instructor. We will work around this by providing distance instruction, combining in-person and remote training. OrCam also has a vast array of instructional videos that we will have cued up for users.
We have had over 150 residents attend presentations, demonstrations, and talks on the OrCam. I anticipate that this number will not be matched during the second round; however, we may be more successful in our online presence, since we can add the instruction to our YouTube page, offer segments on Facebook and other social media, and provide film clips for our webpage. The situation has been difficult, but it has prompted LAMTB to think about how we should be working to provide better and more remote service to our users. Since we cover over 800 square miles in the county, becoming more adaptable in serving our patrons has become a paramount area of work for the library. The OrCam will bring about a new way of providing remote training to our patrons, which will raise awareness of the reader and how it can benefit users.

The St. Clair County Library System would like to thank the Institute of Museum and Library Services for supporting this program. The views, findings, conclusions or recommendations expressed in this article do not necessarily represent those of the Institute of Museum and Library Services.
LITA PRESIDENT’S MESSAGE

In the Middle of Difficulty Lies Opportunity: Hope Floats

Evviva Weinraub Lajoie

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12687

Evviva Weinraub Lajoie (evviva@gmail.com) is Vice Provost for University Libraries and University Librarian, University at Buffalo, and the last LITA President. © 2020.

If quarantine has illustrated anything to me, it’s that time is merely a construct. While my approximately two-month term as President may be the shortest in LITA history, it has been filled with meetings, reports, protests, and preparations for our metamorphosis into Core. My thoughts have been consumed with the myriad financial, health, and societal issues that have also filled my news feed. I spend a lot of time thinking and worrying about what their impact will be on our work and our institutions, how they affect me and the people I work with personally, and what role Core may play for many of us in the future.

I imagine all of us are thinking about health and safety. We are all balancing those parts of ourselves that want to aid, to help, to teach and guide with the parts of ourselves that are anxious and scared. Many of us have responsibilities where we need to protect our loved ones and ourselves. We are seeing the health and safety of our BIPOC colleagues disproportionately harmed. Balancing our crucial role within our communities is complicated, and there are no right answers.

I imagine many of us have been spending a lot of time thinking about money, whether it be personal concerns, institutional and organizational concerns, or their intersection point. We’re thinking a lot more about where our money comes from, how it is invested, how we pay for things, how we prioritize paying for things, who decides what gets purchased, and whose voice gets centered when we make that purchase.
We’re thinking carefully about the institutions and infrastructures that have existed and how they will look different, and should be different, in a post-COVID landscape.

I imagine most of us are thinking about societal connections. We are interacting with our professional colleagues differently, and many of us are, perhaps for the first time, perceiving the deep imbalances that permeate our personal, social, and professional lives. We are all trying to figure out how to do the work we need to do when we are uncomfortable, the world is uncertain, and the demands for change are coming from all angles and in a variety of forms.

LITA remained my professional home through the years because I found it to be a place where, no matter who you were or where you worked, there was a place for you. That feeling of connection is so vital to all of us, pandemic and social unrest or not. Knowing there is a network I can depend on to be there when I’m working through the difficult and uncomfortable makes the work just a little bit easier and significantly more meaningful. Our professional organizations and affiliations have the ability to be an anchor in uncertain times, whether through a change in career, a financial crisis, an environmental catastrophe, or a global health emergency.

On August 31, 2020, LITA officially dissolved, and on September 1, our home became Core. At our last LITA Board meeting, Margaret Heller and Amanda L. Goodman presented a history of LITA. What became clear to me in the retelling is that this is not LITA’s first reorganization. Nor is it our second or our third. LLAMA, LITA, and ALCTS have always been dancing with each other.
Our merger is an acknowledgement that we “...play a central role in every library, shaping the future of the profession by striking a balance between maintenance and innovation, process and progress, collaboration and leading.”

Collectively, we have had a year that is beyond comprehension. It has been filled with loss, anger, frustration, grief, anxiety, depression, horror. We have all been weathering the same storm, but our ships are not all equally prepared for the task laid ahead of them. That has been, for so many of us, the hardest part of all of this. We may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, “what are you going to do to change this?” Balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. I believe that together, we can make Core stand up to that challenge.

It has been an honor to serve as the last LITA President. For the brief time I have served, to have the chance to hold an office so many people I truly admire have held... it is a legacy I am proud to have had a moment to uphold. I am gratified to transition LITA into a partnership that will take all that we have loved about LITA and make something new, something Core.
LETTER FROM THE EDITOR

September 2020

Kenneth J. Varnum

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12xxx

With “unprecedented” rising to first place on my personal list of words I would prefer never to need to use again, let alone hear used, I find it eminently satisfying that some activities and events from before COVID continue in their usual, predictable ways. For me, the quarterly rhythm of publication of Information Technology and Libraries is one of those activities. It is helping keep me grounded. While it is certainly not much in the scope of what is happening all around me, it is at least something.

One thing that is changing is that this journal, along with Library Resources and Technical Services and Library Leadership & Management, is now a publication of ALA’s newest division, Core: Leadership, Infrastructure, Futures. You’ll notice a new logo at the top of our site, reflecting the new organizational structure. I am excited about the possibilities of richer cross-Core cooperation and collaboration as we explore our new structure.

This issue includes the first, and last, LITA President’s Message from incoming and outgoing LITA President Evviva Weinraub Lajoie. Evviva assumed the LITA presidency this summer, just before the merger of LITA, LLAMA, and ALCTS into the new Core division took place on September 1. Members of those three merged divisions should watch for information about elections for the new Core president in October.

I am pleased that this issue includes the 2020 LITA/Ex Libris Student Writing Award winning article, “Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results,” by Sam Grabus of Drexel University.
Julia Bauder, the chair of this year’s selection committee (I was also a member, as ITAL editor), said, “This valuable work of original research helps to quantify the scope of a problem that is of interest not only in the field of library and information science, but that also, as Grabus notes in her conclusion, could affect research in fields from the digital humanities to the sciences.”

Before closing, I would like to express my appreciation to Breanne Kirsch, who ably served on the editorial board from 2018-2020.

Sincerely,

Kenneth J. Varnum, Editor
varnum@umich.edu
September 2020
EDITORIAL BOARD THOUGHTS

Public Libraries Respond to the COVID-19 Pandemic, Creating a New Service Model

Jon Goddard

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12847

Jon Goddard (jgoddard@northshorepubliclibrary.org) is a Librarian at the North Shore Public Library and a member of the ITAL Editorial Board. © 2020.

During the COVID-19 pandemic, public libraries have demonstrated, in many ways, their value to their communities. They have enabled their patrons not only to resume their lives, but to learn and grow. Additionally, electronic resources offered to patrons through their library cards have allowed people to be educated and entertained. The credit must go to the librarians, who initially fueled, and have maintained, this level of service by rewriting the rules and creating a new service model.

Once libraries closed, librarians promoted ebooks and other important platforms available to patrons with their library cards. The result: the checkout of ebooks, and the use of these platforms, rose exponentially. Community engagement became completely virtual, with librarians, and those who provide library programs to the public, delivering services on platforms that they may or may not have heard of, such as Zoom and Discord. As libraries reopened, many offered real-time reference services as well as seamless, contactless curbside service, providing a sense of control and continuity amidst the chaos.

EXPONENTIAL INCREASES IN ELECTRONIC RESOURCE USAGE

OverDrive, which is currently used by nearly 90% of public libraries in the United States to manage both ebook and audiobook collections, saw an exponential increase in its usage. Since the lockdown began in mid-March, the daily average for ebook checkouts has been consistently 52% above pre-COVID periods.
Additionally, new users to the platform have been consistently double and triple 2019 highs.1 Library staff have been helping readers during this time to ensure they can obtain access with their devices. In Suffolk County, New York, where new patron registration to OverDrive is up 72% from last year (as of August 2020), there has been no shortage of requests for help.2

With kids being home from school and learning virtually, it is no surprise that ebook readership skyrocketed among YA and juvenile readers, with an 87% increase from last year.3 To help them with their homework and studies, families turned to online tutoring. In Suffolk County, New York, usage of the Brainfuse online live tutoring service has been consistently up by nearly 50% during the school closures.4 Gale, a Cengage company, which offers Miss Humblebee’s Academy, a virtual learning program for preschoolers, saw its user sessions increase by 100% from the previous year.5

Adults, also eager to learn new skills, took to online courses as well. Gale Courses saw a 50% increase in enrollments from March-July over the previous year. Likewise, Gale Presents: Udemy, which offers on-demand video courses, saw just over 21,000 course enrollments from March-June.6

To help those who did not have sufficient broadband to use these necessary resources and platforms, many libraries left their Wifi on even when the building was closed to allow access to those in the vicinity of the building. In addition, many libraries purchased Wifi hotspots to lend to their patrons.
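For libraries tracking similar statistics locally, the year-over-year percentage increases cited above can be computed directly from circulation counts. A minimal sketch in Python, using entirely hypothetical checkout figures for illustration:

```python
# Percent change of a current figure relative to a baseline figure.
# All numbers below are hypothetical, for illustration only.

def percent_increase(baseline, current):
    """Return the percent change of `current` relative to `baseline`."""
    return (current - baseline) / baseline * 100

# Hypothetical daily ebook checkout averages
pre_covid_daily_avg = 1000
lockdown_daily_avg = 1520

change = percent_increase(pre_covid_daily_avg, lockdown_daily_avg)
print(f"Daily checkouts up {change:.0f}% over the pre-COVID baseline")
# prints "Daily checkouts up 52% over the pre-COVID baseline"
```

The same calculation applies to any of the usage metrics mentioned (registrations, tutoring sessions, course enrollments), as long as the baseline and comparison periods cover the same length of time.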
According to Pew Research, approximately 25% of households do not have a broadband internet connection at home.7 While public libraries cannot provide the only local solution to this gap, here are other steps libraries have been taking during the shutdown:

• Strengthening wireless signals so people can access wireless from outside library buildings.
• Hosting drive-up Wifi hotspot locations.
• Partnering with schools to obtain and distribute Wifi hotspots to families in need.

COMMUNITY ENGAGEMENT - VIRTUALLY

Community engagement has been vital since the COVID-19 lockdown. Both librarians and those who provide library programs to the public had to quickly adjust to the virtual world in which we were suddenly living. Using a mixture of social media platforms, including Facebook Live and Stories, Discord, Instagram, YouTube, Zoom, and GoToMeeting, librarians flocked to the internet, providing a wide range of programming. Even those libraries that did not previously have any virtual programs managed to very quickly provide quality programs to their patrons.

Virtual programming was not available at the San José Public Library (SJPL) prior to the shutdown. Librarians quickly started to move programs online, including story time, and created a program called Spring Into Reading, similar to the summer reading program, to continue to encourage families to read together. They also started a weekly recorded story time, so patrons could call the library and use their phones to hear a story. To date, SJPL has hosted over 2,000 virtual events since the lockdown began on March 17.8

Some libraries, like the Oceanside Library in New York, were offering virtual programs before the pandemic. When the library closed on March 13, the team started planning to move completely virtual. Two days later, the library was offering four programs a day, including story times, book chats, and book clubs.
By the end of the week, they were offering eight programs a day.9 In April, May, and June, they found book discussions and story times were the most popular programs. They then started to open their programs to people from out of state, partnering with other libraries. The result? Program attendance has increased, and several Zoom meeting rooms have been maxed out.10

Through the lockdown, library patrons have been exercising, listening to concerts, taking virtual vacations, learning new skills, cooking, playing games, and reducing stress. This incredible adaptation was only possible due to library workers’ quick thinking and never-ending determination to help.

DELIVERING INFORMATION AND MATERIALS WITH A NEW SERVICE MODEL

At the San José Public Library (SJPL), which has over 500,000 library members, staff had to quickly shift to a new online reality just after the shutdown. To help patrons get the most from their electronic resources, SJPL used LibAnswers to post FAQs and email responses to their issues and questions. When a librarian was available, patrons could use LibChat to ask questions in real time. Because no one was in the library buildings to answer phones, LibAnswers and LibChat became the only way the public could communicate with staff. Chat reference conversations increased by nearly 400%, from approximately 40 chat sessions per day to 160 per day. The chat service was also made available in Spanish, Vietnamese, and Chinese. When the library implemented its Express Pickup service, SJPL utilized the Spaces functionality in LibCal to allow patrons to create pickup appointments. When patrons arrived at the library for their appointment, the SMS functionality in LibAnswers allowed them to text staff upon arrival.
Through the City of San José’s SJ Access initiative, which aims to help bridge the digital divide in the city, SJPL worked closely with other city departments and the Santa Clara County Office of Education to purchase approximately 16,000 high-speed AT&T hotspots for students and the public.11

WORKING TOWARDS THE NEW NORMAL

The American Library Association (ALA) is committed to advocating strongly for libraries on several different fronts. Thanks to thousands of advocate communications with Congress, libraries secured $50 million for the Institute of Museum and Library Services (IMLS) in the Coronavirus Aid, Relief, and Economic Security (CARES) Act. This enabled libraries and museums to apply for grants during this time of need.12 In addition, the ALA is currently advocating for the passage of the Library Stabilization Fund Act (S.4181 / H.R.7486) to allow libraries to retain staff, maintain services, and safely keep communities connected and informed. The legislation calls for $2 billion in emergency recovery funding for America’s libraries through IMLS.13

While the ALA is rightly advocating for these emergency funds, public librarians and administrators should take advantage of this time to strategically review what has been put into place to react to the COVID-19 pandemic, and plan for the long term. While it is true that libraries are physical spaces, they are also technology-driven services for learning and connection for all ages. Additionally, they have shown that, due to this new service model, access has exponentially expanded to new patrons, showing tremendous value when it comes to education and engagement.

This new service model should be preserved. Programs that engage our communities should be both physical and virtual.
Physical media and books should be provided both at the circulation desk and through a contactless service. Reference services should be provided both at the reference desk and through chat reference services. This must be our new normal.

ENDNOTES

1 David Burleigh, Director, Brand Marketing & Communication at OverDrive, phone conversation with author, October 9, 2020.

2 Maureen McDonald, Special Projects Supervisor at the Suffolk Cooperative Library System, phone conversation, September 14, 2020.

3 Burleigh.

4 McDonald.

5 Kayla Siefker, Head of Media & Public Relations at Gale, a Cengage Company; Brian Risse, VP of Sales - Public Libraries; and Muna Sharif, Product Manager, Discovery & Analytics, phone conversation with author, October 16, 2020.

6 Siefker.

7 Pew Research Center, “Internet/Broadband Fact Sheet,” June 12, 2019, accessed October 13, 2020, https://www.pewresearch.org/internet/fact-sheet/internet-broadband/.

8 Laurie Willis, Web Services at SJPL, phone conversation with author, October 14, 2020.

9 Erica Freudenberger, “Programming Through the Pandemic,” Library Journal, May 22, 2020, https://www.libraryjournal.com/?detailStory=Programming-Through-the-Pandemic-covid-19.

10 Tony Iovino, Assistant Director for Community Services at the Oceanside Library, phone conversation with author, October 19, 2020.

11 Willis.

12 American Library Association, “Advocacy & Policy,” accessed October 15, 2020, http://www.ala.org/tools/covid/advocacy-policy.

13 Ibid.
PUBLIC LIBRARIES LEADING THE WAY

Journey with Veterans: Virtual Reality Program Using Google Expeditions

Jessica Hall

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12857

Jessica Hall (jessica.hall@fresnolibrary.org) is Community Librarian, Fresno County Public Library. © 2020.

“Where would you like to go?” is the question of the day. We have stood atop the Great Wall of China, swum with sea lions in the Galapagos Islands, and walked along the vast red sands of Mars. Each journey was unique and available through the library.

As a community librarian in charge of outreach to seniors and veterans, I first learned about the virtual tour idea from a colleague who returned from a conference excited to tell me about a workshop she had attended. The workshop described a program which utilized Google Expeditions to take seniors on virtual tours. This idea stayed with me for months, until Fresno County Public Library obtained the $3,000 Value of Libraries grant, funded by the California Library Services Act. As part of this grant, $2,905 went to purchase a Google Expeditions kit and supplies to create a virtual reality program called Journey with Veterans. The kit includes 5 viewers and 1 tablet. A viewer is basically a Google Cardboard, except the case is plastic and there is a smartphone inside the case. During the program, I use the tablet to select and run each tour. The tour I select on the tablet is projected to the 5 viewers so participants can experience it. In this manner, veterans can explore places without physically having to travel anywhere.

The Journey with Veterans program took the technology to the veterans instead of requiring them to come into the library. The two locations chosen were the Veterans Home of California - Fresno and the Community Living Center at the VA Medical Center in Fresno, CA.
From the time the program began in September 2019 to March 2020, when the pandemic shutdown brought a halt to the program, the library hosted 26 sessions at these two locations with 182 veterans. In sessions where more than 5 people were in attendance, the viewers were shared among the participants.

The tablet and the smartphones inside the viewers have an app installed on them called Google Expeditions, which is the software that runs the tours. One hotspot, already owned by the library, was used for this program. It is a requirement that all the viewers and the tablet be connected to the same WiFi, so having a portable WiFi connection was necessary to run this program in locations without access to a strong internet connection.

Each tour is a selection of still 360-degree views. The landscape does not move; instead, the participant turns their head around, up, and down to look at the entire scene. The control tablet includes additional menu items not seen by participants, such as scripts I can read about the landscape we are looking at and suggested points of interest I can highlight for participants. When I select a point of interest on the tablet, the participants see arrows pointing to that area of their screen. The participant follows the arrows by turning their head in the direction indicated. Participants know they are looking at the area of interest when the arrows disappear and are replaced by a white circle surrounding the relevant portion of the screen.

The viewers did not have straps attached to them, and there was no way to attach straps to them. Therefore, the viewer could not be strapped to the participant’s head. Instead, the participant had to hold up the viewer the entire time they wished to look through it.
This presented a challenge for participants who did not have the ability to hold the viewer on their own. At the locations I visited, staff were available to help and would hold the viewer up to a participant's eyes. In some cases, one staff person held the viewer up for the participant while another turned the participant's wheelchair in a circle so they could see the entire image. Each program lasted 30-45 minutes, but the amount of time looking through the viewer was kept to around 15-20 minutes. The rest of the time was filled with talking about the location we were viewing. For the veterans in memory care at the Veterans Home of California - Fresno, this program was designed with the hope that it would allow the veterans to reminisce about places they had visited and lived in and encourage them to talk about their experiences. Some of the participants had been to the countries we visited virtually, and they reminisced about their time there. At every session, the participants shared their enthusiasm and eagerness to continue the program. The program was tried with music once. On one of my first visits to the Community Living Center at the VA Medical Center, a participant asked if he could play music in the background. Since I had thought about incorporating music into the program, I agreed, and the participant played some classical music from his own device. Though it was a good idea, the execution did not work well. The music came from one location, which made it too loud when one stood near it but too quiet once one walked too far away. I found the music difficult to talk over while giving the tour. I believe that incorporating sounds of the location we visit, such as the sounds of the countryside or a big city, would make the experience more immersive. However, I have yet to find a way to do so successfully. After the grant ended, I continued the program at both locations.
The partnership I had created at the Veterans Home of California - Fresno grew into a second program, Storytime with Veterans, which was specifically requested by the residents. I alternated my visits so that some weeks we did a virtual reality program and some weeks I read to them. One time there was a miscommunication: the activity coordinator thought I had come to read a story, but I was under the impression that it was a virtual reality week, so I had brought the Google Expeditions kit with me. The solution was to do both. One of the Google Expeditions tours is a very short and much abridged virtual reality version of Twenty Thousand Leagues Under the Sea by Jules Verne. The tour uses artwork to represent scenes from the book, and each scene tells a different part of the story. The Veterans Home's residents were treated to both a story and a virtual reality tour at the same time. Up until the library's shutdown in mid-March due to COVID-19, I was in the process of expanding the use of the Google Expeditions kit but was unable to continue. Since then, the equipment has not been used. Restarting the program now includes multiple challenges, not the least of which is sanitizing the devices. Sanitation was a consideration even before COVID-19, and sanitary virtual reality masks were acquired using grant funds as part of the initial program. These masks look like strips of cloth that line the eyes, with strings that hook around the ears to hold them in place. Cleaning products were also purchased and used to clean the devices after each program. Before COVID-19, a viewer could be handled by multiple people before it was cleaned. I always handled them first to prepare them for use. Then I handed each one to the participant. Occasionally they were also handled by staff. I always cleaned the viewers right after the program ended but not during the program.
With the current COVID-19 restrictions, the sanitation practices previously used are inadequate. I do not know the future of the program in a post-COVID-19 world, but I intend to begin the program again once it becomes safe to do so, incorporating all required precautions and restrictions. I look forward to once more being able to take veterans on exciting virtual journeys.
LETTER FROM THE CORE PRESIDENT
Leadership and Infrastructure and Futures…Oh my!
Christopher Cronin
INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020
https://doi.org/10.6017/ital.v39i4.13027
Christopher Cronin (cjc2260@columbia.edu) is Core President and Associate University Librarian for Collections, Columbia University. © 2020.

I am so pleased to be able to welcome all ITAL subscribers to Core: Leadership, Infrastructure, Futures! This issue marks the first of ITAL since the election of Core's inaugural leadership. A merger of what was formerly three separate ALA divisions—the Association for Library Collections & Technical Services (ALCTS), the Library & Information Technology Association (LITA), and the Library Leadership & Management Association (LLAMA)—Core is an experiment of sorts. It is, in fact, multiple experiments in unification, in collaboration, in compromise, in survival. While initially born out of a sheer fight-or-flight response to financial imperatives and the need for organizational effectiveness, developing Core as a concept and as a model for an enduring professional association very quickly became the real motivation for those of us deeply embedded in its planning. Core is very deliberately not an all-caps acronym representing a single subset of practitioners within the library profession. It is instead an assertion of our collective position at the center of our profession. It is a place where all those working in libraries, archives, museums, historical societies—information and cultural heritage broadly—will find reward and value in membership and a professional home. All organizations need effective leaders, strong infrastructure, and a vision for the future. And that is what Core strives to build with and for its members. While I welcome ITAL's readers into Core, I also welcome Core's membership into ITAL.
No longer publications of their former divisions, all three journals within Core have an opportunity to reconsider their mandates. As with all things, audience matters. ITAL's readership has now expanded dramatically, and those new readers must be invited into ITAL's world just as much as ITAL has been invited into theirs. As we embark on this first year of the new division, we do so with a sense of altogether newness more than of a mere refresh, and a sense of still becoming more than a sense of having always been. And who doesn't want to reinvent themselves every once in a while? Start over. Move away from the bits that aren't working so well, prop up those other bits that we know deserve more, and venture into some previously uncharted territory. How will being part of this effort, and of an expanded division, reframe ITAL's mandate? The importance of information technology has never been more apparent. It is not lost on me that we do this work in Core during a year of unprecedented tumult. In 2020, a murderous global pandemic was met with unrelenting political strife, pervasive distribution of misinformation and untruths, devastating weather disasters, record-setting unemployment, heightened attention on an array of omnipresent social justice issues, and a racial reckoning that demands we look both inward and outward for real change. Individually and collectively, we grieve so many losses—loss of life, loss of income, loss of savings, loss of homes, loss of dignity, loss of certainty, loss of control, loss of physical contact. And throughout all of these challenges, what have we relied on more this year than technology? Technology kept us productive and engaged. It provided a focal point for communication and connection.
It provided venues for advocacy, expression, inspiration, and, as a counterpoint to that pervasive distribution of misinformation, it provided mechanisms to amplify the voices of the oppressed and marginalized. For some, but unfortunately not all, technology also kept us employed. And as the physical doors of our organizations closed, technology provided us with new ways to invite our users in, to continue to meet their information needs, and to exceed all of our expectations for what was possible even with closed physical doors. And yet our reliance on and celebration of technology in this moment has also placed another critical spotlight on the devastating impact of digital poverty on those who continue to lack access, and by extension also a spotlight on our privilege. In her parting words to you in the final issue of ITAL as a LITA journal, Evviva Weinraub Lajoie, the last President of LITA, wrote:

We may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, "what are you going to do to change this?"

Balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. I believe that together, we can make Core stand up to that challenge. I believe we will do this, too, and with a spirit of reinvention that is guided by principles and values that don't just inspire membership but also improve our professional lives and experience in tangible ways. It was a privilege to have served as the final President of ALCTS and such a humbling and daunting responsibility to now transition into serving as Core's first.
It is a responsibility I do not take lightly, particularly in this moment when so much is demanded of us. As we strive for equity and inclusion, we do so knowing that we are only as strong as every member's ability to bring their whole selves to this work. We must work together to make our professional home everything we need it to be and to help those who need us. It is yours, it is theirs, it is ours.
LETTER FROM THE EDITOR
Farewell 2020
Kenneth J. Varnum
INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020
https://doi.org/10.6017/ital.v39i4.13051

I don't think I've ever been so ready to see a year in the rear-view mirror as I am with 2020. This year is one I'd just as soon not repeat, although I nurture a small flame of hope. Hope that as a society what we have experienced this year will exert a positive influence on the future. Hope that we recall the critical importance of facts and evidence. Hope that we don't drop the effort to be better members of our local, national, and global communities and treat everyone equitably. Hope that as a global populace we continue to get into "good trouble" and push back against institutionalized policies and practices of racism and discrimination and strive to be better. Despite the myriad challenges this year has brought, it is welcome to see so many libraries continuing to serve their communities, adapting to pandemic restrictions, and providing new and modified access to books and digital information. Equally gratifying, from my perspective as ITAL's editor, is that so many library technologists continue to generously share what they have learned through submissions to this journal. Along those lines, I'm extending my annual invitation to our public library colleagues to propose a contribution to our quarterly column, "Public Libraries Leading the Way." Items in this series highlight a technology-based innovation from a public library perspective. Topics could include any way that technology has helped you provide or innovate services for your communities during the pandemic, or any novel, interesting, or promising use of technology in a public library setting. Columns should be in the 1,000-1,500 word range and may include illustrations. These are not intended to be research articles.
Rather, Public Libraries Leading the Way columns are meant to share practical experience with technology development or uses within the library. If you are interested in contributing a column, please submit a brief summary of your idea.

Wishing you the best for 2021,

Kenneth J. Varnum, Editor
varnum@umich.edu
December 2020
Fulfill Your Digital Preservation Goals with a Budget Studio
Yongli Zhou
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016
doi: 10.6017/ital.v35i1.5704

ABSTRACT
To fulfill digital preservation goals, many institutions use high-end scanners for in-house scanning of historical print and oversize materials. However, high-end scanner prices do not fit in many small institutions' budgets. As digital single-lens reflex (DSLR) camera technologies advance and camera prices drop quickly, a budget photography studio can help to achieve institutions' preservation goals. This paper compares images delivered by a high-end overhead scanner and a consumer-level DSLR camera, discusses pros and cons of using each method, demonstrates how to set up a cost-efficient shooting studio, and presents a budget estimate for a studio.

INTRODUCTION
Colorado State University Libraries (CSUL) are regularly engaged in a variety of digitization projects. Materials for some projects are digitized in-house, while items from selected projects are sometimes outsourced. Most fragile materials that require professional handling are digitized in-house using an expensive overhead scanner. However, the overhead scanner has been occasionally unstable since it was purchased, and this has delayed some of our digitization projects. As digital photography technologies advance, image quality delivered by digital single-lens reflex (DSLR) cameras is improving, and camera prices have lowered to an affordable level. In this paper, I will compare images produced by a scanner and a camera side-by-side, list pros and cons of using each method, illustrate how to establish a shooting studio, and present a budget estimate for that studio.

LITERATURE REVIEW
There are many online guidelines and manuals for digitizing print materials. Some universities and museums have information about their digitization equipment online. Most articles focus on either high-end scanners or customized scanning stations.
These articles are very helpful for universities and museums that are relatively well funded. However, there is almost no literature discussing how to use inexpensive digital cameras and photography equipment to produce high-quality digitized images. This article will use a case study to demonstrate that a low-budget studio can produce high-quality digitized images.

COMPARISON OF SCANNED AND PHOTOGRAPHED IMAGES
The test camera set was chosen because it was the one the author used for general purposes. The camera was also chosen by many professional photographers because of its quality and affordability. To avoid dispute, the overhead scanner's make and model are not revealed.

Yongli Zhou (yongli.zhou@colostate.edu) is Digital Repositories Librarian, Colorado State University Libraries, Fort Collins, Colorado.

Table 1. Test Equipment

Budget Studio:
• Nikon D800
• Nikon AF Micro-Nikkor 60mm f/2.8D Lens
• Manfrotto 055CXPRO3 3-Section Carbon Fiber Tripod Legs
• Really Right Stuff BH-40 LR II Ballhead
• Nonreflective glass
• Book cradles
• X-Rite Original ColorChecker Card
• Natural daylight
• Total cost: $4,500 with no maintenance fees (priced in 2014)

Overhead Scanner:
• Our overhead scanner
• Nonreflective glass
• Book cradles
• Purchase price: $55,000 (purchased in 2007)
• $8,000 annual maintenance (2013 price)

Focus and Sharpness
A quality digitized image needs to have a good focus. A well-focused image shows details better and can produce better Optical Character Recognition (OCR) results for text-based documents. At CSUL, we have no control over the automatic focus on our overhead scanner and have noticed that sometimes one page is sharply focused but the next page is slightly out-of-focus. During the scanning process, our overhead scanner does not indicate whether a shot is focused or not.
A DSLR camera can beep or display a flashing dot in the viewfinder when in focus.

Illustration
The following two figures compare images produced by our test DSLR and overhead scanner. Both images are originals and have not been enhanced by software. In addition to this image, we tested nine other illustrations. Following our comparison study, we concluded that a semiprofessional DSLR camera produces sharper images than our expensive overhead scanner. In figure 1, at 100 percent zoom, the left image has a better focus, contains more details, and has colors closer to the original. The left image was taken using a Nikon D800 + Nikkor 60mm macro lens under natural lighting. The right image was produced by our overhead scanner. In figure 2, at 200 percent zoom, the left image (taken using the DSLR) shows much more detail than the image on the right (taken with the overhead scanner).

Figure 1. Comparative Images from DSLR (Left) and Overhead Scanner (Right), at 100 Percent Zoom. Image from Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), plate between pages 296 and 297.

Figure 2. Comparative Images from DSLR (Left) and Overhead Scanner (Right), at 200 Percent Zoom. Image from Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), frontispiece, print.

At CSUL, the process of digitizing a text document includes scanning pages, converting them into Portable Document Format (PDF) files, and applying an OCR process. In general, a well-focused image of text produces better OCR results, although software such as Adobe Acrobat can tolerate fuzzy images and produce reasonably accurate OCR text.
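One informal way to compare OCR output from the two capture methods is a character-level similarity ratio. This is a minimal sketch using Python's standard-library difflib; the sample strings are illustrative stand-ins, not the article's actual transcripts.

```python
# Compare two OCR transcripts with a character-level similarity ratio.
# The sample strings below are illustrative, not the article's data.
from difflib import SequenceMatcher

def ocr_similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio in [0.0, 1.0] between two transcripts."""
    return SequenceMatcher(None, text_a, text_b).ratio()

camera_ocr = "he had notions more correct than were, in his day, common"
scanner_ocr = "he bad notions more correct than were, in his day, common"

# A single-character OCR error ("had" vs "bad") yields a ratio near 1.0.
score = ocr_similarity(camera_ocr, scanner_ocr)
print(f"similarity: {score:.3f}")
```

Comparing each transcript against a hand-corrected ground truth (rather than against each other) would give a rough per-method accuracy score; dedicated character-error-rate tools are more rigorous for formal evaluation.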
Our OCR tests on a slightly out-of-focus image and a well-focused image showed no significant difference; however, from preservation and usability standpoints, we prefer well-focused images.

Figure 3. The left image was produced by our test DSLR camera and has a better focus. The right image was produced by our overhead scanner. Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), 300, print.

Figure 4. We ran the OCR process on the above two images. The top image was produced by our test DSLR camera and the bottom image was produced by our overhead scanner. Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), 300, print.

Table 2. OCR Results Comparison

Generated from the image by camera:
" On one or two points of high importance, he had notions more correct than were, in his day, common, even among men of e1~larged minds, and he had the rare good fortune of being able to carry his theories into practice without any compromise." Yet, "he was not a man of stron sense."

Generated from the image by scanner:
" On one or two points of high importance, he bad notions more correct than were, in his day, common, even arnong men of e1~larged minds, and he had the rare good fortune of being able to carry his theories into practice without any compromise." Yet, "he was not a man of strong sense."

These test results are very close because of the forgiveness of the Adobe Acrobat software. However, we have seen that for some other pages, a better-focused image generates improved OCR results.

Photograph
A 6.5 inches by 4.5 inches silver print was used for this test. Our tests show that the test DSLR camera produced a sharper image of this historic photograph.
Figure 5. Tested 6.5 Inches by 4.5 Inches Photograph. The red square indicates the enlarged area for figure 6. Historical photograph from Colorado State University Archives and Special Collections.

Figure 6. Screen View at 100 Percent Zoom of a Silver Print. The top image was produced by the test DSLR camera and the bottom one was produced by our overhead scanner. Historical photograph from Colorado State University Archives and Special Collections.

Oversize Materials
For oversized materials, overhead scanners and DSLR cameras both have drawbacks, so we do not think either option is ideal. Our library uses a map scanner to scan oversize maps and posters. However, a map scanner is expensive and may not fit many libraries' budgets. A map scanner also is not suitable for fragile maps or posters. Our overhead scanner's maximum scanning area is 24 inches by 17 inches, and the test map's size is 25 inches by 26 inches. We had to scan the map in four sections and stitch them together using Adobe Photoshop. Each section image has a file size of 313 MB. Because of the large file sizes, the stitching process is extremely slow. Stitching images also is not recommended because there is always some degree of mismatch error created by lens distortion. A camera can capture any material size, but the details of the photographed images diminish as the material's size increases. The photo of the entire map taken by our test DSLR has a file size of 35.8 MB. The image produced by the camera has a lower resolution and less detail.

Figure 7. Oversized Materials Screen View at 100 Percent Zoom. The top image was photographed by the test DSLR. The bottom image was scanned by our overhead scanner. Historical map from Colorado State University Archives and Special Collections.
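The tradeoff described above, where a fixed sensor pixel count is spread over more inches of original, can be made concrete with a quick effective-resolution calculation. This sketch assumes the Nikon D800's nominal 7,360 x 4,912 pixel sensor; the material sizes come from the article's examples.

```python
# Why one camera frame loses detail on oversize originals: the sensor's
# fixed pixel count is divided by the material dimension it must span.
# Sensor width assumed: Nikon D800 nominal 7360 px on the long side.

def effective_ppi(sensor_px: int, material_inches: float) -> float:
    """Pixels per inch when one sensor dimension spans one material dimension."""
    return sensor_px / material_inches

# 25 x 26 inch map: the long sensor side must span the 26-inch dimension.
print(f"oversize map: {effective_ppi(7360, 26):.0f} ppi")

# 5.5 x 3.5 inch drawing: same sensor, far fewer inches to cover.
print(f"small drawing: {effective_ppi(7360, 5.5):.0f} ppi")
```

The same sensor delivers well over 1,000 ppi on the small drawing but under 300 ppi on the full map, which is why the photographed map shows less detail than the stitched scan despite the camera's sharper optics.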
Small Prints
One big advantage of a DSLR camera is that it can be set farther away to take pictures of oversized materials or very close to smaller objects to take close-up pictures. By comparison, the distance between the lens and the scanning platform on our overhead scanner is fixed, so no close-up images can be produced, and everything is reproduced at a scale of 1:1. For the following example, we used a 5.5 inches by 3.5 inches drawing as our test subject.

Figure 8. A 5.5 inches by 3.5 inches Fine Drawing. A historical booklet from Colorado State University Archives and Special Collections.

Figure 9. Small Prints Screen View at 100 Percent Zoom. The left image was produced by a DSLR with a macro lens and the right image was scanned by our overhead scanner. A historical booklet from Colorado State University Archives and Special Collections.

The image produced by our overhead scanner has a resolution of 3,427 pixels by 2,103 pixels. The camera produces a 6,776 pixels by 4,240 pixels image. The higher pixel count allows users to see more details at the same zoom level. The image produced by the camera is not only sharper but also contains more details. It also is good for making enlarged prints for promotional materials. For smaller maps, a DSLR camera also produces superior images. For the following sample, we tested a 15 inches by 9.5 inches map.

Figure 10. A 15 inches by 9.5 inches map. The blue square indicates the enlarged area for figure 11. Historical map from Colorado State University Archives and Special Collections.

Figure 11. Small Map Screen Views at 100 Percent Zoom. The left image was photographed by a DSLR camera with a macro lens and the right image was produced by our overhead scanner.
Historical map from Colorado State University Archives and Special Collections.

Post-Processing
Use of a Sharpening Filter
Our tests showed that a main drawback of our overhead scanner is that the images produced are out-of-focus. Some digitization guidelines recommend minor post-processing of delivered image files to improve image quality. One might argue that sharpening could be applied to fix our overhead scanner's out-of-focus problem. Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files recommends doing minor post-scan adjustment to optimize image quality and bring all images to a common rendition.1 This is good advice, but it is not applicable in real-world practice. To get the best result, each image would need to be evaluated and have a sharpening filter applied separately, because when an improper sharpening setting is applied to an image, it often creates haloing artifacts and an unnatural look. Applying a sharpening filter to each image individually would be extremely time-consuming. The haloing artifact is also called the chromatic aberration (CA) effect. CA appears as unsightly color fringes near high-contrast edges. Chromatic aberrations are typically only visible when viewing the image on-screen at higher zoom levels or on large prints. The following example shows that the CA may not appear at lower zoom levels, such as 50 percent or 100 percent. The left image has no sharpening filter applied and the right image has a sharpening filter applied. At 100 percent zoom, chromatic aberration is almost not identifiable, and the right image appears to be superior in terms of sharpness.

Figure 12. Sharpening Filter Comparison Sample at 100 Percent Zoom. The left image has no sharpening filter applied and the right image has a sharpening filter applied. Historical map from Colorado State University Archives and Special Collections.
At a higher zoom level, we see CA, visible in the right image of figure 13. The extra colors are introduced by the software.

Figure 13. Comparison of Sharpening Filter Applied to Images at 500 Percent Zoom. The left image has no sharpening filter applied and the right image has a sharpening filter applied. Historical map from Colorado State University Archives and Special Collections.

We recommend not applying sharpening filters to original scanned images; instead, attempt to obtain well-focused images from the beginning. For this reason, the test DSLR camera outperformed our overhead scanner for most materials.

Color Balance
Have you seen a scanned color image or color photograph with colors very different from the original image? For example, a white area appears to be bluish, or has an orange cast? When scanning or photographing an image under different lighting, the output image can have very different colors. In the following figure, the left image was shot at a correct white balance (WB) setting. WB is the process of removing unrealistic color casts so that objects that appear white in person are rendered white in your photo.2 The center image has a blue color cast, which was caused by a lower Kelvin setting, and the right image was shot at a higher Kelvin setting. A camera may create images with the wrong colors, but so will a scanner if it is not calibrated correctly.

Figure 14. Images Shot under Different White Balance Settings.

We pay an $8,000 annual service fee for overhead scanner maintenance, which includes scanner color calibration. In general, image colors rendered by the machine are close to the original colors but not exact. We have noticed that some images have a very light green cast and others are overly yellow; sometimes images appear to be darker than they should be.
Because we are not certified to calibrate the overhead scanner, we only use the settings prescribed by the technicians. Also, we have no control over a fading light bulb, which will affect correct exposure. WB adjustment on photographs taken in a studio can be very precise. Most DSLRs contain a variety of preset white balances. In general, auto WB works well but does not deliver the best results. Custom WB allows fine-tuning of colors. If a shooting studio is set up properly, the lighting should be consistent, so ideally the one setting found most desirable can be used repeatedly. However, professional photographers do test shots at the beginning of each shooting session. Once they find the optimal test shot, they will use the exact settings for the batch. Later, they will do minor color adjustment on the chosen test shot to ensure precise color representation, and then apply the adjustment settings to all other photos of the same batch. Because many small variations can be present in each shooting session, they do not reuse the settings from the previous shooting. It may seem arduous to do test shots for each shooting, but it ensures accurate color reproduction. Many professional photographers use ColorChecker Passport,3 a commercial product that helps with quick and easy capture of accurate colors. I will briefly demonstrate a useful trick I learned from a professional photography seminar: how to use ColorChecker Passport to apply correct white balance to a group of images.4

Step 1: Place an 18 percent gray card or a ColorChecker Passport card on top of a page. Choose the correct exposure and take the photo. Use the same exposure setting to take additional photos. For demonstration purposes, we deliberately used very low and very high Kelvin settings for the sample images. The low Kelvin setting created cool, blue tones and the high Kelvin setting created a tone that was too warm.
Note that the test shot with the ColorChecker board was not taken with exactly the correct white balance setting.

Figure 15. Sample Images for White Balance Adjustment. Rocky Mountain Collegian 3–4 (1893), 118, Colorado State University Archives and Special Collections.

Step 2: In Adobe Lightroom, select the test target image and switch to "Develop" mode. Select the White Balance tool and move the cursor over a gray area, trying to find a spot where the red, green, and blue (RGB) values are close. A spot with equal RGB values is ideal. This simple click will set the test image's white balance to an almost perfect setting.

Figure 16. Applying a White Balance in Adobe Lightroom 4

Step 3: Synchronize the other images' settings with the target image. Select the target image and all other images, click the Sync button, and select the settings you would like to synchronize. Make sure the WB button is checked.

Figure 17. Synchronize Settings in Adobe Lightroom 4

Figure 18. Synchronized Images with Correct White Balance. Rocky Mountain Collegian 3–4 (1893), 118, Colorado State University Archives and Special Collections.

Recently, I had the opportunity to visit the Spencer Museum of Art's digitization lab. They have a different workflow to ensure even more scientifically correct colors. If you are interested in their approach, you can contact their information technology manager or photographer.

Color Space
One very important thing to understand when you use a DSLR camera is color space. Many DSLR cameras support Adobe RGB and sRGB. sRGB reflects the characteristics of the average cathode ray tube (CRT) display. This standard space is endorsed by many hardware and software manufacturers, and it is becoming the default color space for many scanners, low-end printers, and software applications.
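Stepping back to the white-balance workflow for a moment: numerically, the gray-card correction amounts to scaling each channel so the sampled gray patch comes out neutral, then applying the same per-channel gains to every image in the batch (the Sync step). A minimal pure-Python sketch, with made-up patch values:

```python
# Gray-card white balance in numbers: compute per-channel gains that
# neutralize the gray patch, then apply them to every pixel.
# The patch values below are invented for illustration.

def wb_gains(gray_patch_rgb):
    """Per-channel gains that make the gray patch's RGB channels equal."""
    target = sum(gray_patch_rgb) / 3.0
    return tuple(target / c for c in gray_patch_rgb)

def apply_wb(pixel, gains):
    """Apply gains to one RGB pixel, clipping to the 8-bit range."""
    return tuple(min(255, round(c * g)) for c, g in zip(pixel, gains))

# A bluish cast: the gray card reads stronger in blue than in red.
patch = (110, 120, 140)
gains = wb_gains(patch)

# After correction, the patch itself becomes neutral gray.
corrected = apply_wb(patch, gains)
print(corrected)
```

Real raw converters work on linear sensor data before gamma encoding, so this is a simplification, but it captures why one click on a known-gray spot can correct a whole batch shot under the same light.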
It is the ideal space for web work but is not recommended for prepress work because of its limited color gamut. Adobe RGB (1998) was designed to encompass most of the colors achievable on CMYK printers, using only RGB primary colors on a device such as your computer display.5 This color space is recommended if you need to do print-production work with a broad range of colors. Many scanning vendors deliver images in the Adobe RGB color space. ProPhoto RGB contains all colors that are in Adobe RGB, and Adobe RGB contains nearly every color that is in sRGB. ProPhoto RGB covers more colors than the human eye can see. It can only be used for images in RAW format and in 16-bit mode. Common file formats that support 16-bit images are TIFF and PSD; most printers do not support 16-bit files. This color space is normally used by photographers who have a specific workflow and who print on specific high-end inkjet printers. When converting from 16-bit to 8-bit, some images will have banding or posterization problems. Banding is a digital imaging artifact: a picture with a banding problem shows horizontal or vertical lines.

Figure 19. An Example of Colour Banding, Visible in the Sky in This Photograph.6

Posterization of an image entails conversion of a continuous gradation of tone to several regions of fewer tones, with abrupt changes from one tone to another.7

Figure 20. An Example of Posterization.8

While it is a good idea to capture images using Adobe RGB to preserve a wide range of colors, you should convert images to sRGB when delivering them to unknown users or displaying them on the web. Currently, sRGB is the only appropriate choice for images uploaded to the web, since most web browsers don’t support any color management.
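The banding and posterization described above can be reproduced with a few lines of code: quantizing a smooth tonal ramp to a handful of levels collapses a continuous gradation into flat regions with abrupt jumps, which is what happens, more subtly, when 16-bit images are reduced to 8 bits. A toy illustration in Python:

```python
def posterize(tones, levels):
    """Quantize tones in [0, 1] down to a fixed number of evenly spaced levels."""
    step = 1.0 / (levels - 1)
    return [round(tone / step) * step for tone in tones]

# A smooth ramp of 256 distinct tones, like an 8-bit sky gradient...
ramp = [i / 255.0 for i in range(256)]

# ...collapses to four flat tones, with visible bands where they meet.
banded = posterize(ramp, 4)
print(len(set(ramp)), "tones before;", len(set(banded)), "tones after")  # 256 tones before; 4 tones after
```

Real 16-to-8-bit conversion keeps 256 of 65,536 levels rather than 4 of 256, so the effect is far gentler, but the mechanism is the same.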
Adobe RGB images that are uploaded to websites without conversion to sRGB generally appear dark and muted.9 If they are printed on printers that do not support the Adobe RGB format, colors will be dull too.

SETTING UP A BUDGET STUDIO

Commercial Approach

BookDrive Pro is a commercially available digitization unit. It uses two digital cameras and built-in flash lights. It may be the optimal solution for your projects, but it also may not fit your library’s budget. The unit also is not suitable for oversized material such as large maps and posters. For more information about this product, please visit http://pro.atiz.com/.

Sample Budget Studio Setup

A digitization lab can have three rooms or areas: one for oversized materials, one for smaller prints or 3-D objects, and one for computers. The area for shooting oversized materials should have black walls and floor. You can either use one flash light to bounce light off the ceiling or use two flash lights to shine light directly onto the materials. For fragile materials, the first approach is more appropriate. The area for shooting smaller prints or 3-D objects should have a stable table and black or white background paper; for this room or area, black walls and floor are not required. For the shooting equipment, I will use the set chosen by the photographer from the University of Kansas Spencer Museum of Art as my example.
• DSLR camera: Nikon D810, $2,996.95
  http://www.bhphotovideo.com/c/search?atclk=Camera+Model_Nikon+D810&ci=6222&N=4288586280+3907353607
• Macro lens: Nikon AF Micro-Nikkor 60mm f/2.8D Lens, $429.00
  http://www.bhphotovideo.com/c/product/66987-GREY/Nikon_1987_AF_Micro_Nikkor_60mm_f_2_8D.html
• Heavy-duty mono stand: Arkay 6JRCW Mono Stand Jr with Counter Weight, 6', $678.50
  http://www.bhphotovideo.com/c/product/2727-REG/Arkay_605138_6JRCW_Mono_Stand_Jr.html
• Strobe: Broncolor G2 Pulso 1600 Watt/Second Focusing Lamphead with 16' Cord, $3,053.68
  http://www.bhphotovideo.com/c/product/259745-REG/Broncolor_32_115_07_G2_Pulso_with_16.html
• Power pack: Broncolor Senso A4 2,400W/s Power Pack, $3,629.92
  http://www.bhphotovideo.com/c/product/745060-REG/Broncolor_31_051_07_Senso_A4_2_400W_s_Power.html
• Reflector: Broncolor P65 Reflector, 65 Degrees, 11" Diameter, for Broncolor Pulso 8, Twin and HMI, $513.52
  http://www.bhphotovideo.com/c/product/7162-REG/Broncolor_33_106_00_P65_Reflector_65_Degrees.html
• Reflector: Broncolor Softlight Reflector, 20" Diameter, for Broncolor Primo, Pulso 2/4 & HMI Heads, $501.76
  http://www.bhphotovideo.com/c/product/7167-REG/Broncolor_33_110_00_Softlight_Reflector_20_for.html
• Light stand: Impact Air-Cushioned Light Stand, $44.99
  http://www.bhphotovideo.com/c/product/253067-REG/Impact_LS10AB_Air_Cushioned_Light_Stand.html
• Light meter: Sekonic L-308S Flashmate, Digital Incident, Reflected and Flash Light Meter, $199.00
  http://www.bhphotovideo.com/c/product/368226-REG/Sekonic_401_309_L_308S_Flashmate_Light_Meter.html
• Book cradle: Book Exhibition Cradles, $30.00
  http://www.universityproducts.com/cart.php?m=product_list&c=1115&primary=1&parentId=1271&navTree[]=1115
• Background paper: Savage Seamless Background Paper (both white and black), $45.00 x 2 = $90.00
  http://www.bhphotovideo.com/c/product/45468-REG/Savage_1_12_107_x_12yds_Background.html
• Nonreflective glass: 1/4" Optiwhite Starphire Purified Tempered Single Lite Clear Glass, $75.00 (can be purchased at a local glass store)
• White balancing accessory: X-Rite Original ColorChecker Card, $69.00
  http://www.bhphotovideo.com/c/product/465286-REG/X_Rite_MSCCC_Original_ColorChecker_Card.html
• Software: Adobe Lightroom 5, $150.00
  http://www.adobe.com/products/photoshop-lightroom.html

Table 3. List of Items Needed to Prepare for a Budget Studio

The total cost for a “budget” shooting studio ranges from $10,000 to $15,000, and there is no annual maintenance expense.
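As a quick sanity check on the stated range, the itemized prices in Table 3 can be totaled directly; they come to about $12,461, comfortably inside $10,000 to $15,000:

```python
# Prices from Table 3; background paper is counted twice (white and black rolls).
prices = {
    "DSLR camera (Nikon D810)": 2996.95,
    "Macro lens": 429.00,
    "Heavy-duty mono stand": 678.50,
    "Strobe": 3053.68,
    "Power pack": 3629.92,
    "Reflector (P65)": 513.52,
    "Reflector (Softlight)": 501.76,
    "Light stand": 44.99,
    "Light meter": 199.00,
    "Book cradle": 30.00,
    "Background paper (x2)": 90.00,
    "Nonreflective glass": 75.00,
    "White balancing accessory": 69.00,
    "Adobe Lightroom 5": 150.00,
}

total = sum(prices.values())
print(f"Total: ${total:,.2f}")  # Total: $12,461.32
```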
Figure 21. The University of Kansas Spencer Museum of Art Digitization Lab Setup for Oversized Materials

Figure 22. Steelworks Museum of Industry and Culture’s Digitization Lab Setup for Oversized Materials

Figure 23. The University of Kansas Spencer Museum of Art Digitization Lab Setup for Smaller Prints and 3-D Objects

Figure 24. Steelworks Center of the West’s Digitization Lab Setup for 3-D Objects

Functions of Some Elements in the Sample Shooting Studio

1.
Macro lens: It allows close-up shooting of objects. It is especially useful when photographing small prints and small 3-D objects, and it can also be used to photograph regular and oversized materials.
2. Heavy-duty mono stand: It replaces a traditional tripod. It is very stable and allows quick adjustment of camera height and position.
3. Strobe, power pack, and reflector: Together they generate consistent and homogeneous light distribution. Recommended further reading: “Introduction to Off-Camera Flash: Three Main Choices in Strobe Lighting.”10
4. Light stand: It holds the strobe and reflector.
5. Light meter: Handheld exposure meters measure the light falling onto a light-sensitive cell and convert it into a reading from which the correct shutter speed and/or lens aperture settings can be made.11
6. Book cradles: They help to minimize stress on bookbindings and reduce page curvature.
7. Nonreflective glass: It helps to flatten a photographed page and reduce reflection, although it does not completely eliminate glass reflection. One very useful trick for reducing glass reflection is to place a black board with a hole in it above the page and shoot through the hole. This approach does not actually eliminate the reflection; rather, it reflects black onto the photograph, so when the photograph is reviewed on a computer it appears as if no reflection occurred.

Figure 25. The University of Kansas Spencer Museum of Art Digitization Lab Setup for Materials That Need to Be Pressed Down by Glass

Many librarians believe that digitizing print materials with a digital camera requires a professional photographer, but this is not necessarily true. A professional photographer or even an art student can act as a consultant to help set up a shooting studio and provide basic training. Also, many museums have professional photographers and have set up shooting studios for digitization.
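The reading produced by the light meter in item 5 above maps to aperture and shutter combinations through the standard exposure value relation, EV = log2(N^2 / t), where N is the f-number and t is the shutter time in seconds; any pair with the same EV admits the same amount of light. A small illustration in Python (the numbers are illustrative, not from the article):

```python
import math

def exposure_value(f_number, shutter_seconds):
    """Standard exposure value: EV = log2(N^2 / t). Higher EV means less light."""
    return math.log2(f_number ** 2 / shutter_seconds)

# f/8 at 1/125 s and f/5.6 at 1/250 s are one stop apart on each control,
# so they yield (almost exactly) the same overall exposure.
ev_a = exposure_value(8.0, 1 / 125)
ev_b = exposure_value(5.6, 1 / 250)
print(round(ev_a, 2), round(ev_b, 2))  # both close to EV 13
```

This is why a single meter reading can be satisfied by several equivalent aperture/shutter pairs, and the operator picks the pair that suits the depth of field and flash sync needs of the shot.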
These museums are often very willing to share their experience and even provide training. I believe the learning curve for operating a shooting studio is no greater than the learning curve for operating an overhead scanner and its software.

PROS AND CONS

No digitization equipment or system is perfect; all involve trade-offs in image quality, speed, convenience of use, quality of accompanying software, and cost. Our tests show that for most archival materials a DSLR camera will do a better job than an overhead scanner.

Pros of Overhead Scanner

• The scanner is a complete scanning station. It can be connected to a computer and start scanning immediately. Materials can be placed on the scanning surface, so no equipment adjustments are required while scanning.
• It can scan and save images in bitonal (bitmap) format directly, while a DSLR camera can only shoot in grayscale or color.
• Built-in book cradles help to scan thick books and those that cannot be fully opened.
• Book-curve correction functionality is provided by the accompanying software.

Cons of Overhead Scanner

• High cost. The overhead scanner we have cost more than $50,000, with an annual maintenance contract of $8,000.
• High replacement cost. When a scanner is outdated or broken, the entire machine has to be replaced.
• Instability. Our overhead scanner is unstable even when placed on a sturdy table and handled only by professionals. From April 2010 to October 2010, the scanner was down for a total of forty-two working days (sixty calendar days). The company fixed the machine onsite many times, but it continues to have minor problems and has not been completely reliable.
• The autofocus feature does not work consistently.
• Special training is needed to operate the machine and associated software.
• Supported file formats are limited. Most scanners support only TIFF, JPEG, JPEG 2000, Windows BMP, and PNG.
• Unsupported, outdated software. Our overhead scanner’s software can only be run on an older operating system (Windows XP) because there is no updated software for this model.

Pros of Budget Studio

• Stable. Under normal use, DSLR cameras are much less likely to break down than scanners. For example, I have had an older DSLR, a Nikon D200, for seven years. It has survived numerous backpacking trips, multiple drops, and extreme weather conditions, and it still functions as needed.
• Fast and accurate focus. DSLR cameras are designed to focus quickly, and their focus indicators provide instant feedback so operators know that the image is in focus. If operated properly, DSLR cameras can deliver sharper images than scanners.
• Less expensive. A good-quality DSLR camera and lens can be purchased for less than $4,000 and will last for years. As technologies advance, DSLR camera prices will continue to drop.
• Ability to save files in more formats. In addition to TIFF and JPEG formats, most DSLR cameras can save photos in RAW file format. Some cameras can directly save images in Digital Negative (DNG) format, and others deliver images in proprietary formats that can be converted into DNG using a computer program. Editing RAW images is nondestructive, while editing TIFF and JPEG images is irreversible.
• Accurate WB and exposure. By using the right shooting and post-processing techniques, photographs can achieve exact color reproduction. By contrast, calibrating an overhead scanner most likely can only be performed by a company’s trained technician, and proper exposure and WB are not guaranteed.
• More dynamic range. The RAW file format usually provides more dynamic range: overexposed and underexposed images can be fixed by adjusting exposure compensation in software, so lost shadow or highlight detail can be restored.
• Can photograph 3-D objects.
Archival collections often include materials other than books, such as art pieces, and these materials are better photographed than scanned.
• Versatile. Cameras can perform on-site digitization, while overhead scanners are too bulky to be moved around.
• Faster and better preview. Images can be viewed instantly on a computer when proper software, such as Adobe Lightroom, is used. Operators can compare multiple shots side by side on a screen and decide which photo to retain.
• More accessible technical support. There are far more DSLR camera users than overhead scanner users, so technical questions can often be answered through online forums.
• Easy to find replacement parts. When a piece of shooting-studio equipment breaks down, it is easy for staff to find and install a replacement.
• Easy software updates. The software used in a studio is independent of the equipment.

Cons of Budget Studio

• There is a learning curve for setting up a shooting studio, operating the studio, and mastering new image-processing techniques.
• A DSLR camera with a lower pixel count will not be sufficient for digitizing large-format materials, such as posters and maps.
• No built-in book-curve correction is provided by Adobe Photoshop or Lightroom. However, our experience shows that automatic book-curve correction does not always work well anyway. We normally use a homemade book cradle to help lay a page flat and use one or two weights to hold down the other side of the book. For some books, if flatness is hard to achieve, we place a piece of glass on top to ensure flatness.
• Security concern. Since a DSLR camera is highly portable, it can be stolen easily.

Figure 26. Scanning Setup Using a Book Cradle

CONCLUSION

The technology of DSLR cameras has advanced very quickly in the past ten years. Newer DSLR cameras can handle higher resolutions and have very little image noise even at a high ISO setting.
The higher demand for DSLR cameras and accompanying image-editing software results in more rapid technology advances compared to low-demand, high-end overhead scanners. High consumer demand also drives DSLR camera prices much lower than prices for overhead scanners. In addition, the wide range of consumers purchasing DSLR cameras and software prompts companies to offer more user-friendly interfaces. As our tests show, for most library materials a DSLR camera can produce superior images. If you do not have the budget for a high-end overhead scanner, you can still fulfill your digitization and preservation goals with a budget studio.

ACKNOWLEDGEMENT

I would like to thank Robert Hickerson and Ryan Waggoner, the University of Kansas Spencer Museum of Art, Tim Hawkins, and Steelworks Center of the West for showing me their digitization labs and sharing their experience.

REFERENCES

1. Federal Agencies Digitization Guidelines Initiative, “Technical Guidelines for Digitizing Cultural Heritage Material: Creation of Raster Image Master Files,” August 2010, http://www.digitizationguidelines.gov/guidelines/digitize-technical.html.
2. “Tutorials: White Balance,” Cambridge in Colour, accessed March 9, 2016, http://www.cambridgeincolour.com/tutorials/white-balance.htm.
3. “ColorChecker Passport User Manual,” X-Rite Incorporated, accessed March 9, 2016, http://www.xrite.com/documents/manuals/en/ColorCheckerPassport_User_Manual_en.pdf.
4. Scott Kelby, “Scott Kelby’s Editing Essentials: How to Develop Your Photos,” Pearson Education, Peachpit, accessed March 9, 2016, http://www.peachpit.com/articles/article.aspx?p=2117243&seqNum=3.
5. “sRGB vs. Adobe RGB 1998,” Cambridge in Colour, accessed March 9, 2016, http://www.cambridgeincolour.com/tutorials/sRGB-AdobeRGB1998.htm.
6. “Colour Banding,” Wikipedia, accessed March 9, 2016, http://en.wikipedia.org/wiki/Colour_banding.
7. “Posterization,” Wikipedia, accessed March 9, 2016, http://en.wikipedia.org/wiki/Posterization.
8. “Image Posterization,” Cambridge in Colour, accessed March 9, 2016, http://www.cambridgeincolour.com/tutorials/posterization.htm.
9. Richard Anderson and Peter Krogh, “Color Space and Color Profiles,” American Society of Media Photographers, accessed March 9, 2016, http://dpbestflow.org/color/color-space-and-color-profiles.
10. Tony Roslund, “Introduction to Off-Camera Flash: Three Main Choices in Strobe Lighting,” Fstoppers (blog), accessed March 9, 2016, https://fstoppers.com/originals/introduction-camera-flash-three-main-choices-strobe-lighting-40364.
11. “Introduction to Light Meters,” B & H Foto & Electronics Corp., accessed March 9, 2016, http://www.bhphotovideo.com/find/Product_Resources/lightmeters1.jsp.
Identifying Key Steps for Developing Mobile Applications and Mobile Websites for Libraries

Devendra Dilip Potnis, Reynard Regenstreif-Harms, and Edwin Cortez

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 43

ABSTRACT

Mobile applications and mobile websites (MAMW) represent information systems that are increasingly being developed by libraries to better serve their patrons. Because of a lack of the in-house IT skills and knowledge necessary to develop MAMW, a majority of libraries are forced to rely on external IT professionals who may or may not help libraries meet patron needs but instead may deplete libraries’ scarce financial resources. This paper applies a system analysis and design perspective to analyze the experience and advice shared by librarians and IT professionals engaged in developing MAMW. It identifies key steps and precautions to take while developing MAMW for libraries. It also advises library and information science graduate programs to equip their students with the specific skills and knowledge needed to develop and implement MAMW.

INTRODUCTION

The unprecedented adoption and ongoing use of a variety of context-specific mobile technologies by diverse patron populations, the ubiquitous nature of mobile content, and the increasing demand for location-aware library services have forced libraries to “go mobile.” Mobile applications and mobile websites (MAMW), that is, web portals running on mobile devices, represent information systems that are increasingly being developed and used by libraries to better serve their patrons. However, a majority of libraries lack the in-house human resources necessary to develop MAMW.
Because of a lack of staff equipped with the requisite IT skills and knowledge, libraries are often forced to partner with and rely on external IT professionals, potentially losing control over the process of developing MAMW.1 Partnerships with external IT professionals do not always help libraries meet the information needs of their patrons and instead can deplete their scarce financial resources. It therefore becomes necessary for librarians to understand the process of developing MAMW so that they can better evaluate MAMW for serving library patrons.

Devendra Dilip Potnis (dpotnis@utk.edu) is Associate Professor, School of Information Sciences; Reynard Regenstreif-Harms (reynardrh@gmail.com) is Project Archives Technician, Great Smoky Mountains National Park, Gatlinburg, Tennessee; and Edwin Cortez (ecortez@utk.edu) is Professor, School of Information Sciences, University of Tennessee at Knoxville.

IDENTIFYING KEY STEPS FOR DEVELOPING MOBILE APPLICATIONS & MOBILE WEBSITES FOR LIBRARIES | POTNIS, REGENSTREIF-HARMS, AND CORTEZ | doi:10.6017/ital.v35i2.8652 44

One possibility is for librarians to re-educate themselves through continuing education or other professional development activities. Another solution would be for library and information science (LIS) schools to strengthen their curricula in the management, evaluation, and application of MAMW and related emerging technologies. Issues, challenges, and strategies for providing librarians with these opportunities are abundant and have been debated for more than thirty years, especially since libraries started experiencing the impact of microchip and portable technologies.2 Any practical and immediate guidance could help librarians in charge of developing MAMW.3 However, a majority of the practical guidance available for developing MAMW for libraries is limited to specific settings or patron populations.
Moreover, this practical guidance is not theoretically validated, which curtails its generalizability across diverse library settings. For instance, a number of librarians and IT professionals share their experience and stories of MAMW development for a specific patron population in a specific library setting.4,5 Their accounts typically describe successes in developing MAMW, lessons learned during development, or advice for developing MAMW. This paper applies a system analysis and design perspective from the information systems discipline to examine the experience and advice shared by librarians and IT professionals in order to identify the key steps and precautions to be taken when developing MAMW for libraries. System analysis and design, a branch of the information systems discipline, is the most widely used theoretical knowledge base available for developing information systems.6 According to the system analysis and design perspective, development, planning, analysis, design, implementation, and maintenance are the six phases of building any information system.7 The next section synthesizes our method for this secondary research. The following section discusses the key steps we identified for planning, analyzing, designing, implementing, and maintaining MAMW for libraries. The concluding section presents the implications of this study for libraries and LIS graduate programs.
METHOD

We began this study with a practitioner’s handbook guiding libraries in using mobile technologies to deliver services to diverse patron populations.8 To search the literature relevant to our research, we devised many key phrases, including but not limited to “mobile technolog*,” “mobile applications for libraries,” and “mobile websites for libraries.” As part of our active information-seeking process, we applied a snowball sampling technique to collect more than seventy-five scholarly research articles, handbooks, ALA library technology reports, and books hosted on the EBSCO and Information Science Source databases. Our passive information seeking was aided by article suggestions from Emerald Insight and Elsevier Science Direct, two of the most widely used journal-hosting sites, in response to the journal articles we accessed there. We applied the following four criteria to establish the relevancy of publications to our research: accuracy of facts; period of publication (i.e., from 2000 to 2014); credibility of authors; and content focused on problems, solutions, advice, and tips for developing MAMW. Several research articles published by Information Technology and Libraries and Library Hi Tech, two top-tier journals covering the development of MAMW for libraries, built the foundation of this secondary research. We analyzed the collected literature using the qualitative data presentation and analysis method proposed by Miles and Huberman.9 We developed Microsoft Excel summary sheets to code the experience and advice shared by librarians and IT professionals. The coded data were read repeatedly to identify and name patterns and themes. Each relevant publication was analyzed individually and then compared across subjects to identify patterns and common categories. The inter-coder reliability between the two authors who analyzed the data was 85 percent.
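Percent agreement of the kind reported above is simple to compute from two coders' label lists. A minimal sketch in Python (the codes below are hypothetical; only the 85 percent figure comes from the article):

```python
def percent_agreement(coder_a, coder_b):
    """Share of items (as a percentage) assigned the same code by both coders."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must label the same set of items.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)

# Hypothetical codes for 20 publications; the coders disagree on 3 of them.
coder_a = ["team", "scope", "buy", "build", "team"] * 4
coder_b = list(coder_a)
coder_b[3], coder_b[10], coder_b[17] = "scope", "buy", "team"
print(percent_agreement(coder_a, coder_b))  # 85.0
```

Note that raw percent agreement does not correct for chance agreement; chance-corrected statistics such as Cohen's kappa are stricter alternatives.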
Data analysis helped us identify the key steps needed for planning, analyzing, designing, implementing, and maintaining MAMW for libraries.

FINDINGS AND DISCUSSION

Key Steps for Planning MAMW

Forming and Managing a Team

Building teams of people with the appropriate skills, knowledge, and experience is one of the first steps suggested by the existing literature for planning MAMW. It is essential for team members to be aware of new developments and trends in the market.10 For instance, developers should be aware of print resources on relevant technologies such as Apache, ASP, JavaScript, PHP, Ruby on Rails, and Python; online resources such as detectmobilebrowser.com and the W3C mobileOK Checker for testing catalogs, design functionality, and accessibility on mobile devices; and various online communities of developers who can provide peer support when needed.11 Team members are also expected to keep up with new developments in mobile devices, platforms, operating systems, digital rights management terms and conditions, and emerging standards for content formats.12 Periodic delegation of various tasks can help libraries develop MAMW effectively.13 Libraries should also form productive, financially feasible partnerships with external stakeholders such as Internet service providers and network administrators for hosting MAMW on Internet servers that meet desired safety and security standards.14,15

Requirements Gathering

Requirements for developing MAMW can be collected through empirical research and secondary research. Typically, the goal of empirical research is to help libraries

• gather patron preferences for and expectations of MAMW,16,17
• stay abreast of the continual evolution of patron needs,18
• periodically (e.g., quarterly, annually, biannually, etc.)
gather and evaluate user needs,19
• index the content of MAMW,20
• investigate patrons’ acceptance of the library’s use of MAMW,21 and
• understand user needs and identify the top library services requested by patrons.

Empirical research in the form of usability testing, functional validation, user surveys, etc., should be carried out before developing MAMW to inform the development process and/or after developing MAMW to study their adoption by library patrons. Empirical research typically involves the identification of patrons and other stakeholders who will be affected by MAMW. This step is followed by developing data-collection instruments, collecting data from patrons and other stakeholders, and analyzing the qualitative and quantitative data using appropriate techniques and software.22 Secondary research mainly focuses on scanning and assessing existing literature. For instance, using appropriate datasets on mobile use, librarians may be able to identify the factors responsible for the adoption of mobile technologies.23 Typically, such factors include but are not limited to the cognitive, affective, social, and economic conditions of potential users. MAMW developers can also scan the environment by examining existing MAMW and reviewing the literature to create sets of guidelines for replacing old information systems with new, well-functioning MAMW.24 Librarians can also scan the market for free software options to conserve financial resources.25

Making Strategic Choices

Mobile Applications or Mobile Websites?

One of the most important strategic decisions libraries need to make during this phase is whether to use a mobile app or a mobile website, that is, a web portal running on mobile devices, for offering services to patrons.
Mobile websites are web browser-based applications that might direct mobile users to a different set of content pages; serve a single set of content to all patrons while using different style sheets or templates reformatted for desktop or mobile browsers; or use a site transcoder (a rule-based interpreter) that resides between a website and a web client and intercepts and reformats content in real time for a mobile device.26,27 Mobile apps are more challenging to build than mobile websites because they require separate, platform-specific programming for each operating system.28 Mobile apps also burden users and their devices: users are expected to remember the functionality of each menu item, and a significant amount of memory is required to store and support apps on mobile devices. However, potential profitability, better mobile-device functionality, and greater exposure through app stores can make mobile apps a more economical option than mobile websites.29

Buy or Build?

In the planning phase, libraries also need to decide whether to buy a commercial, off-the-shelf (COTS) MAMW or build a customized MAMW. Candidate MAMW need to be evaluated in terms of customer support and service, maintenance, and the ability to meet patron and library needs when making this choice.30 Sometimes libraries purchase COTS products and end up customizing them, benefiting from both options. For example, some libraries first purchase packaged mobile frameworks to create simple, static mobile websites and subsequently develop dynamic library apps specific to library services.31

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 47

Managing Scope

Many libraries have limited financial resources, which makes it necessary for their staff to manage the scope of MAMW development.
Prioritizing tasks and identifying mission-critical features of MAMW are among the most common activities libraries undertake to manage this scope.32 For instance, it is not practical to make an entire library website mobile, because the library would end up serving only those patrons who access its site over mobile alone. Instead, libraries should determine which parts of the website should go mobile. A growing trend of using a mobile-first design approach, in which a mobile version of a website is designed first and then worked up to a larger desktop version, could help librarians better manage the scope of MAMW development. Alternatively, Jeff Wisniewski, a leading web services librarian in the United States, advises libraries to create a new mobile-optimized homepage alone, which is faster than trying to retrofit the library’s existing homepage for mobile.33 This advice is highly practical because no webmaster wants to maintain two distinct versions of the library’s webpages with details such as hours of operation and contact information.

Selecting the Appropriate Software Development Method

There are three key methods for developing MAMW: structured methodologies (e.g., waterfall or parallel development), rapid application prototyping (e.g., phased development, prototyping, or throwaway prototyping), and agile development, an umbrella term for a collection of agile methodologies such as Crystal, the dynamic systems development method, extreme programming, feature-driven development, and Scrum. There is a bidirectional relationship between these MAMW development methods and the resources available for development: project resources such as funding, duration, and human resources influence and are affected by the type of software development method selected for developing MAMW.
However, studies rarely pay attention to this important dimension of the planning phase.34

Key Steps in the Analysis Phase

Requirements Analysis

After collecting data from patrons, the next natural step is to analyze the data to inform the process of conceptualizing, building, and developing MAMW.35 The requirements-analysis phase helps libraries achieve a user-centered design of MAMW and assess the return on investment (ROI) in MAMW. The context and goals of the patrons using mobile devices, and the tasks they are likely and unlikely to perform on a mobile device, are the key considerations for developing user-centered MAMW for library patrons.36 It is critical to gather, understand, and review user needs.37 Surveys can be administered on paper or online, and the results can be analyzed using advanced statistical techniques or qualitative software.38,39 The analysis allows the following questions to be answered: Which library services do patrons use most frequently on their mobile devices? What is their level of satisfaction with those services? What types of library services and products would they like to access with their mobile phones in the future? Survey analyses can help librarians predict which mobile services patrons will find most useful;40 they can also help librarians classify users on the basis of their perceptions, experience, and habits when using mobile technologies to access library services.41 As a result, libraries can identify and prioritize functional areas for their MAMW deployment.42 MAMW developers can also learn from their users’ humbling and/or frustrating experiences of using mobile devices for library services.
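The core of this kind of survey analysis, tallying which services patrons report using on their phones so that functional areas can be prioritized, can be sketched in a few lines of JavaScript. The function name and the shape of the input data below are our own illustrative assumptions, not taken from any of the cited studies:

```javascript
// Sketch: rank library services by how often survey respondents
// report using them on a mobile device. The response format (an
// array of objects, each listing the services used) is hypothetical.
function rankServices(responses) {
  const counts = {};
  for (const response of responses) {
    for (const service of response.servicesUsed) {
      counts[service] = (counts[service] || 0) + 1;
    }
  }
  // Sort service names by descending frequency of mention.
  return Object.keys(counts).sort((a, b) => counts[b] - counts[a]);
}

// Example: three (made-up) survey responses.
const ranked = rankServices([
  { servicesUsed: ["catalog", "hours"] },
  { servicesUsed: ["catalog", "ask-a-librarian"] },
  { servicesUsed: ["catalog", "hours"] },
]);
// ranked is ["catalog", "hours", "ask-a-librarian"]: the catalog,
// mentioned by all three respondents, would be the first candidate
// for mobile deployment.
```

In practice the same tally could of course be produced by the statistical or qualitative software mentioned above; the point is only that the prioritization step reduces to a frequency count over responses.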
In addition, libraries can keep track of their patrons’ positive and negative observations, their information-sharing practices, and how they create group experiences on the platform provided by their libraries.43 To improve existing MAMW, libraries could also use Google Analytics, a free web-metrics tool, to identify the popularity of MAMW features and analyze statistics on how they are used.44 To develop operating-system-specific mobile apps, Google Analytics can be used to learn about the popularity of the mobile devices used by patrons.45 Ideally, libraries should calculate and document ROI before investing in the development of MAMW.46 For instance, libraries can run a cost-benefit analysis on the process of developing MAMW and compare the various library services offered over mobile devices.47 Typically, the following data could help libraries run the cost-benefit analysis: specific deliverables (e.g., features of MAMW), resources (e.g., resources needed, available resources, etc.), risks (e.g., types and levels of risk), performance requirements, and security requirements for developing MAMW. This analysis would help libraries make decisions on service provisions, such as the specific goals to be set for developing MAMW, the feasibility of introducing desired MAMW features, and how to manage available resources to meet the set goals.48 Libraries should also examine what other libraries have already done to provide mobile services.49

Communication/Liaising with Stakeholders

Effective communication between developers and stakeholders influences almost every aspect of developing information systems. However, existing studies do not emphasize the significance of communication with stakeholders. For instance, several studies vaguely refer to the translation of user needs into technology requirements,50 but few point out a precise modeling technique (e.g., entity-relationship diagrams, Unified Modeling Language, etc.)
for converting user needs into a language understood by software developers. Developers should communicate best practices and suggestions for the future implementation of MAMW in libraries,51 which involves predicting and selecting appropriate MAMW for libraries,52 demonstrating what is possible and how services are relevant, and showing how new resources can help create value for libraries.53,54 Communication with users is also critical for creating value-added services for patrons who use different mobile technologies to meet needs related to work, leisure, commuting, etc.55 However, the existing literature on MAMW development for libraries does not mention the significance of this activity.

Key Steps for Designing MAMW

Prototyping

Prototyping refers to the modeling or simulation of an actual information system. MAMW can have paper-based or computer-based prototypes. Prototyping allows developers to communicate directly with MAMW users to seek their feedback. Developers can correct or modify the original design of MAMW until users and developers are in agreement about the system design. Building consensus between MAMW developers and potential users is a key challenge to overcome during this phase, and it may put a financial burden on MAMW development projects; it requires skilled personnel to manage the scope, time, human resources, and budget of such projects. Wireframing is one of the most prominent prototyping techniques practiced by librarians and IT professionals developing MAMW for libraries.56 This technique depicts schematic on-screen blueprints of MAMW that lack style, color, or graphics, focusing mainly on the functionality, behavior, and priority of content.
Selecting Hardware, Programming Languages, Platforms, Frameworks, and Toolkits

The existing literature on the development of MAMW for libraries covers the selection and management of software; software development kits; scripting languages like JavaScript; data management and representation languages such as HTML and XML, along with their text editors; and AJAX for animations and transitions. The existing literature also guides libraries in training their staff to use MAMW to better serve patrons.57 A few studies also provide guidance on selecting COTS products such as WebKit, an open-source web browser engine that renders webpages on smartphones and allows users to view high-quality graphics on data networks with faster throughput.58 However, it might be a good idea to use licensed open-source COTS products, because licensed software allows libraries to legally distribute software within their organizations as covered by the licensing agreement. Libraries that use software-licensing agreements may also be able to seek expert help and advice whenever they have a concern or query. In the authors’ experience, librarians have shared a few effective strategies for designing MAMW. One key strategy is to purchase reliable device emulators and cross-compatible web editors. These technologies allow the user to work with the design at the most basic level, save documents as text, transfer the documents between web programs, and direct designers toward simple solutions.59 Sample cross-compatible web editors include, but are not limited to, Notetab Pro (http://www.notetab.com/), Code Lobster (http://www.codelobster.com/), and Bluefish (http://bluefish.openoffice.nl).
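When working across emulators and editors, developers also frequently need the device detection that the mobile websites discussed earlier rely on to direct patrons to an appropriate version of a site. A minimal sketch of such a check, with user-agent patterns and variant names that are purely illustrative assumptions on our part (production sites would use a maintained detection library and default to the full site for browsers without JavaScript):

```javascript
// Sketch: pick a site variant from the browser's user-agent string.
// The patterns below are deliberately tiny and illustrative; real
// detection lists are much longer, and "full" is the safe default.
function chooseVariant(userAgent) {
  const ua = userAgent.toLowerCase();
  if (/iphone|android.*mobile/.test(ua)) return "touch"; // touch-optimized site
  if (/opera mini|nokia|blackberry/.test(ua)) return "text"; // simple text-based site
  return "full"; // full desktop website
}

// In a page, the result could drive a redirect or stylesheet swap, e.g.:
//   if (chooseVariant(navigator.userAgent) === "touch") {
//     location.replace("/m/");
//   }
```

The same routing decision can often be made without any scripting, by letting the server inspect the User-Agent header or by using CSS media queries keyed to screen width, which is one reason the literature cautions against leaning too heavily on client-side JavaScript.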
Hybrid mobile app frameworks like Bootstrap, Ionic, Mobile Angular UI, Intel XDK, Appcelerator Titanium, Sencha, Kendo UI, and PhoneGap use a combination of web technologies like HTML, CSS, and JavaScript for developing mobile-first, responsive MAMW. A majority of these frameworks use a drag-and-drop approach and do not require any coding for developing mobile apps; one-click API connections further simplify the process. User-interface frameworks like jQuery Mobile and Topcoat eliminate the need to design user interfaces manually. Importantly, MAMW developed using such frameworks can support many mobile platforms and devices. Toolkits like GitHub, skyronic, crudkit, and HAWHAW enable developers to quickly build mobile-friendly CRUD (create/read/update/delete) interfaces for PHP, Laravel, and CodeIgniter apps. Such mobile apps also work with MySQL and other databases, allowing them to receive and process data and display information to users. Table 1 categorizes specific hardware and software features recommended for MAMW to better serve library patrons.
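To make the CRUD pattern concrete before turning to the table, here is a minimal in-memory create/read/update/delete store in plain JavaScript. It is our own sketch of the interface pattern such toolkits generate, not code taken from crudkit, HAWHAW, or any other tool named above; a real deployment would back the store with MySQL or another database:

```javascript
// Sketch: an in-memory store illustrating the four CRUD operations a
// mobile-friendly interface exposes. A toolkit-generated app would
// persist these records to a database rather than a Map.
class RecordStore {
  constructor() {
    this.records = new Map();
    this.nextId = 1;
  }
  create(fields) { // C: add a record and return its new id
    const id = this.nextId++;
    this.records.set(id, { id, ...fields });
    return id;
  }
  read(id) { // R: fetch one record (or undefined if absent)
    return this.records.get(id);
  }
  update(id, fields) { // U: merge new field values into a record
    const rec = this.records.get(id);
    if (rec) Object.assign(rec, fields);
    return rec;
  }
  delete(id) { // D: remove a record, reporting success
    return this.records.delete(id);
  }
}

// Example: a (hypothetical) study-room booking record.
const store = new RecordStore();
const bookingId = store.create({ title: "Study room booking", status: "open" });
store.update(bookingId, { status: "confirmed" });
// store.read(bookingId).status is now "confirmed";
// store.delete(bookingId) would remove it.
```

Whatever the backing store, the value of such toolkits is that the mobile interface for all four operations is generated from the record schema rather than hand-coded per screen.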
# / Areas of Information Systems/IT / Specific Features Recommended for Developing MAMW for Libraries

1. Human-Computer Interaction (HCI)
Behavioral, cognitive, motivational, and affective aspects of HCI:
• Design responsive websites for libraries to enhance user experience60
• Design a user interface meeting the expectations and needs of potential users (e.g., a menu with the following items: library catalog, patron accounts, ask a librarian, contact information, listing of hours, etc.)61
• Design meaningful mobile websites based on user needs; document and maintain mobile websites62
Usability engineering:
• Design concise interfaces with limited links, descriptive icons, and home and parent-link icons63
• Create a user-friendly site (e.g., the DOK Library Concept Center in Delft, Netherlands, offers a welcome text message to first-time visitors)64
• Effectively transition from traditional websites to mobile-optimized sites with responsive design65
• Create user-friendly interface designs66
• Present a clean, easy-to-navigate mobile version of search results67
Information visualization:
• Automatically maintain the reliable and stable fundamental information required by indoor localization systems68
• Save time by redesigning existing sites69,70

2. Web Programming
HTML, XML, etc.:
• Design sites with a complete separation of content and presentation71
• Code HTML and CSS for better user experiences72
• Create and shorten links to make them easier to input using small or virtual keyboards73
Using client-side and server-side scripting (JavaScript Object Notation, etc.):
• Design and develop mashups74
• Develop MAMW using client-server architecture, accessible on mobile devices75
Without scripting:
• Implement widgetization to facilitate the integration of mobile websites—developing a widget library for mobile-based web information systems76

3. Open Source
• Design mobile websites that allow users to leverage the same open source technology as the main websites77
• Design mobile websites linking to other existing services like LibraryH3lp and library catalogs with mobile interfaces such as MobileCat78

4. Networking
• Design a mobile website capable of exploiting advancements in technology such as faster mobile data networks79
• Identify and address technology issues (e.g., connectivity, security, speed, signal strength, etc.) faced by patrons when using MAMW80

5. Input/Output Devices
• Use a mobile robot to determine the location of fixed RFID tags in space81
• Design MAMW capable of processing data communicated using radio-frequency identification devices, near-field communication technology, and Bluetooth-based technology like iBeacons82
• Offer innovative services using augmented-reality tools83

6. Databases
• Integrate a back-end database of metadata with front-end mobile technologies84
• Integrate the front end of MAMW with the back end of standard databases and services85

7. Social Media and Analytics
• Integrate social media sites (e.g., Foursquare, Facebook Places, Gowalla, etc.) with existing checkout services for accurate and information-rich entries86
• Implement Google Voice or a free text-messaging service87
• Use Google Analytics on a mobile-optimized website by copying the free JavaScript code generated by Google Analytics and pasting it into library webpages to gain insight into which resources are used and who uses them88
• Integrate a geo-location feature with mobile services89

Table 1.
MAMW with specific hardware and software features

From the above table, which is based on the analysis of the literature on developing mobile applications and mobile websites for libraries, it becomes clear that web programming and HCI are the two leading technology areas that shape the development of MAMW and, consequently, the services offered through them.

Designing User Interfaces of MAMW

Librarians and IT professionals engaged in developing MAMW for libraries make the following recommendations.

Use two style sheets: CSS plays a key role in giving a uniform display to the user interfaces of all webpages. Studies recommend designing two style sheets—namely, mobile.css and iphone.css—when developing MAMW, since most of the time smartphones ignore mobile stylesheets.90 In that case, iphone.css could direct itself to browsers of a specific screen width, helping those mobile devices that are not directed to the mobile website by the mobile.css stylesheet.91

Minimize use of JavaScript: JavaScript is instrumental in detecting which mobile device a patron is using and then directing them to the appropriate webpage, with options including a full website, a simple text-based site, and a touch-mobile-optimized site. However, it is critical to minimize the use of JavaScript on library mobile websites because not every smartphone offers the minimum level of support required to run it.92

Handle images intelligently: To help patrons optimize their bandwidth use, image files on mobile sites should be incorporated with CSS rather than HTML code; also, to ensure consistency in the appearance of mobile-website user interfaces, images should be kept to the same absolute size.93

Key Steps for Implementing MAMW

Programming for MAMW

Programming is at the heart of developing MAMW. As shown in table 1 above, web programming enables developers to build MAMW with a number of value-added features for patrons.
For instance, a web-application server running on ColdFusion can process data communicated via web browsers on mobile devices; this feature allows MAMW users to access search engines on library websites via smartphones.94 Also, client-side processing of classes (with a widget library) allows patrons to use their mobile devices as thin clients, thereby optimizing the use of network bandwidth.95

Testing MAMW

Past studies recommend testing the content, display/design, and functionality of MAMW in a controlled environment (e.g., a usability lab) or in the real world (i.e., in libraries).

Content: Librarians are advised to set up testing databases for testing image presentation, traditional free-text search, location-based search, barcode scanning for ISBN search, QR encapsulation, and voice search.96

Display/design: Librarians can review and test MAMW on multiple devices to confirm that everything displays and functions as intended.97 They can also test a beta version of their mobile website with varying devices to provide guidance regarding image sizing;98 beta versions are also useful in testing mobile websites for their display on different browsers and devices.99

Functionality: Librarians can set up testing practices and environments for the most heavily used device platforms (e.g., HCI incubators such as eye-testing software, which combine virtual emulators with mobile devices not owned by libraries).100,101 They can also use the User Agent Switcher add-on for Firefox to test a mobile website, and use web-based services like Device Anywhere and Browser Cam, which offer mobile emulation, to test the functionality of MAMW.102

Training Patrons

Unless patrons realize the significance of a new information system for managing information resources, they will hardly use it. However, training patrons to use a newly developed MAMW is almost completely missing from the studies describing the process of developing MAMW for libraries.
Joe Murphy, a technology librarian at Yale University, identifies the significance of user training in managing the change from traditional to mobile search and advises librarians to explore the mobile literacy skills of their patrons and educate them on how to use new systems.103

Data Management

MAMW cannot function properly without clean data. Cleaning up data, curating data, and addressing other data-related issues are some of the least-mentioned activities in the literature on developing MAMW. However, it is necessary for librarians engaged in developing MAMW to identify and address common challenges in managing the data used by MAMW. For example, it might be a good strategy for librarians to study the best practices for managing data-related issues when offering reference services using SMS.104

Skills Needed for Maintaining MAMW

Documentation and Version Control of Software

Past studies recommend developing a mobile strategy for building a mobile-tracking device and evaluating mobile infrastructure to ensure the continued assessment and monitoring of mobile usage and trends among patrons.105 However, past studies do not report or provide many details about the maintenance of MAMW, which leads us to infer that maintenance of MAMW, involving documentation and version control, is a neglected aspect of their development. Open source software development is increasingly becoming a common practice for developing MAMW. Implementing version-control software (e.g., Subversion and GitHub) to accommodate the needs of developers distributed across the world is a necessity for developing MAMW.
Version-control software provides a code repository with a centralized database for developers to share their code, which minimizes the errors associated with overwriting or reverting code changes and maximizes collaboration in software development.106

CONCLUSION

Various forces drive change in the knowledge and skills required of information professionals: technologies, changing environments, and the changing role of IT in managing and providing services to patrons. These forces affect IT-based professionals at all levels, both those responsible for information processing and those responsible for information services. This paper has examined the key steps and precautions to be taken by libraries developing MAMW to better serve their patrons. After analyzing the existing guidance offered by librarians and IT professionals from a systems analysis and design perspective, we find that some of the most ignored activities in MAMW development are selecting appropriate software development methodologies, prototyping, communicating with stakeholders, software version control, data management, and training patrons to use newly developed or revamped MAMW. The lack of attention to these activities could hinder libraries’ ability to serve patrons well using MAMW. It is necessary for librarians and IT professionals to pay close attention to the above activities when developing MAMW. Our study also shows that web programming and HCI are the two most widely used technology areas for developing MAMW for libraries. To save scarce financial resources, which otherwise would be invested in partnering with external IT professionals, libraries could either train their existing staff or recruit LIS graduates equipped with the skills and knowledge identified in this paper to develop MAMW (see table 2).
# / Key Steps for Developing MAMW / Skills and Knowledge Required for Developing MAMW

A. Planning Phase
1. Forming and managing a team: human resource management
2. Making strategic choices: time management; cost management; quality management; human resource management (e.g., staff capacity)
3. Requirements gathering: research (empirical and secondary)
4. Managing scope (e.g., managing financial resources, prioritizing tasks, identifying mission-critical features of MAMW, etc.): scope management
5. Selecting an appropriate software development method: time management; cost management; quality management

B. Analysis Phase
6. Requirements analysis: research (empirical and secondary)
7. Communication/liaising with stakeholders: communications management

C. Design Phase
8. Prototyping: software development (HCI)
9. Selecting hardware, programming languages, and platforms: software development (web programming and HCI)
10. Designing user interfaces of MAMW: software development (HCI)

D. Implementation Phase
11. Programming for MAMW: software development (web programming—e.g., Android, iOS, Visual C++, Visual C#, Visual Basic, etc.)
12. Testing MAMW: software development (web programming and HCI)
13. Training patrons: human resource management
14. Data management (e.g., cleaning up data, curating data, etc.): data management

E. Maintenance Phase
15. Documentation and version control of software: software development (web programming and HCI)

Table 2.
Skills and knowledge necessary to develop MAMW

The management of the scope, time, cost, quality, human resources, and communication related to any project is known as project management.107 In addition to project-management skills and knowledge, librarians would need to be proficient in software development (with an emphasis on HCI and web programming), in data management, and in the proper methods for conducting the empirical and secondary research needed to develop MAMW. If LIS programs equip their graduate students with the skills and knowledge identified in this paper, the next generation of LIS graduates could develop MAMW for libraries without relying on external IT professionals, which would make libraries more self-reliant and better able to manage their financial resources.108

This paper assumes a very small number of scholarly publications to be reflective of the real-world scenarios of developing MAMW for all types of libraries. This assumption is one of the limitations of this study. Also, the sample of publications analyzed in this study is not statistically representative of the development of MAMW for libraries around the world. In the future, the authors plan to interview librarians and IT professionals engaged in developing and maintaining MAMW for their libraries to better understand the landscape of developing MAMW for libraries.

REFERENCES

1. Devendra Potnis, Ed Cortez, and Suzie Allard, “Educating LIS Students as Mobile Technology Consultants” (poster presented at the 2015 Association for Library and Information Science Education Annual Meeting, Chicago, January 25–27), http://f1000.com/posters/browse/summary/1097683.

2. Edwin Michael Cortez, “New and Emerging Technologies for Information Delivery,” Catholic Library World 54 (1982): 214–18.

3. Kimberly D. Pendell and Michael S. Bowman, “Usability Study of a Library’s Mobile Website: An Example from Portland State University,” Information Technology & Libraries 31, no.
2 (2012): 45–62, http://dx.doi.org/10.6017/ital.v31i2.1913.

4. Godmar Back and Annette Bailey, “Web Services and Widgets for Library Information Systems,” Information Technology & Libraries 29, no. 2 (2010): 76–86, http://dx.doi.org/10.6017/ital.v29i2.3146.

5. Hannah Gascho Rempel and Laurie Bridges, “That was Then, This is Now: Replacing the Mobile Optimized Site with Responsive Design,” Information Technology & Libraries 32, no. 4 (2013): 8–24, http://dx.doi.org/10.6017/ital.v32i4.4636.

6. June Jamrich Parsons and Dan Oja, New Perspectives on Computer Concepts 2014: Comprehensive, Course Technology (Boston: Cengage Learning, 2013).

7. Ibid.

8. Andrew Walsh, Using Mobile Technology to Deliver Library Services: A Handbook (London: Facet, 2012).

9. Matthew B. Miles and A. Michael Huberman, Qualitative Data Analysis (Thousand Oaks, CA: Sage, 1994).

10. Bohyun Kim, “Responsive Web Design, Discoverability and Mobile Challenge,” Library Technology Reports 49, no. 6 (2013): 29–39, https://journals.ala.org/ltr/article/view/4507.

11. James Elder, “How to Become the ‘Tech Guy’ and Make iPhone Apps for Your Library,” The Reference Librarian 53, no. 4 (2012): 448–55, http://dx.doi.org/10.1080/02763877.2012.707465.

12. Sarah Houghton, “Mobile Services for Broke Libraries: 10 Steps to Mobile Success,” The Reference Librarian 53, no. 3 (2012): 313–21, http://dx.doi.org/10.1080/02763877.2012.679195.

13. Pendell and Bowman, “Usability Study.”

14. Lisa Carlucci Thomas, “Libraries, Librarians and Mobile Services,” Bulletin of the American Society for Information Science & Technology 38, no. 1 (2011): 8–9, http://dx.doi.org/10.1002/bult.2011.1720380105.

15. Elder, “How to Become the ‘Tech Guy.’”

16. Kim, “Responsive Web Design.”

17.
Chad Mairn, “Three Things You Can Do Today to Get Your Library Ready for the Mobile Experience,” The Reference Librarian 53, no. 3 (2012): 263–69, http://dx.doi.org/10.1080/02763877.2012.678245.

18. Rempel and Bridges, “That was Then.”

19. Rachael Hu and Alison Meier, “Planning for a Mobile Future: A User Research Case Study from the California Digital Library,” Serials 24, no. 3 (2011): S17–25.

20. Kim, “Responsive Web Design.”

21. Lorraine Paterson and Boon Low, “Student Attitudes Towards Mobile Library Services for Smartphones,” Library Hi Tech 29, no. 3 (2011): 412–23, http://dx.doi.org/10.1108/07378831111174387.

22. Jim Hahn, Michael Twidale, Alejandro Gutierrez, and Reza Farivar, “Methods for Applied Mobile Digital Library Research: A Framework for Extensible Wayfinding Systems,” The Reference Librarian 52, no. 1-2 (2011): 106–16, http://dx.doi.org/10.1080/02763877.2011.527600.

23. Paterson and Low, “Student Attitudes.”

24. Gillian Nowlan, “Going Mobile: Creating a Mobile Presence for Your Library,” New Library World 114, no. 3/4 (2013): 142–50, http://dx.doi.org/10.1108/03074801311304050.

25. Elder, “How to Become the ‘Tech Guy.’”

26. Matthew Connolly, Tony Cosgrave, and Baseema B. Krkoska, “Mobilizing the Library’s Web Presence and Services: A Student-Library Collaboration to Create the Library’s Mobile Site and iPhone Application,” The Reference Librarian 52, no. 1-2 (2010): 27–35, http://dx.doi.org/10.1080/02763877.2011.520109.

27.
Stephan Spitzer, “Make That to Go: Re-Engineering a Web Portal for Mobile Access,” Computers in Libraries 3, no. 5 (2012): 10–14.

28. Houghton, “Mobile Services.”

29. Cody W. Hanson, “Mobile Solutions for Your Library,” Library Technology Reports 47, no. 2 (2011): 24–31, https://journals.ala.org/ltr/article/view/4475/5222.

30. Terence K. Huwe, “Using Apps to Extend the Library’s Brand,” Computers in Libraries 33, no. 2 (2013): 27–29.

31. Edward Iglesias and Wittawat Meesangnill, “Mobile Website Development: From Site to App,” Bulletin of the American Society for Information Science and Technology 38, no. 1 (2011): 18–23.

32. Jeff Wisniewski, “Mobile Usability,” Bulletin of the American Society for Information Science & Technology 38, no. 1 (2011): 30–32, http://dx.doi.org/10.1002/bult.2011.1720380108.

33. Jeff Wisniewski, “Mobile Websites with Minimal Effort,” Online 34, no. 1 (2010): 54–57.

34. Hahn et al., “Methods for Applied Mobile Digital Library Research.”

35. J. Michael DeMars, “Smarter Phones: Creating a Pocket Sized Academic Library,” The Reference Librarian 53, no. 3 (2012): 253–62, http://dx.doi.org/10.1080/02763877.2012.678236.

36. Kim Griggs, Laurie M. Bridges, and Hannah Gascho Rempel, “Library/Mobile: Tips on Designing and Developing Mobile Websites,” Code4Lib Journal, no. 8 (2009), http://journal.code4lib.org/articles/2055.

37. DeMars, “Smarter Phones.”

38. Hahn et al., “Methods for Applied Mobile Digital Library Research.”

39. Beth Stahr, “Text Message Reference Service: Five Years Later,” The Reference Librarian 52, no.
1-2 (2011): 9–19, http://dx.doi.org/10.1080/02763877.2011.524502. 40. Patterson and Low, “Student Attitudes.” 41. Ibid. 42. Ibid. 43. Hanson, “Mobile Solutions for Your Library.” 44. Stahr, “Text Message Reference Service.” 45. Spitzer, “Make That to Go.” 46. Allison Bolorizadeh et al., “Making Instruction Mobile,” The Reference Librarian 53, no. 4 (2012): 373–83, http://dx.doi.org/10.1080/02763877.2012.707488. 47. Maura Keating, “Will They Come? Get Out the Word About Going Mobile,” The Reference Librarian no. 52, no. 1-2 (2010): 20-26, http://dx.doi.org/10.1080/02763877.2010.520111. 48. Patterson and Low, “Student Attitudes.” 49. Hanson, “Mobile Solutions for Your Library.” 50. Patterson and Low, “Student Attitudes.” 51. Hanson, “Mobile Solutions for Your Library.” 52. Cody W. Hanson, “Why Worry About Mobile?,” Library Technology Reports no. 47, no. 2 (2011): 5–10, https://journals.ala.org/ltr/article/view/4476. 53. Keating, “Will They Come?” 54. Spitzer, “Make That to Go.” 55. Kim, “Responsive Web Design.” 56. Wisniewski, “Mobile Usability.” 57. Elder, “How to Become the ‘Tech Guy.’” http://journal.code4lib.org/articles/2055 http://dx.doi.org/10.1080/02763877.2011.524502 http://dx.doi.org/10.1080/02763877.2012.707488 http://dx.doi.org/10.1080/02763877.2010.520111 https://journals.ala.org/ltr/article/view/4476 IDENTIFYING KEY STEPS FOR DEVELOPING MOBILE APPLICATIONS & MOBILE WEBSITES FOR LIBRARIES | POTNIS, REGENSTREIF-HARMS, AND CORTEZ |doi:10.6017/ital.v35i2.8652 60 58. Sally Wilson and Graham McCarthy, “The Mobile University: From the Library to the Campus,” Reference Services Review 38, no. 2 (2010): 214–32, http://dx.doi.org/10.1108/00907321011044990. 59. Brendan Ryan, “Developing Library Websites Optimized for Mobile Devices,” The Reference Librarian 52, no. 1-2 (2010): 128–35, http://dx.doi.org/10.1080/02763877.2011.527792. 60. Kim, “Responsive Web Design.” 61. Connolly, Cosgrave, and Krkoska, “Mobilizing the Library’s Web presence and Services.” 62. 
DeMars, “Smarter Phones.” 63. Mark Andy West, Arthur W. Hafner, and Bradley D. Faust, “Expanding Access to Library Collections and Services Using Small-Screen Devices,” Information Technology & Libraries 25 (2006): 103–7. 64. Houghton, “Mobile Services.” 65. Rempel and Bridges, “That was Then.” 66. Elder, “How to Become the ‘Tech Guy.’” 67. Heather Williams and Anne Peters, “And That’s How I Connect to MY Library: How a 42- Second Promotional Video Helped to launch the UTSA Libraries’ New Summon Mobile Application,” The Reference Librarian 53, no. 3 (2012): 322–25, http://dx.doi.org/10.1080/02763877.2012.679845. 68. Hahn et al., “Methods for Applied Mobile Digital Library Research.” 69. Danielle Andre Becker, Ingrid Bonadie-Joseph, and Jonathan Cain, “Developing and Completing a Library Mobile Technology Survey to Create a User-Centered Mobile Presence,” Library Hi-Tech 31, no. 4 (2013): 688–99, http://dx.doi.org/10.1108/LHT-03-2013-0032. 70. Rempel and Bridges, “That was Then.” 71. Iglesias and Meesangnill, “Mobile Website Development.” 72. Elder, “How to Become the ‘Tech Guy.’” 73. Andrew Walsh, “Mobile Information Literacy: A Preliminary Outline of Information Behavior in a Mobile Environment,” Journal of Information Literacy 6, no. 2 (2012): 56–69, http://dx.doi.org/10.11645/6.2.1696. 74. Back and Bailey, “Web Services and Widgets.” 75. Ibid. 76. Ibid. 77. Spitzer, “Make That to Go.” http://dx.doi.org/10.1108/00907321011044990 http://dx.doi.org/10.1080/02763877.2011.527792 http://dx.doi.org/10.1080/02763877.2012.679845 http://dx.doi.org/10.1108/LHT-03-2013-0032 http://dx.doi.org/10.11645/6.2.1696 INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 61 78. Iglesias and Meesangnill, “Mobile Website Development.” 79. Bohyun Kim, “The Present and Future of the Library Mobile Experience,” Library Technology Reports 49, no. 6 (2013): 15–28, https://journals.ala.org/ltr/article/view/4506. 80. Pendell and Bowman, “Usability Study.” 81. 
Hahn et al., “Methods for Applied Mobile Digital Library Research.” 82. Andromeda Yelton, “Where to Go Next,” Library Technology Reports 48, no. 1 (2012): 25–34, https://journals.ala.org/ltr/article/view/4655/5511. 83. Ibid. 84. Hahn et al., “Methods for Applied Mobile Digital Library Research.” 85. Houghton, “Mobile Services.” 86. Ibid. 87. Mairn, “Three Things You Can Do Today.” 88. Ibid. 89. Tamara Pianos, “EconBiz to Go: Mobile Search Options for Business and Economics— Developing a Library App for Researchers,” Library Hi Tech 30, no. 3 (2012): 436–48, http://dx.doi.org/10.1108/07378831211266582. 90. DeMars, “Smarter Phones.” 91. Ryan, “Developing Library Websites.” 92. Pendell and Bowman, “Usability Study.” 93. Ryan, “Developing Library Websites.” 94. Michael J. Whitchurch, “QR Codes and Library Engagement,” Bulletin of the American Society for Information Science & Technology 38, no. 1 (2011): 14–17. 95. Back and Bailey, “Web Services and Widgets.” 96. Jingru Hoivik, “Global Village: Mobile Access to Library Resources,” Library Hi Tech 31, no. 3 (2013): 467–77, http://dx.doi.org/10.1108/LHT-12-2012-0132. 97. Elder, “How to Become the ‘Tech Guy.’” 98. Ryan, “Developing Library Websites.” 99. West, Hafner and Faust, “Expanding Access.” 100. Hu and Meier, “Planning for a Mobile Future.” 101. Iglesias and Meesangnill, “Mobile Website Development.” https://journals.ala.org/ltr/article/view/4506 https://journals.ala.org/ltr/article/view/4655/5511 http://dx.doi.org/10.1108/07378831211266582 http://dx.doi.org/10.1108/LHT-12-2012-0132 IDENTIFYING KEY STEPS FOR DEVELOPING MOBILE APPLICATIONS & MOBILE WEBSITES FOR LIBRARIES | POTNIS, REGENSTREIF-HARMS, AND CORTEZ |doi:10.6017/ital.v35i2.8652 62 102. Wisniewski, “Mobile Usability.” 103. Joe Murphy, “Using Mobile Devices for Research: Smartphones, Databases and Libraries,” Online 34, no. 3 (2010): 14–18. 104. 
Amy Vecchione and Margie Ruppel, “Reference is Neither Here nor There: A Snapshot of SMS Reference Services,” The Reference Librarian 53, no. 4 (2012): 355–72, http://dx.doi.org/10.1080/02763877.2012.704569. 105. Hu and Meier, “Planning for a Mobile Future.” 106. Wilson and McCarthy, “The Mobile University.” 107. Project Management Institute, A Guide to the Project Management Body of Knowledge (PMBOK Guide) (Newtown Square, PA: Project Management Institute, 2013). 108. Devendra Potnis et al., “Skills and Knowledge Needed to Serve as Mobile Technology Consultants in Information Organizations,” Journal of Education for Library & Information Science 57 (2016): 187–96. http://dx.doi.org/10.1080/02763877.2012.704569 ABSTRACT INTRODUCTION METHOD Forming and Managing a Team Key Steps in the Analysis Phase Key Steps for Designing MAMW Key Steps for Implementing MAMW Skills Needed for Maintaining MAMW CONCLUSION Forming and managing team This paper assumes a very small number of scholarly publications to be reflective of the real-world scenarios of developing MAMW for all types of libraries. This assumption is one of the limitations of this study. Also, the sample of publications anal... REFERENCES
In the Name of the Name: RDF Literals, ER Attributes, and the Potential to Rethink the Structures and Visualizations of Catalogs

Manolis Peponakis

ABSTRACT

The aim of this study is to contribute to the field of machine-processable bibliographic data that is suitable for the Semantic Web. We examine the Entity Relationship (ER) model, which has been selected by IFLA as a “conceptual framework” in order to model the FR family (FRBR, FRAD, and RDA), and the problems ER causes as we move towards the Semantic Web. Subsequently, while maintaining the semantics of the aforementioned standards but rejecting the ER as a conceptual framework for bibliographic data, this paper builds on the RDF (Resource Description Framework) potential and documents how both the RDF and Linked Data’s rationale can affect the way we model bibliographic data. In this way, a new approach to bibliographic data emerges where the distinction between description and authorities is obsolete. Instead, the integration of the authorities with descriptive information becomes fundamental so that a network of correlations can be established between the entities and the names by which the entities are known. Naming is a vital issue for human cultures because names are not random sequences of characters or sounds that stand just as identifiers for the entities—they also have socio-cultural meanings and interpretations. Thus, instead of describing indivisible resources, we could describe entities that appear in a variety of names on various resources. In this study, a method is proposed to connect the names with the entities they represent and, in this way, to document the provenance of these names by connecting specific resources with specific names.

INTRODUCTION

The basic aim of this study is to contribute to the field of machine-processable bibliographic data.
As to what constitutes “machine processable” we concur with the clarification of Antoniou and van Harmelen, who state, “In the literature the term machine-understandable is used quite often. We believe it is the wrong word because it gives the wrong impression. It is not necessary for intelligent agents to understand information; it is sufficient for them to process information effectively, which sometimes causes people to think the machine really understands.”1 Also, in the bibliography used, the term “computationally processable” is used as a synonym for “machine processable.”

Manolis Peponakis (epepo@ekt.gr) is an information scientist at the National Documentation Centre, National Hellenic Research Foundation, Athens, Greece.

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016

With regard to machine-processable bibliographic data, we have taken into consideration both the practice and theory of Library and Information Science (LIS) and Computer Science. From LIS we have chosen the Functional Requirements for Bibliographic Records (FRBR) and the Functional Requirements for Authority Data (FRAD) while making comparisons with the Resource Description and Access (RDA) standard. From the Computer Science domain we have chosen the Resource Description Framework (RDF) as a basic mechanism for the Semantic Web. We examine the Entity Relationship (ER) model (selected by IFLA as a “conceptual framework” for the development of FRBR),2 as well as the potential problems that may arise as we move towards the Semantic Web. Having rejected the ER model as a conceptual framework for bibliographic data, we have built on the potential of RDF and document how its rationale affects the modeling process. In the context of the Semantic Web and Uniform Resource Identifiers (URIs), the identification process has been transformed.
For this reason we have performed an analysis of appellations and names as identifiers and also explored how we could move on from an era where controlled names play the role of identifiers to one of the URI dominion: “While it is self-evident that labels and comments are important for constructing and using ontologies by humans, the OWL standard does not pay much attention to them. The standard focuses on the syntax, structure and reasoning capabilities. . . . If the Semantic Web is to be queried by humans, there will be no other way than dealing with the ambiguousness of human language.”3

It is essential to build on the “library's signature service, its catalog,”4 and use it to provide added-value services. But to get there, first there has to be “a shift in perspective, from locked-up databases of records to open data shared on the Web.”5 This requires a transition from descriptions aimed at human readers to descriptions that put the emphasis on computational processes, to escape the rationale of records being a condensed description in textual form and move towards more flexible and fruitful representations and visualizations.

BACKGROUND

FRBR and RDA

The FR family has been growing for more than a decade. The first member of the family was the Functional Requirements for Bibliographic Records (FRBR),6 the first version of which was published towards the end of the last century. Subsequently, IFLA decided to extend the model in order to cover authorities. During this process, the task of modeling the names was separated from the task of modeling the subjects. Thus two new members were added to the family: the “Functional Requirements for Authority Data: A Conceptual Model” (FRAD) and the “Functional Requirements for Subject Authority Data” (FRSAD).7,8 Around the same time, the “Resource Description and Access” (RDA) standard was established as a set of cataloging rules to replace the AACR standard.
IN THE NAME OF THE NAME: RDF LITERALS, ER ATTRIBUTES, AND THE POTENTIAL TO RETHINK THE STRUCTURES AND VISUALIZATIONS OF CATALOGS | PEPONAKIS | doi:10.6017/ital.v35i2.8749

According to its creators, the alignment with the FR family was crucial. As stated, “A key element in the design of RDA is its alignment with the conceptual models for bibliographic and authority data developed by the International Federation of Library Associations and Institutions (IFLA): Functional Requirements for Bibliographic Records [and] Functional Requirements for Authority Data.”9

This paper uses the FR family and RDA as a starting point but detects some problems and inconsistencies between these models. It retains the basic semantics of these standards but rejects their structural formalism, which is quite problematic and lacks effectiveness in expressing highly machine-processable data. The effective processability of the data will be discussed in detail in the section “The Impact of the Representation Scheme’s Selection: RDF versus ER.”

Within the FR family, the terminology is inconsistent and, as we pass from FRBR to FRAD and FRSAD, even the perspective of the general model changes. In FRBR (the first in order), there is no notion of the name as an entity. FRAD introduces this notion (FRAD also adds family as a new entity) and FRSAD takes a step further and introduces the concept of nomen instead of the concept of name.
Hence, despite the fact that each of the members of the FR family of models has been represented in RDF,10 there is no established consolidated edition yet that combines the different angles using a common model and terminology (vocabulary).11 These representations (one for each model) are available at IFLA’s website.12 On the other hand, in the context of RDA there may be more consistency regarding terminology, but, as is well established in the relevant literature, there are significant differences between the two models, i.e., the FR family and RDA.13,14,15 Due to these differences, no URIs, not even from the RDA registry, appear in the examples of our study.16 Given the above, the terms appearing in the figures are a selection from the three texts of the FR family. Thus, nomen (from FRSAD) is used instead of name (from FRAD) as a more abstract notion, and the attribute—property in the context of RDF—“has string” (from FRAD) is used to assign a specific literal to a nomen. In figures 2–5 we have used the “has appellation” (reversed “is appellation of”) relationship of FRAD.17

Notes about Terminology and Graphs: How to Read the Figures

Two different sorts of figures appear in this paper. This covers the need to compare two different models and to pinpoint the differences between them and the problems that arise from selecting the ER model to express FRBR. An explanation of the two major models follows in the next subsection.

The first figure type follows the diagrams of the Entity Relationship model and is used in figure 1. In this case:

• The rectangles represent entities.
• The oval shapes represent attributes.
• The diamond-shaped boxes represent relationships.

The second figure type has been created according to RDF graphical representations and is used in figures 2–5.
In these cases:

• The oval shapes represent nodes that are identified by a URI; they can serve as objects or subjects for further expansion of the network. In figures 3–5 all the names were derived from the FR entities.
• The line connectors between nodes represent the predicates (i.e., they are properties) and should also be identified by URIs.
• The rectangle shapes represent literals consisting of a lexical form, to which a language code may apply. With or without language codes, these are end points and cannot be subjects of new connections.

We follow the common modeling of language in RDF, in which the literal itself contains a language code, for example "example"@en in standard Turtle syntax or its equivalent in RDF/XML coding. We must note that this is quite a simplistic way of modeling language, because there is no mechanism to declare more information about the language, such as multiple scripts, which could apply in the context of the same language.

The Impact of the Representation Scheme’s Selection: RDF versus ER

Nowadays, all the information in library catalogs is created through and stored in computers. This technological infrastructure provides specific methods and dictates limitations for the catalog’s data management. Hence, every model must take into consideration the basic rationale of the technological infrastructure that will curate and process the data. Depending on the syntax capabilities of the representation model, expressing what we want to express becomes reasonably easy and accurate, since “semantics is always going to have a close relationship with the field of syntax.”18 This establishes a vital relationship between what we want to do and how computers can do it. In this section we emphasize the limitations of the Entity Relationship (ER) implementation, which FRBR proposes, and show how syntax affects expressiveness and, accordingly, functionality.
Finally, we demonstrate how the selection of one implementation or another (in our case ER vs. RDF) has serious implications, both for cataloging rules and for cataloging practice.

Why do we compare these two specific models? The ER model is the basis selected by IFLA as a “conceptual framework”19 for the development of FRBR, while FRBR is the conceptual model upon which RDA has been founded. Consequently, RDA is also affected by the choice of the ER model. On the other hand, RDF is the current conceptualization for resource description in the web of data. So, what kind of problems and conflicts arise from the implementations of each of these models?

The basic rationale of ER comprises three fundamental elements: there are entities; entities have attributes; and there are relationships between entities. It is also possible to declare cardinality constraints, upon which the FR family builds. Then again, RDF implies quite a different model. “The core structure of the abstract syntax is a set of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph. An RDF graph can be visualized as a node and directed-arc diagram, in which each triple is represented as a node-arc-node link. . . . There can be three kinds of nodes in an RDF graph: IRIs, literals, and blank nodes.”20 “Linking the object of one statement to the subject of another, via URIs, results in a chain of linked statements, or linked data. This avoids the ambiguity of using natural language strings as headings to match statements.
As a result, a literal object terminates a linked data chain, and literals are generally used for human-readable display data such as labels, notes, names, and so on.”21

As a representative example of the differences between the two models, let us consider “place of publication.” Peponakis counts nine attributes of place and notices that, because the ER model does not allow links between attributes, there is no way to define explicitly whether these attributes address the same place or not.22 Taking this problem into consideration, we demonstrate the transition from the ER attributes approach to RDF implementations in figures 1–2. Let us assume that there is Person (X), who was born in London, is named John Smith, and works at Publisher (Y). This publisher is located in London, where Book (1), entitled History of London, has been published. For this specific book, Person X was the lithographer. If we create a strict mapping to FRBR entities, attributes, and relations, then we have the situation illustrated in figure 1. Because there is no way to link the four occurrences of London (inasmuch as there is no option to define relations between attributes in the ER model), there is no way to be certain that London is the same in all cases. Judging only by the name, it could stand for London in England, in Ontario, in Ohio, or elsewhere.

Figure 1. Example of “Place” as an attribute of several entities

The IFLA working group faced the problem with place and noted the following:

The model does not, however, parallel entity relationships with attributes in all cases where such parallels could be drawn. For example, “place of publication/distribution” is defined as an attribute of the manifestation to reflect the statement appearing in the manifestation itself that indicates where it was published.
Inasmuch as the model also defines place as an entity it would have been possible to define an additional relationship linking the entity place either directly to the manifestation or indirectly through the entities person and corporate body which in turn are linked through the production relationship to the manifestation. To produce a fully developed data model further definition of that kind would be appropriate. But for the purposes of this study it was deemed unnecessary to have the conceptual model reflect all such possibilities.23

Finally, they seem to avoid the problem and repeat their position in FRAD as well:

In certain instances, the model treats an association between one entity and another simply as an attribute of the first entity. For example, the association between a person and the place in which the person was born could be expressed logically by defining a relationship (“born in”) between person and place. However, for the purposes of this study, it was deemed sufficient to treat place of birth simply as an attribute of person.24

For some reason the creators of the FR family have chosen not to “upgrade” the attributes of place into one and only one entity. Furthermore, the same problem exists for many attributes, not only for place. Thus, the problem has to do with the selection of ER as a “conceptual framework” and not with the specific entity of place. If we accept that “Place of Publication” must not be recorded as it appears on the resource, an RDF-based approach makes things clearer, as figure 2 shows. In this case, all attributes of place are promoted to the same RDF node and, instead of four repeats of the attribute with the value “London,” we reduce it to one and only one node with four connections to it.
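The consolidation described above can be sketched in Turtle. The namespace and the property names are hypothetical illustrations, not terms from the FR vocabularies, and the fourth occurrence of London (in the title History of London) is read here as a subject relation:

```turtle
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# One place node replaces the four separate "London" attribute values.
ex:placeLondon  rdfs:label "London"@en .

ex:personX      ex:placeOfBirth        ex:placeLondon ;
                ex:isEmployedBy        ex:publisherY .
ex:publisherY   ex:isLocatedIn         ex:placeLondon .
ex:book1        ex:placeOfPublication  ex:placeLondon ;
                ex:hasSubject          ex:placeLondon ;
                ex:hasLithographer     ex:personX .
```

Because all four statements point to the same URI, an agent can be certain that the birthplace, the publisher's location, the place of publication, and the subject are one and the same London, a guarantee the ER attribute approach cannot give.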
Then, as illustrated by figure 2, we can be sure that all instances refer to the same London.

Figure 2. RDF-based representation of figure 1

In figure 2, it is assumed that there is no need to transcribe the literal of “Place of Publication” from the resource; i.e., we did not follow rule 2.8.1.4 of RDA: “Transcribe places of publication and publishers' names as they appear on the source of information.” For cataloging rules that require recording the place as it appears on the resource, readers can consult the subsection “Place Names” in this study. Last but not least, RDF has another significant advantage compared to the ER model: data coded in RDF are packed ready for use in the Semantic Web. On the contrary, data coded in ER must undergo conversion—with all its implications—in order to be published in the Semantic Web.

NAMES, ENTITIES, AND IDENTITIES

In this section, the significance of names as carriers of meaning is outlined, and the importance of documenting the relations of names with the entities and identities they refer to is established. Additionally, the basic approaches to metadata generation for managing names are presented. These approaches resulted in the dissociation of authorities from the bibliographic records, which in turn deprived both FRBR/FRAD and RDA of the potential to link, in an explicit way, the entity with the names it goes by. This linking, as presented later in this text, is fundamental for the description and interpretation of the entity.

In everyday communication, the usage of a name in a sentence plays the role of the identifier for the entity that this specific name indicates. If the speakers share a common background, there is no need for qualifiers other than the name in order to disambiguate information such as whether Nick is Person X or Person Y, or whether the word “London” indicates the city in Ohio or in England, etc.
Thus, the common background leads to a very limited context in which the interpretation of the name and its assignment to the appropriate entity is sufficient and accurate. However, the context of the Internet extends into a variety of possibilities, so a more precise way to identify specific entities is needed. In this regard, an essential issue is the distinction between the properties of the name and the properties of the entity that is represented by the specific name. The word “John” could be recognized as an English name, but we commit a logical flaw if we assume that John knows English. A representative example of this kind of inference (syllogism) can be found in Rayside and Campbell.25 Statement: “Man is a species of animal. Socrates is a man. Therefore, Socrates is a species of animal. . . . ‘Man' is a three-lettered word. Socrates is a man. Therefore, Socrates is a three-lettered word.”

Therefore, the authorities of a catalog should embody a two-level modeling of the information they represent: the first level has to do with the entities and the second with the names of these entities. Consequently, there is the need to find a way to pass from names to the entities they indicate and, from entities, to the various appellations that these entities have.

In catalogs, it is somewhat vague whether a change of name signifies a new identity. Niu states: “For example: the maiden name and the married name of an agent are normally not considered two separate identities, yet one pseudonym used for writing fiction and another pseudonym used for writing scientific works are often considered two different identities of an agent.”26 Thus there can be one individual with many identities.
But there can also be one identity which incorporates many individuals: for example, a shared pseudonym for a group of authors. To deal with these problems, FRAD introduces the notion of persona, rejecting at the same time the idea that a person is equal to an individual. FRAD defines a person as an “individual or a persona or identity established or adopted by an individual or group.”27 The question that arises here is when the persona must be conceived as a new identity. Yet FRAD does not make a sufficient judgment; instead, it refers to cataloguing rules:

“Under some cataloguing rules, for example, authors are uniformly viewed as real individuals, and consequently specific instances of the bibliographic entity person always correspond to individuals. Under other cataloguing rules, however, authors may be viewed in certain circumstances as establishing more than one bibliographic identity, and in that case a specific instance of the bibliographic entity person may correspond to a persona adopted by an individual rather than to the individual per se.”28

So there is no specific guidance on whether, for example, in the case of a “religious relationship,”29 one identity must be created with two alternative names or two different identities. Rule 9.2.2.8 in RDA does not elaborate further. Still, even with the problem of identities solved, the matter of appellations itself can be extremely complicated, and this is widely addressed in the relevant literature.30,31,32 The VIAF project confirms this with an enormous data set.33 Assigning all appellations as attributes is an easy way to model the variants of a name, but it is very simplistic because it “does not allow these appellations to have attributes of their own and neither does it allow the establishing of relationships among the appellations. . . .
FRAD makes a big step forward: all appellations are defined as entities in their own right, thus allowing full modeling.”34 Of course, FRAD’s approach is not a novelty in the domain of LIS, since library catalogs have been modeling names since the era of MARC. In UNIMARC Authorities,35 the control subfield $5 contains a coded value to indicate the relations between the names, with values such as “k = name before the marriage,” “i = name in religion,” “d = acronym,” etc., and in MARC 21 there is the corresponding subfield $w.36 FRAD puts these values on a more consistent and abstract level. FRAD also defines “Relationships between Persons, Families, Corporate Bodies, and Works” in section 5.3 and “Relationships between their Various Names” in section 5.4.37

The Distinction between Authorities and Descriptive Information

Since the days of card catalogs, and for as long as MARC and AACR have been used, bibliographic records have been grounded in the dichotomy between descriptive information and controlled access points. The various types of headings stand for controlled access points. The original terminus of headings was alphabetical sorting. With the advent of computers, they were used as string identifiers to cluster and retrieve relevant bibliographic records. These bibliographic records had a body of descriptive information that was transcribed from the resource and remained unchanged. So the headings were the keys to the records, and the records were surrogates for documents. “The elements of a bibliographic record . . . were designed to be read and comprehended by human beings, not by machines”38; established headings are no exception. One of their basic characteristics was the precondition that they be unique in the context of a specific catalog, thereby avoiding ambiguity.
In every case of synonymy, qualifiers (such as date of birth or profession) were added to disambiguate, and the names also played the role of a unique identifier. From this process an issue emerges: the information that appears on the document has been changed, and the controlled name may be completely different from the name on the resource. This means that the cataloger performs a transformation of the information, and this transformation carries two dangers. First, by changing the name, there is the possibility of assigning the name to the wrong entity. Second, by disturbing the correspondence between the information on the resource and the information on the record of the resource, the record becomes a problematic surrogate of the resource. To surpass this obstacle, traditional catalogs split the information into two different areas: one with the established forms, i.e., the headings, and the second with the purely descriptive information, i.e., the information that must be transcribed from the resource. This is the reason why traditional library catalogs put much effort into transcribing information from resources, and very detailed guidelines have been developed. On the other hand, current approaches to metadata creation (such as Dublin Core) seem to underestimate the importance of descriptive information while concentrating on the established forms of names. But how can we be sure that different literals communicate the same meaning? Does this kind of simplification, perhaps, cause problems regarding the integrity of the information?

The names are not just sequences of characters (i.e., strings); they also carry latent information. It is known that there are women who wrote using male names (for example, Mary Ann Evans wrote as George Eliot) and men who wrote using female names. There are also nicknames for groups (e.g., “Richard Henry” is a pseudonym for the collaborative works of Richard Butler and Henry Chance Newton), etc.
Therefore, it is important not to ignore names and the forms in which they appear on resources, but to model them in such a way that integration between authorities and descriptive information is feasible and the names are efficiently machine-processable.

INTEGRATING AUTHORITIES WITH DESCRIPTIVE INFORMATION

As we have already stated, traditional library catalogs are built on the dichotomy between description and access points. This analysis aims to bring descriptive information and authorities closer, i.e., to connect the access points of catalogs with the description of the resource. The basic principle of the model presented in this section is to promote each verbal (lexical) representation of a name to a nomen, whether or not this form of the name derives from a controlled vocabulary.

IN THE NAME OF THE NAME: RDF LITERALS, ER ATTRIBUTES, AND THE POTENTIAL TO RETHINK THE STRUCTURES AND VISUALIZATIONS OF CATALOGS | PEPONAKIS | doi:10.6017/ital.v35i2.8749

In cases where this form appears in a specific vocabulary, appropriate properties can be used to indicate such a relation. In this section, some representative examples are presented. It is important to note, once again, that every node and relation in the following figures could (and, in the context of the Semantic Web, must) be identified by a URI, except for the values in rectangles, which are simple RDF literals and therefore cannot be the subjects of further expansion. Thus, the chain is as follows: every individual (instance of the relevant class) acquires a URI; every individual is connected through the "has appellation" property (which acquires a URI) to a nomen (which also acquires a URI); and these nomens end in a plain RDF literal, which is natural-language wording and cannot be subjected to further analysis.
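The chain just described (entity, "has appellation" property, nomen, plain literal) can be sketched with a minimal set of triples. This is an illustrative sketch only: the URIs and property names below are hypothetical placeholders, not drawn from any published vocabulary.

```python
# Minimal sketch of the entity -> nomen -> literal chain described above.
# All URIs ("ex:...") and property names are hypothetical placeholders.

triples = {
    # The person (entity) is connected to two nomens via "has appellation".
    ("ex:person/1", "ex:hasAppellation", "ex:nomen/1"),
    ("ex:person/1", "ex:hasAppellation", "ex:nomen/2"),
    # Each nomen ends in a plain literal, which cannot be expanded further.
    ("ex:nomen/1", "ex:literalForm", "Mary Ann Evans"),
    ("ex:nomen/2", "ex:literalForm", "George Eliot"),
}

def literal_forms(entity):
    """Collect every literal form reachable from an entity via its nomens."""
    nomens = {o for s, p, o in triples if s == entity and p == "ex:hasAppellation"}
    return {o for s, p, o in triples if s in nomens and p == "ex:literalForm"}
```

Both name strings resolve to the same entity URI, while each nomen remains an addressable node in its own right, so a resource can be linked either to the entity as a whole or to one specific appellation of it.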
Place Names

The problem of place as an attribute in FRBR and FRAD has already been analyzed in the Background section of the current paper, specifically in the subsection "The Impact of the Representation Scheme's Selection: RDF versus ER." Here, a solution to this problem that is compatible with the FRBR/RDA approach is proposed. By promoting every nomen of a place to an RDF node, there is the option of referring to the entity of the place as a whole or to a specific appellation of this entity. So the relation (a property in the context of RDF) between the subjects of a work could be indicated by connecting Work X with Place Z. On the other hand, according to rule 2.8.1.4 of RDA, the place of publication for the manifestation must be transcribed as it appears on the source of information. But following the connections presented in figure 3, it is easy to infer that this specific nomen corresponds to the same entity, i.e., to the same place.

Figure 3. Place

Personal names

In the section "Names, Entities and Identities," we analyzed many of the problems associated with personal names. Here, a model is presented in which the work (and expression) is connected directly with the author, whereas the manifestation is connected with a specific appellation, i.e., a nomen, of this author.

Figure 4. Statements of responsibility

RDA rule 2.4.1.4 states, "Transcribe a statement of responsibility as it appears on the source of information." But occasionally the statement of responsibility may contain phrases and not just names. In these cases, a solution similar to that of the Metadata Object Description Schema (MODS) could be implemented, where, if needed, the statement of responsibility is included in the note element using the attribute type="statement of responsibility".

Titles

The management of titles in FRBR and RDA indicates a different point of view between the two standards.
According to RDA there is no title for the expression,39 and, as Taniguchi states, this is a "significant difference between FRBR and RDA."40 BIBFRAME abides by the same principle of downgrading the expression, since it merges expression and work into an indivisible unit. In this regard, BIBFRAME is closer to RDA than to FRBR. The notion of work has nothing to do with specific languages, even when the work is a written text; therefore, assigning the title of the work to a specific appellation is an unnecessary limitation. On the contrary, the title of a manifestation is derived from a specific resource. We argue that between these two poles there is the title of the expression, which could stand as a uniform title per language.

Figure 5. Titles

VISUALIZATION OF BIBLIOGRAPHIC RECORDS AND CATALOGING RULES

Resource description in the domain of LIS—from Cutter's era to the present day—has emphasized static, linear, textual representations. According to RDA "0.1 Key Features," "In RDA, there is a clear line of separation between the guidelines and instructions on recording data and those on the presentation of data. This separation has been established in order to optimize flexibility in the storage and display of the data produced using RDA. Guidelines and instructions on recording data are covered in chapters 1 through 37; those on the presentation of data are covered in appendices D and E." But the tables in these appendices (D and E) contain guidelines mainly concentrated on punctuation issues, and they do not take into consideration the capabilities of current interactive user interfaces.
As Coyle and Hillmann comment, "there are instructions for highly structured strings that are clearly not compatible with what we think of today as machine-manipulable data."41 It is rather like producing high-tech catalog cards: RDA remains faithful to the classical text-centric approaches that produce bibliographic records as a linear enumeration of attributes; thus, RDA can be likened to a new suit that is nonetheless quite old-fashioned. Traditional catalogs (from card catalogs to OPACs and repository catalogs) were built upon the principle of creating autonomous records. FRBR called this principle, i.e., one record for each resource, into question, while Linked Data abolishes it. In this way, a gigantic graph of statements is created, in which a certain subset of statements (not always the same one) corresponds to or describes the desired information. Thus, a more sophisticated method for displaying results emerges, if it is not indeed imposed. The issue, therefore, is no longer to present a record that describes a specific resource, since this conceptualization is becoming obsolete altogether. Consequently, the visualization has to differ depending on the data structure as well as on the searcher's available interface. In this context, the analysis of this study tries to balance the machine-processable character of RDF, which builds on identifiers (URIs), with attention to the linguistic representation of entities. We argue that a balance between them will result in highly accurate and efficient representations for both humans and software agents. Let us consider the model for titles introduced in this study.
According to FRBR, "if the work has appeared under varying titles (differing in form, language, etc.), a bibliographic agency normally selects one of those titles as the basis of a 'uniform title' for purposes of consistency in naming and referencing the work."42 RDA treats the case in a very similar way: rule 5.1.3 states, "The term 'title of the work' refers to a word, character, or group of words and/or characters by which a work is known. The term 'preferred title for the work' refers to the title or form of title chosen to identify the work. The preferred title is also the basis for the authorized access point representing that work." In this study, we consider the aforementioned statements a projection of the days when records were static textual descriptions independent of interfaces. Nowadays we are moving towards a much clearer distinction between an entity and its names. This is reflected in figure 5, in which the connection between a work and its author has nothing to do with specific names (appellations) but is based on URIs. The selection of the appropriate name as a title for a specific work could then be based on criteria such as the language of the interface: in this case, the title of the work would be the title in the language of the user interface, and if this is not possible (i.e., there is no title label in this language), it could be the title in the catalog's default language. Following the kind of modeling proposed in the current study, the visualization of data becomes more flexible and efficient in a variety of dynamic ways. Hence, we can isolate and display nodes and their connections, correlate them with the interface language or screen size (i.e., mobile phone or PC), create levels relative to the desired depth of analysis, personalize them upon the user's request or according to the user's habits, and so on. It also becomes possible to display the data in forms other than text.
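The selection rule just described, i.e., prefer the title label in the interface language and fall back to the catalog's default language, can be sketched as follows. The data and function names are hypothetical, for illustration only.

```python
# Hypothetical title labels for one work, keyed by language tag.
work_titles = {"en": "War and Peace", "ru": "Война и мир", "fr": "Guerre et Paix"}

def display_title(titles, ui_lang, default_lang="en"):
    """Prefer the label in the interface language; fall back to the default."""
    if ui_lang in titles:
        return titles[ui_lang]
    return titles[default_lang]
```

A French interface would display "Guerre et Paix," while an interface in a language with no label (say, German) would fall back to the catalog's default-language title.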
"As a result, humans, with their great visual pattern recognition skills, can comprehend data tremendously faster and more effectively through visualization than by reading the numerical or textual representation of the data."43

As we have already mentioned, syntax and semantics will always have a close relationship, but it is clear that, now more than ever, current Semantic Web standards allow for greater flexibility. As Dunsire et al. put it:

The RDF approach is very different from the traditional library catalog record exemplified by MARC21, where descriptions of multiple aspects of a resource are bound together by a specific syntax of tags, indicators, and subfields as a single identifiable stream of data that is manipulated as a whole. In RDF, the data must be separated out into single statements that can then be processed independently from one another; processing includes the aggregation of statements into a record-based view, but is not confined to any specific record schema or source for the data. Statements or triples can be mixed and matched from many different sources to form many different kinds of user-friendly displays.44

In this framework, cataloging rules must reexamine their instructions in light of the new opportunities offered by technological advancements.

DISCUSSION

Naming is a vital issue for human cultures. Names are not random sequences of characters or sounds that stand merely as identifiers for entities; they also carry socio-cultural meanings and interpretations. Recently, out of "political correctness" and fear of triggering racism, Sweden changed the names of bird species that could potentially offend, such as "gypsy bird" and "negro."45 Therefore we cannot treat names as mere arbitrary identifiers.
In this study we examined how, instead of describing indivisible resources, we could describe entities that appear under a variety of names on various resources. We proposed a method for connecting names to the entities they represent while, at the same time, documenting the provenance of these names by connecting specific resources with specific names. We illustrated how to establish connections between entities, connections between an entity and a specific name of another entity, and connections between one name and another name concerning one or two entities. In the proposed framework, we maintain the linguistic character of naming while modeling the names in a machine-processable way. This formalism allows for a high level of expressiveness and for flexible descriptions that do not have a static, text-centric orientation, since the central point is not the establishment of text values (i.e., headings) but the meaning of our statements. This study has shown that it is important to be able to establish relationships both between entities and between specific appellations (nomens, in the context of this study) of these entities. To achieve this, we promoted every appellation to an RDF node. This is not unheard of in the domain of RDF, since the same approach has been adopted by the W3C in the development of SKOS-XL.46 FRBRoo, another interpretation of increasing influence in the wider context of the FR family, adopts the same perspective.47 FRBRoo also gives the option to connect a specific name with a resource through the property "R64 used name (was name used by)" or to connect a name with someone who uses this specific name through the property "R63 named (was named by)." Murray and Tillett state that "cataloging is a process of making observations on resources"48; hence, the production of records is the result of the judgments made during this process.
But in the context of traditional descriptive cataloging, the cataloger was not required to judge information in any way other than by its category, i.e., to characterize whether a given string of characters corresponded to the name of an author, a publisher, a place, and so on. There was no obligation to assign a particular name to a specific author, publisher, or place. In our approach, the cataloger interprets the information and thereby supports the catalog's potential to deliver added-value information. Moreover, the initial information remains unaltered; hence, there is always the option of going back in order to generate new interpretations or to validate existing ones. In recent years, there has been a significant increase in the attention given to multi-entity models of resource description.49 In this new environment, "the creation of one record per resource seems a deficient simplification."50 RDF allows the transformation of universal bibliographic control into a giant global graph.51 In this manner, current approaches to resource description "cannot be considered as simple metadata describing a specific resource but more like some kind of knowledge related to the resource."52 Indeed, this knowledge can be computationally processed and exploited. Yet, to achieve this, "catalogers can only begin to work in this way if they are not held bound by the traditional definitions and conceptualizations of bibliographic records."53 One critical issue is the isolation of parts (sets of statements) of this "giant graph" and the linking of these parts with something else; indeed, theory on this topic is starting to emerge.54 This is essential because it allows for the creation of ad hoc clusters (in our context, the use of a specific identity for an entity together with all the names that have been assigned to this identity), which could then be used as a set to link to some other entity. As a final remark, we could say that authorities manage controlled access points.
In the Semantic Web, every URI is a controlled access point; hence, the distinction between description and authorities acquires a new meaning. In the context of machine-processable bibliographic data, the aim is to connect the two, i.e., the authorities with the description, and to examine how one can support the other. Since the emphasis is no longer on their individual management, we are drawn away from a mentality of "descriptive information versus access points" and towards one of "descriptive information as an access point."

ACKNOWLEDGEMENT

The author wishes to thank Henry Scott, who assisted in the proofreading of the manuscript.

REFERENCES AND NOTES

1. Grigoris Antoniou and Frank van Harmelen, A Semantic Web Primer, 2nd ed. (Cambridge, MA: MIT Press, 2008), 3.
2. IFLA, Functional Requirements for Bibliographic Records: Final Report, as amended and corrected through February 2009, IFLA Series on Bibliographic Control, vol. 19 (Munich: K.G. Saur, 1998), 6.
3. Daniel Kless et al., "Interoperability of Knowledge Organization Systems with and through Ontologies," in Classification & Ontology: Formal Approaches and Access to Knowledge: Proceedings of the International UDC Seminar 19–20 September 2011, The Hague, the Netherlands, Organized by UDC Consortium, The Hague, edited by Aida Slavic and Edgardo Civallero (Würzburg: Ergon, 2011), 63–64.
4. Karen Coyle and Diane Hillmann, "Resource Description and Access (RDA): Cataloging Rules for the 20th Century," D-Lib Magazine 13, no. 1/2 (January 2007): para. 2, doi:10.1045/january2007-coyle.
5. Cory K. Lampert and Silvia B. Southwick, "Leading to Linking: Introducing Linked Data to Academic Library Digital Collections," Journal of Library Metadata 13, no. 2–3 (2013): 231, doi:10.1080/19386389.2013.826095.
6. IFLA, Functional Requirements for Bibliographic Records.
7. IFLA, Functional Requirements for Authority Data: A Conceptual Model, edited by Glenn E. Patton, IFLA Series on Bibliographic Control (Munich: K.G. Saur, 2009).
8. IFLA, "Functional Requirements for Subject Authority Data (FRSAD): A Conceptual Model" (IFLA, 2010), http://www.ifla.org/files/assets/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf.
9. ALA, "RDA Toolkit: Resource Description and Access," sec. 0.3.1, accessed June 18, 2014, http://access.rdatoolkit.org/.
10. Gordon Dunsire, "Representing the FR Family in the Semantic Web," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 724–41, doi:10.1080/01639374.2012.679881.
11. While this paper was under review, IFLA released the draft "FRBR-Library Reference Model" (FRBR-LRM), which is a consolidated edition of the FR family standards. It is developed from the respective individual standards following the principles of entity-relationship modeling, which is challenged in this paper. Taking into account the ER modeling and the statement (on p. 5 of the standard) that "the model is comprehensive at the conceptual level, but only indicative in terms of the attributes and relationships that are defined," this consolidated edition cannot be perceived as a standard that could be implemented directly as a property vocabulary qualifying for use in the RDF environment.
12. Main page (for all FR) at http://iflastandards.info/ns/fr/; "FRBR Model" available at http://iflastandards.info/ns/fr/frbr/frbrer/; "FRAD Model" available at http://iflastandards.info/ns/fr/frad/; "FRSAD Model" available at http://iflastandards.info/ns/fr/frsad/. In addition, the FRBRoo element set is available at http://iflastandards.info/ns/fr/frbr/frbroo/.
13. Manolis Peponakis, "Conceptualizations of the Cataloging Object: A Critique on Current Perceptions of FRBR Group 1 Entities," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 587–602, doi:10.1080/01639374.2012.681275.
14. Pat Riva and Chris Oliver, "Evaluation of RDA as an Implementation of FRBR and FRAD," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 564–86, doi:10.1080/01639374.2012.680848.
15. Shoichi Taniguchi, "Viewing RDA from FRBR and FRAD: Does RDA Represent a Different Conceptual Model?," Cataloging & Classification Quarterly 50, no. 8 (2012): 929–43, doi:10.1080/01639374.2012.712631.
16. The RDA registry is available at http://www.rdaregistry.info/.
17. The nomen entity and the "has appellation" (reversed: "is appellation of") property are also used by the FRBR-LRM.
18. Paul H. Portner, What Is Meaning?: Fundamentals of Formal Semantics (Malden, MA: Blackwell, 2005), 34.
19. IFLA, Functional Requirements for Bibliographic Records, 19:6.
20. W3C, "RDF 1.1 Concepts and Abstract Syntax: W3C Recommendation," February 25, 2014, http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
21. Gordon Dunsire, Diane Hillmann, and Jon Phipps, "Reconsidering Universal Bibliographic Control in Light of the Semantic Web," Journal of Library Metadata 12, no. 2–3 (2012): 166, doi:10.1080/19386389.2012.699831.
22. Manolis Peponakis, "Libraries' Metadata as Data in the Era of the Semantic Web: Modeling a Repository of Master Theses and PhD Dissertations for the Web of Data," Journal of Library Metadata 13, no. 4 (2013): 333, doi:10.1080/19386389.2013.846618.
23. IFLA, Functional Requirements for Bibliographic Records, 19:32.
24. IFLA, Functional Requirements for Authority Data: A Conceptual Model, 36–37.
25. Derek Rayside and Gerard T. Campbell, "An Aristotelian Understanding of Object-Oriented Programming," in Proceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '00 (New York: ACM, 2000), 350, doi:10.1145/353171.353194.
26. Jinfang Niu, "Evolving Landscape in Name Authority Control," Cataloging & Classification Quarterly 51, no. 4 (2013): 405, doi:10.1080/01639374.2012.756843.
27. IFLA, Functional Requirements for Authority Data: A Conceptual Model, 24.
28. Ibid., 20.
29. "Religious relationship" is the "relationship between a person and an identity that person assumes in a religious capacity"; for example, the "relationship between the person known as Thomas Merton and that person's name in religion, Father Louis" (IFLA, 2009, 61–62).
30. Junli Diao, "'Fu hao,' 'fu hao,' 'fuHao,' or 'fu Hao'? A Cataloger's Navigation of an Ancient Chinese Woman's Name," Cataloging & Classification Quarterly 53, no. 1 (2015): 71–87, doi:10.1080/01639374.2014.935543.
31. On Byung-Won, Sang Choi Gyu, and Jung Soo-Mok, "A Case Study for Understanding the Nature of Redundant Entities in Bibliographic Digital Libraries," Program: Electronic Library and Information Systems 48, no. 3 (July 1, 2014): 246–71, doi:10.1108/PROG-07-2012-0037.
32. Neil R. Smalheiser and Vetle I. Torvik, "Author Name Disambiguation," Annual Review of Information Science and Technology 43, no. 1 (2009): 1–43, doi:10.1002/aris.2009.1440430113.
33. Thomas B. Hickey and Jenny A. Toves, "Managing Ambiguity in VIAF," D-Lib Magazine 20, no. 7/8 (2014), doi:10.1045/july2014-hickey.
34. Martin Doerr, Pat Riva, and Maja Žumer, "FRBR Entities: Identity and Identification," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 524, doi:10.1080/01639374.2012.681252.
35. IFLA, UNIMARC Manual: Authorities Format, 2nd revised and enlarged edition, UBCIM Publications—New Series, vol. 22 (Munich: K.G. Saur, 2001).
36. Library of Congress, "MARC 21 Format for Authority Data" (Library of Congress, April 18, 1999), http://www.loc.gov/marc/authority/.
37. IFLA, Functional Requirements for Authority Data: A Conceptual Model.
38. Martha M. Yee, "FRBRization: A Method for Turning Online Public Findings Lists into Online Public Catalogs," Information Technology and Libraries 24, no. 2 (2005): 81, doi:10.6017/ital.v24i2.3368.
39. See the FRBR-RDA mapping from the Joint Steering Committee for Development of RDA, available at http://www.rda-jsc.org/docs/5rda-frbrrdamappingrev.pdf.
40. Taniguchi, "Viewing RDA from FRBR and FRAD," 934.
41. Coyle and Hillmann, "Resource Description and Access (RDA): Cataloging Rules for the 20th Century," sec. 8.
42. IFLA, Functional Requirements for Bibliographic Records, 19:33.
43. Leonidas Deligiannidis, Amit P. Sheth, and Boanerges Aleman-Meza, "Semantic Analytics Visualization," in Intelligence and Security Informatics, edited by Sharad Mehrotra et al., Lecture Notes in Computer Science 3975 (Berlin: Springer, 2006), 49, http://link.springer.com/chapter/10.1007/11760146_5.
44. Dunsire, Hillmann, and Phipps, "Reconsidering Universal Bibliographic Control in Light of the Semantic Web," 166.
45. Rick Noack, "Out of Fear of Racism, Sweden Changes the Names of Bird Species," Washington Post, February 24, 2015, http://www.washingtonpost.com/blogs/worldviews/wp/2015/02/24/out-of-fear-of-racism-sweden-changes-the-names-of-bird-species/.
46. W3C, "SKOS eXtension for Labels (SKOS-XL) Namespace Document—HTML Variant," 2009, http://www.w3.org/TR/2009/REC-skos-reference-20090818/skos-xl.html.
47. Chryssoula Bekiari et al., FRBR Object-Oriented Definition and Mapping from FRBRER, FRAD and FRSAD, version 2.0 (draft), 2013, http://www.cidoc-crm.org/docs/frbr_oo/frbr_docs/FRBRoo_V2.0_draft_2013May.pdf.
48. Robert J. Murray and Barbara B. Tillett, "Cataloging Theory in Search of Graph Theory and Other Ivory Towers," Information Technology and Libraries 30, no. 4 (2011): 171, doi:10.6017/ital.v30i4.1868.
49. Thomas Baker, Karen Coyle, and Sean Petiya, "Multi-Entity Models of Resource Description in the Semantic Web," Library Hi Tech 32, no. 4 (2014): 562–82, doi:10.1108/LHT-08-2014-0081.
50. Peponakis, "Libraries' Metadata as Data in the Era of the Semantic Web," 343.
51. Kim Tallerås, "From Many Records to One Graph: Heterogeneity Conflicts in the Linked Data Restructuring Cycle," Information Research 18, no. 3 (2013), http://informationr.net/ir/18-3/colis/paperC18.html.
52. Peponakis, "Conceptualizations of the Cataloging Object," 599.
53. Rachel Ivy Clarke, "Breaking Records: The History of Bibliographic Records and Their Influence in Conceptualizing Bibliographic Data," Cataloging & Classification Quarterly 53, no. 3–4 (2015): 286–302, doi:10.1080/01639374.2014.960988.
54. Gianmaria Silvello, "A Methodology for Citing Linked Open Data Subsets," D-Lib Magazine 21, no. 1/2 (2015), doi:10.1045/january2015-silvello.
Facilitating Research Consultations Using Cloud Services: Experiences, Preferences, and Best Practices

Rebecca Zuege Kuglitsch, Natalia Tingle, and Alexander Watkins

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017

ABSTRACT

The increasing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. Yet librarians struggle to balance escalating demands on their time. How can we embrace this expanded role and maintain accessibility to users while balancing competing demands on our time? One tool that allows us to better navigate this balance is Google Appointment Calendar, part of Google Apps for Education. It makes it easier than ever for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule. Our experience suggests that both students and librarians found it a useful, efficient system.

INTRODUCTION

The growing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. Although reference interactions in academic libraries have declined overall, in-depth research consultations have not followed that trend.1 These research consultations represent an increasingly large proportion of academic librarians' reference interactions and offer important opportunities to follow up on information literacy instruction, support student academic success, and relieve library anxiety. The library literature has demonstrated a need for and appreciation of these services.2 Moreover, students value face-to-face consultations because they provide an opportunity to talk through complex problems and questions while offering affective benefits such as relationship building and reassurance.3 It is evident that students seek out and value these services.
But even as these services become increasingly important, librarians struggle to balance escalating demands on their time. How can we embrace this expanded role and maintain accessibility to users while managing competing priorities? We found little guidance in the literature to identify the most efficient technological tools for offering these services to undergraduates, so we began to explore options. One tool that allows us to better navigate this shifting landscape is Google Appointment Calendar, part of Google Apps for Education. It makes it easier for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule; consequently, it is being adopted by many librarians at the University of Colorado Boulder. There are several other options available for librarians interested in calendar applications, such as YouCanBook.me.4 However, on campuses using Google Apps for Education, it may be easier to use a tool students are already familiar with and commonly use as part of their daily academic routines. Moreover, the integration with Apps for Education solves some of the problems Hess noted in the public version of Google Calendar Appointments (which is no longer available), such as appointments booked without identifying information and the extra step of logging in just for an appointment.

Rebecca Zuege Kuglitsch (rebecca.kuglitsch@colorado.edu) is Head, Gemmill Library of Engineering, Mathematics & Physics, University of Colorado Boulder. Natalia Tingle (natalia.tingle@colorado.edu) is Business Collections & Reference Librarian, University of Colorado Boulder. Alexander Watkins (alexander.watkins@colorado.edu) is Art & Architecture Librarian, University of Colorado Boulder.

FACILITATING RESEARCH CONSULTATIONS USING CLOUD SERVICES: EXPERIENCES, PREFERENCES, AND BEST PRACTICES | KUGLITSCH, TINGLE, AND WATKINS | https://doi.org/10.6017/ital.v36i1.8923
Because students are often already logged in due to using Google Apps for word processing, group work, and more, there is no extra step to log in for a simple appointment.5 Our exploration of this tool suggests that it is helpful to librarians, but it can benefit students as well. Research has proposed that students may hesitate to ask questions due to library anxiety. Would scheduling an appointment using a calendaring system be less intimidating than emailing a librarian directly, for example? We set out to apply this technology in an environment of changing student preferences and expectations, explore how students received it, and establish effective practices for using it in an academic setting. Since we are liaisons to science, social science, and humanities subject areas, we were able to work with a wide range of undergraduate students to see what might be most effective for us, and also for students from a variety of backgrounds.

Why Google Calendar

We selected appointment booking via Google Calendar because of its ease of use and because the University of Colorado Boulder has Google Apps for Education. This means that every student has a Google ID and the option of using Google Calendar as part of their normal routine. In December 2012, Google discontinued appointment calendars for general users and limited claimable appointment slots to Google Apps for Education. For institutions that do not subscribe, it may be worth investigating third-party Google Calendar apps, some of which are free or freemium, such as Calendly (https://calendly.com/), or SpringShare's similar subscription service, LibCal (https://www.springshare.com/libcal/).

Setting up Google Calendar

One of the benefits of Google Calendar is its ease of use. Setting up the calendar for appointment slots is as simple as creating a new Google Calendar event and selecting appointment slots as the type of event.
Next, you can give your appointment slots a name that corresponds with the language your institution uses for research consultations, and schedule them for the desired length of time. It is possible to schedule blocks of appointments that Google will automatically break into shorter appointments of a predetermined length. The authors created appointments lasting 30 minutes, 60 minutes, or a mix of both, depending on the expectations of our disciplines. It is also possible to create several simultaneous appointment slots if you would like to accommodate small groups. As well as indicating the time, each appointment has a space to indicate a location, which is particularly useful for librarians who might work in several branches or combine office hours in academic buildings with in-library consultations. Once the events are named and saved, the calendar can be shared.

Figure 1. Create a new event, selecting ‘Appointment slots’.

Appointment calendars are given a unique shareable URL to direct users to available appointments; however, these URLs are necessarily long and complicated, so we recommend using a link shortener. To obtain the very long URL for an appointment calendar, click on ‘edit details’ in an appointment event. From there, it is possible to copy the link and use a link shortener to make a brief, understandable link.

Figure 2. Obtain the shareable link.

When a student uses the link to make an appointment, both the librarian and the student receive an email with the student’s login name, email, appointment time, and other details. The slot immediately appears as taken on the calendar, so it is no longer available to other students, reducing confusion and double booking. Receiving the student’s email allows the librarian to initiate the reference interview and establish expectations.
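Google performs the block-to-slot splitting described above automatically; the underlying logic is easy to picture. The following sketch is illustrative only (the function name and the 30-minute default are our own, not part of any Google API) and shows how a two-hour availability block divides into fixed-length slots:

```python
from datetime import datetime, timedelta

def split_block(start, end, minutes=30):
    """Divide an availability block into fixed-length appointment slots."""
    slots = []
    slot = timedelta(minutes=minutes)
    while start + slot <= end:
        slots.append((start, start + slot))
        start += slot
    return slots

# A two-hour afternoon block yields four 30-minute slots.
block = split_block(datetime(2017, 3, 1, 13, 0), datetime(2017, 3, 1, 15, 0))
print(len(block))          # 4
print(block[0][1].time())  # 13:30:00
```

Booking a slot then amounts to removing one `(start, end)` pair from the list, which is why a claimed slot immediately disappears from the calendar view.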
Figure 3. Google calendar showing a variety of available appointments.

Student Impressions

We received positive feedback about the appointment calendars from students. Students commented:

● “I like the ability to see all of the possible openings.”
● “I already bookmarked that bit.ly, so you’ll probably hear from me” (which we did, shortly thereafter).
● “I like to be able to ‘schedule’ a consultation, not request one. It seems more useful and immediate.”

Over two semesters, we tracked whether students who made calendar appointments kept them, and we sent a short, informal survey to students who made appointments. No student who made a calendar appointment failed to attend their consultation. Though our survey does not permit large-scale generalizations due to a very low response rate (4 responses) and a small sample size (15 students), all of the students who responded and used the calendar found booking an appointment that way to be easy, convenient, and unintimidating. Everyone who used the calendar indicated that they would prefer to use it again, and about half of the respondents who set up their appointments via email told us that they would prefer to book a consultation through an appointment calendar in the future. Our anecdotal evidence in succeeding semesters aligns with this perception. We found that using appointment calendars can have many benefits for students:

● They can reduce the anxiety of having to compose and send an email.
● Booking appointments can take less of their time: students book immediately, without back-and-forth emailing. This also means there is no time to rethink the appointment and either never send the email or back out later.
● The appointment is placed on their calendar, meaning they automatically have a built-in reminder and don’t need to search through their email to find the date and time of their appointment.
● Since appointment calendars eliminate back-and-forth scheduling and reduce email fatigue, students may be more willing to use email to discuss their topic and/or question with the librarian.

Librarian Impressions

Our experience has been equally positive. We found that using the calendars radically streamlines the typical back-and-forth email exchanges for setting appointments. We emailed each student to confirm the appointment, but this single email is still a significant reduction in the claim on the librarian’s attention: from a minimum of three emails to schedule an appointment (which often realistically becomes five or more when negotiating a time) down to two. Additionally, librarians can put appointment slots between meetings and at other times when they might have only a spare hour, which are often too tedious to list when emailing. Using appointment calendars lets librarians use their time efficiently even when it is fragmented. As well as facilitating efficient use of small amounts of time, appointment calendars also allow librarians to gently create boundaries. Rather than having to deny appointments requested for late nights or weekends, students are guided to viable times. While the use of Google Calendar is entirely voluntary at the University of Colorado Boulder, we presented the tool at several reference librarian meetings with success, and several other librarians have happily adopted it. One librarian who adopted the tool said: “Sending a student a calendar that they can use to request a meeting eliminates the twelve messages back and forth on when to schedule a meeting.
I also like that it puts the meeting on both our calendars, reducing the number of no-shows.”

BEST PRACTICES

Our experiences and verbal feedback from students and librarians provided a foundation for developing best practices that minimize both librarian and student confusion. For students, confusion often centered on accessing the calendar, identifying which time slots were available, and identifying acceptable locations for appointments. The following best practices can help resolve these difficulties.

Use a link shortener and a consistent naming convention so the links are similar for multiple librarians. Using a link shortener makes it easy for students to jot down the calendar URL, either to manually enter into a browser later or to quickly reach and bookmark the link. This makes it easy for students to file the link and return to it at the point of need. Using a consistent naming convention makes it intuitive for students to transfer the booking method to other librarians for future research needs.

If your link shortener is case-sensitive, create capitalized and lowercase versions of the link. Many link shorteners are case-sensitive, unlike most URLs, which can confuse students and lead to frustration when they try to access a link later. While this could be solved to some extent by using only lowercase letters for the shortened link, that solution can create a cumbersome, difficult-to-read short URL. Simply creating two forms of the link efficiently solves this.

Develop a naming convention so available appointment slots are obvious. We found that when time slots were named simply “Consultation,” students sometimes assumed that all appointments were booked when, in fact, every appointment was open.
Using a term like “Available consultation” made it clear to students that the appointments were not already booked. Google Calendar automatically makes booked appointments unavailable, eliminating the opposite frustration.

Carefully consider the location in the bookable appointment form. Google Calendar allows librarians to enter a location or leave the field empty. If the field is left empty, users can specify a location, and students often filled in a location when none was indicated. If a librarian is not mobile, or is available in certain places only at certain times, it is key to identify a location. For example, in our study, one librarian held weekly office hours in two academic buildings; it was particularly important to identify which times the librarian was available in the library versus the academic buildings. On the other hand, it may also make sense not to designate a location. Another of the authors, serving a population that used the main library, one branch library, and a research area of the campus with no onsite library services, chose not to enter any location in order to accommodate the extremely dispersed population. Users frequently indicated the location in which they would be willing to meet, an option the librarian wanted to support in order to underscore the availability of services wherever users were located on campus.

Schedule two weeks of availability. We found that students could almost always find a time that worked for them with two weeks of available appointments. Moreover, other than recurring office hours, it was difficult for librarians to predict their schedules more than a few weeks into the future.

Librarian concerns centered on keeping calendars synchronized, providing enough lead time for users to book appointments, and publicizing the service. We found several best practices that eased these concerns.

Designate a day each week to update hours and clear conflicts on the calendar.
If Google Calendar is not the primary calendaring software for the library, it can be challenging to keep calendars synchronized. Google Calendar sends a calendar invitation to the librarian when an appointment is claimed, which they can accept on their primary calendaring system, but conflicts that arise on the primary calendaring system are not automatically sent to Google Calendar. By selecting a day and habitually updating the Google Calendar and quickly checking for conflicts with unclaimed slots, librarians can avoid forgetting to add slots or to remove those that conflict with other late-arising obligations.

Advertise the link on the library website, give out the calendar link during class sessions, and give it to professors to embed in course management systems. While appointment calendars benefit librarian workflows even without advertising, students need easy access to the calendar. For maximum user uptake, it is important to put the calendar link anywhere a librarian’s contact information can be found. We found it helpful to promote the link in classes, and it was particularly effective when professors agreed to place the link on the class website. This positions library research assistance next to assignments when they are given out and drafts when they are returned, hopefully reminding students that the library is available for assistance at the moments they are most likely to seek it.

REFLECTIONS AND CONCLUSIONS

Our experiences support the idea that online appointment calendars are appreciated by students, streamline work for librarians, and are easily adopted by both parties. Wider use of this technology, whether via Google Apps for Education or another service, can be mutually beneficial to librarians and students.
Students using the calendar indicated that it was not more intimidating than emailing a librarian, and by removing the waiting period for a response, a calendar can prevent students from becoming distracted or persuading themselves in the interim that they do not actually need help. By providing a calendar where students can quickly and simply book an appointment with a librarian for research assistance, librarians can support students seeking assistance, and thus ultimately bolster student success and increase the library’s relevance.

REFERENCES

1. Naomi Lederer and Louise Mort Feldmann, “Interactions: A Study of Office Reference Statistics,” Evidence Based Library and Information Practice 7, no. 2 (2012): 5–19.

2. Ramirose Attebury, Nancy Sprague, and Nancy J. Young, “A Decade of Personalized Research Assistance,” Reference Services Review 37, no. 2 (2009): 207–20, https://doi.org/10.1108/00907320910957233; Trina J. Magi and Patricia E. Mardeusz, “What Students Need from Reference Librarians: Exploring the Complexity of the Individual Consultation,” College & Research Libraries News 74, no. 6 (2013): 288–91.

3. Trina J. Magi and Patricia E. Mardeusz, “Why Some Students Continue to Value Individual, Face-to-Face Research Consultations in a Technology-Rich World,” College & Research Libraries 74, no. 6 (November 1, 2013): 605–18, https://doi.org/10.5860/crl12-363.

4. Amanda Nichols Hess, “Scheduling Research Consultations with YouCanBook.Me: Low Effort, High Yield,” College & Research Libraries News 75, no. 9 (October 1, 2014): 510–13.

5. Hess, “Scheduling Research Consultations with YouCanBook.Me: Low Effort, High Yield,” 511.
Bibliographic Classification in the Digital Age: Current Trends and Future Directions

Asim Ullah, Shah Khusro, and Irfan Ullah

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 48

ABSTRACT

Bibliographic classification is among the core activities of Library & Information Science, bringing order and proper management to the holdings of a library. Compared to printed media, digital collections present numerous challenges regarding their preservation, curation, organization, and resource discovery and access. Therefore, a truly native perspective needs to be adopted for bibliographic classification in digital environments. In this research article, we investigate and report different approaches to the bibliographic classification of digital collections. The article also contributes two evaluation frameworks for evaluating existing classification schemes and systems. The article presents a bird's-eye view to help researchers reach a generalized and holistic approach to bibliographic classification research, and new research avenues are identified.

INTRODUCTION

Classification is a primary instinct of human beings in arranging, understanding, and relating knowledge artifacts. Bibliographic classification provides a framework for arranging and organizing knowledge artifacts preserved in the form of books, magazines, newspapers, and other holdings to explore new avenues of knowledge management. Today several classification schemes are in use, ranging from conventional schemes including Library of Congress Classification (LCC), Dewey Decimal Classification (DDC), Colon Classification (CC), and Universal Decimal Classification (UDC) to classification for digital environments including the Association for Computing Machinery (ACM) digital library1, the Institute of Electrical and Electronics Engineers (IEEE) digital library2, and the Online Computer Library Center (OCLC) cooperative catalogue3.
Besides the difficulties that lie in devising a classification scheme (it is time-consuming and resource-consuming), either the existing schemes should be revised and extended or a new classification scheme should be devised that could act as a common platform for representing knowledge artifacts belonging to different contexts. Such a classification scheme should also resolve the challenges in digital preservation and curation and support the precise and accurate search and retrieval of digital collections. The first step, in this connection, is to properly analyze and evaluate the existing bibliographic classification schemes and to identify their strengths and limitations in classifying digital collections accurately and appropriately. Therefore, the objectives of this research article include:

• To investigate and evaluate the available approaches to bibliographic classification from the perspective of devising a classification scheme that can act as a common platform for classifying any type of digital collection.
• To devise evaluation frameworks that compare the available bibliographic classification schemes and approaches.
• To present issues, challenges, and research opportunities in state-of-the-art bibliographic classification research.

The rest of the paper is organized as follows: Section 2 presents current trends in the classification of digital collections. Section 3 presents two evaluation frameworks for comparing and evaluating the existing solutions.

Asim Ullah (asimullah@upesh.edu.pk), Shah Khusro (khusro@upesh.edu.pk), and Irfan Ullah (cs.irfan@upesh.edu.pk) are researchers at the Department of Computer Science, University of Peshawar, Peshawar, Pakistan.

1 http://dl.acm.org/
2 http://ieeexplore.ieee.org/Xplore/home.jsp
3 https://www.oclc.org/

BIBLIOGRAPHIC CLASSIFICATION IN THE DIGITAL AGE | ULLAH, KHUSRO, AND ULLAH | doi:10.6017/ital.v36i3.8930 49
Section 4 presents research challenges and opportunities in bibliographic classification research. Finally, Section 5 concludes our discussion. References are presented at the end of the paper.

Classifying Digital Collections – A Mixed Trend

Bibliographic classification has been the focus of several researchers seeking to properly classify, catalogue, and describe digital collections. In this regard, two approaches have been adopted: the former supports the use of conventional classification schemes, including CC, DDC, and LCC, in describing and classifying digital documents, while the latter recommends devising new ways of classification, such as the ACM4 computing classification. However, in most digital environments a mixed trend has been observed, where along with new classification schemes, categorization is also used as a complementary solution. For example, ACM presents its own classification system as a poly-hierarchical ontology for describing Computer Science literature and for use in Semantic Web applications. It has replaced the 2008 ACM classification system, which served as the de facto model for the classification of Computer Science literature, and provides a visual topic display along with searching services. It serves as a semantic vocabulary for categorizing concepts and a foundation of computing disciplines ("The 2012 ACM Computing Classification System"). Similarly, the IEEE digital library categorizes its holdings into directories per its own rules of cataloguing and categorization. It categorizes articles and standards into several subject areas and clusters documents by year of publication, author names, content type, affiliation, publication title, publisher, country of publication, letters, numerals, and alphanumeric values5. The document collection can be navigated through collection names, number of documents, topic, and the International Classification for Standards (ICS).
4 http://dl.acm.org
5 http://ieeexplore.ieee.org/browse/standards/ics/ieee/

The DMOZ6 directory is the largest human-made directory of web pages. Since its inception in 1998, it has categorized 3,861,137 websites available in 90 languages into 1,031,719 categories and sub-categories through the work of 91,928 editors and volunteers. In addition, its DMOZ RDF dumps are available on the Linked Open Data (LOD) cloud. According to the World Wide Web Consortium (W3C), LOD enables data integration and reasoning at a large scale ("Linked data"). It establishes links among data, enabling machines and users to explore the web of data rather than the web of documents and to find related data (Berners-Lee, 2006; Bizer, Heath, & Berners-Lee, 2009). However, DMOZ lacks semantic (meaningful) search, which affects precision and accuracy in exploring the required resources. Also, the categories under which the websites are kept need to be revised, because there can be faceted and intra-hierarchical links among web pages. In addition, the content management needs to be upgraded with respect to updating the directory with new entries and the way it reviews and categorizes websites (Boykin, 2016).

Institutional repositories use a mixed approach to creating, collecting, and managing metadata for printed and digital collections using several sources, both conventional and digital. This mixed trend introduces challenges for metadata managers (Chapman, Reynolds, & Shreeves, 2009). To deal with these challenges, subject classification systems can be very beneficial in providing Web-oriented services, including searching content through search patterns, browsing, and content filtering by subject area. However, at the same time, a cognitive overload arises for the authors and depositors of the institutional repository (Cliff, 2008) that needs further attention.
To handle the information overload in retrieving digital collections, several controlled methods have been proposed in the literature, ranging from manual techniques (e.g., web directories) to automatic techniques including clustering and classification. Several classification schemes, including sentiment and subject classification, have been developed for classifying (and categorizing) web pages. Classification is used in focused crawling, in searching and ranking results, and in classifying queries. Clustering also organizes web resources, but it differs from classification, which is based on a rigid predefined taxonomy and rules for interpreting the meaning of the classification order; clustering, by contrast, shows flexibility in the classification (categorization) of web documents (Zhu, 2011). However, a mixed trend has been observed, where classification and categorization are intermingled to facilitate the organization, description, exploration, and retrieval of digital collections.

The Semantic Web brings meaningful connections to the web of data so that not only humans but also machines can understand the content of documents and retrieve the most relevant documents. In this way, other related documents can also be easily connected and retrieved (Berners-Lee, 2006). To understand, describe, and relate concepts within documents, ontologies are used. Therefore, researchers have been working on bringing semantics through the Semantic Web and related technologies to automatically classify digital collections. For example, Beghtol (1986) argues that a semantic axis makes a syntactical classification structure more meaningful and provides the platform for developing relationships among knowledge artifacts through several warrants in classification systems.

6 http://www.dmoz.org
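The contrast drawn above between rigid classification and flexible clustering can be pictured in a few lines of Python. This is a toy sketch with invented class codes, keyword sets, and documents (none of it drawn from any cited system): classification forces each document into a predefined taxonomy, while clustering only measures mutual similarity.

```python
# Classification: documents are forced into a rigid, predefined taxonomy.
TAXONOMY = {"QA76": {"computer", "software"}, "Z696": {"classification", "catalogue"}}

def classify(words):
    """Assign the class whose keyword set overlaps the document most."""
    return max(TAXONOMY, key=lambda c: len(TAXONOMY[c] & words))

# Clustering: documents are grouped only by mutual similarity, with no taxonomy.
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    return len(a & b) / len(a | b)

docs = [{"computer", "software", "code"},
        {"software", "code", "bug"},
        {"classification", "catalogue", "shelf"}]

print(classify(docs[0]))          # QA76
print(jaccard(docs[0], docs[1]))  # 0.5
print(jaccard(docs[0], docs[2]))  # 0.0
```

A clustering algorithm would group the first two documents together purely from their 0.5 similarity, whereas the classifier can only ever answer with one of the taxonomy's fixed codes.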
Similarly, a classification ontology is used in automatic classification (Wijewickrema & Gamage, 2013) to minimize ambiguity in vocabulary. To obtain a single subject for the input document, several weight functions, including term frequency-inverse document frequency (TF-IDF), and filtering methods are applied. Semantic Web and LOD technologies have also been used for bibliographic data. For example, BibBase7, a bibliographic data publishing and management tool (Xin, Hassanzadeh, Fritz, Sohrabi, & Miller, 2013), publishes bibliographic data on the user's website according to LOD principles. However, these approaches are limited by the lack of interoperability among native languages when translating classification records from a source language to a target language (Kwaśnik & Rubin, 2003). Classification schemes are also being converted into ontologies. Giunchiglia, Marchese, and Zaihrayeu (2007) have applied the reasoning capabilities of OWL ontologies to classification schemes. These ontologies are used as interfaces to human knowledge for machines, whereas classification schemes are interfaces to knowledge for humans. However, there is limited support for cross-disciplinary searching and for accommodating more views and interpretations of knowledge (Albrechtsen, 2000).

Supervised and unsupervised machine learning techniques are used for automatic text classification. Supervised machine learning techniques use models including the multinomial Naïve Bayes model and the Bernoulli model (Manning, Raghavan, & Schütze, 2008). Yelton (2011) applies probabilistic classification of important words (and therefore of documents), especially by considering Amazon's Statistically Improbable Phrases (SIPs)8 and Google phrase search inside a book. For subject analysis, he mentions simplistic, content-based, and requirements-based methods in terms of understanding text classification and manipulation of books.
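To make the TF-IDF weighting mentioned above concrete, the highest-weighted term of a document can serve as a candidate single subject. This is a minimal stdlib-only sketch under our own assumptions (a three-document toy corpus and an unsmoothed `tf * log(N/df)` formula), not the pipeline of any cited system:

```python
import math
from collections import Counter

def tf_idf_subject(doc, corpus):
    """Return the term of `doc` with the highest TF-IDF weight."""
    n = len(corpus)
    tf = Counter(doc)
    def weight(term):
        df = sum(term in other for other in corpus)       # document frequency
        return (tf[term] / len(doc)) * math.log(n / df)   # tf * idf
    return max(tf, key=weight)

corpus = [["library", "catalogue", "classification"],
          ["library", "web", "ontology"],
          ["library", "classification", "classification", "classification", "scheme"]]

# "library" occurs in every document (idf = 0), so a rarer, repeated term wins.
print(tf_idf_subject(corpus[2], corpus))  # classification
```

Terms that appear in every document receive an IDF of zero, which is exactly why stop-word-like vocabulary never surfaces as a subject candidate.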
The Wikipedia page structural hierarchy has been exploited in automatic harvesting, classification, categorization, clustering, and metadata enrichment (Yelton, 2011). Information Extraction (IE) has also been applied to classifying books automatically. For example, Betts, Milosavljevic, and Oberlander (2007) use IE methods for the automatic labeling of books using the LCC classification. They used the bag-of-words (BOW) model, the bag-of-named-entity recognition (NER) model, and the generalizing named entities (GAZ) model in automatic text classification. To achieve better accuracy, they also combined the results of these models. However, automatic classification may lead to limited search and retrieval because of the missing semantics associated with phrases or keywords. To overcome this issue, a fundamental and practical theoretical model of classification is required (Jones, 1970).

7 https://bibbase.org/
8 http://www.amazon.com/gp/search-inside/sipshelp.html

Table 1 categorizes the bibliographic classification approaches into three broader categories, namely theoretical approaches, practical approaches, and approaches used in digital environments. Theoretically, researchers have discussed different viewpoints on classification, whereas we get a different view when these schemes are applied. In practice, the syntactic structure is valued through faceted and enumerative techniques. In digital environments like the Web and digital libraries, strict boundaries of classification are often compromised by categorization.

Theoretical Approaches:
1. Biasness (Mai, 2009) (Mai, 2010)
2. Subjectivity and objectivity (Hjørland, 2016)
3. Epistemological and semiotic approaches (Hjørland, 2013) (Lee, 2012; Mai, 2011) (Tennis, 2008)
4. Empiricism, rationalism, historicism, and pragmatism (Hjørland, 2013)
5. Multidisciplinarity approach (Beghtol, 1998)
6.
Scientific approaches (Hjørland, 2008)
7. Positivistic and pragmatic approaches (Dousa, 2009) (Mai, 2011)
8. Interdisciplinary and evidence-based practice classification (Hjørland, 2016)
9. Social and cultural context (J.-E. Mai, 2004)
10. By tracking the universe of knowledge
11. Universal order (Smiraglia & Van den Heuvel, 2011)
12. Integrative levels in classification (Dousa, 2009)
13. Literary warrant (Rodriguez, 1984)
14. Education warrant (Hjørland, 2007) (Beghtol, 1986)
15. Semantic warrant (Beghtol, 1986)
16. Syntactic warrant (Beghtol, 1986)
17. Domain and user requirements (Mai, 2005)
18. Pluralism and human interpretations

Practical Approaches:
1. Enumerative and faceted (Batley, 2014)
2. General-purpose approach (Mai, 2003) and special-purpose approach (Mancuso, 1994), e.g., classification schemes for general classes of knowledge areas or for a special class of knowledge area
3. Syntactic axis (Beghtol, 1986) (Beghtol, 2001)
4. Semantic axis (Beghtol, 1986) (Beghtol, 2001)

Classification in Digital Environments:
1. Document similarity (Hamming distance and Euclidean geometric approaches) (Losee, 1993)
2. Fuzzy approach (Jacob, 2004)
3. Clustering (Nizamani, Memon, & Wiil, 2011)
4. Categorization (Koshman, 1993)
5. TF-IDF weighting (Dorji et al., 2011)
6. Unsupervised machine learning techniques (Joorabchi & Mahdi, 2011) (k-means clustering, hierarchical clustering)
7. Supervised machine learning techniques (Wang, 2009) (multinomial Naïve Bayes, Bernoulli model, Support Vector Machine, Random Forest, k-NN)
8. Information Extraction methods (Gilchrist, 2015)
9. Probabilistic text and document classification (Maron, Kuhns, & Ray, 1959)
10. Ontologies (Campbell, 2002)

Table 1.
Categorization of approaches towards bibliographic classification

Evaluating Classification Schemes & Approaches

In this section, we present two evaluation frameworks to compare and evaluate the existing classification and categorization systems and well-known bibliographic classification ontologies. We have chosen CC, DDC, LCC, and Universal Decimal Classification (UDC) on the basis of their structural properties and wide usage in both conventional and digital libraries ("Subject classification schemes," 2015) ("Library of Congress Classification," 2014) ("About Universal Decimal Classification (UDC)") (Press, 2002) (Encyclopedia, 1 August 2014). Some of these properties include: citation and filing order; notational expressiveness; flexibility in classification principles, rules, and notations; coverage of knowledge areas; structure of classification schedules and notations; notational brevity and simplicity; notational mnemonics; notational hospitality; schedules with an updateable and comprehensive subject order; and knowledge coverage (Batley, 2014). UDC, LCC, and DDC are universal, multidisciplinary, and widely used systems (Koch & Day, 1997), whereas CC has seminal and inspirational value for the faceted structure of bibliographic classification. Therefore, the evaluation framework mainly targets these classification schemes as our natural choice for evaluation and comparison. Similarly, we evaluate ACM9, IEEE10, and DMOZ11 using the evaluation framework, as these are well-known and widely used document classification and categorization systems for digital libraries. Table 2 presents the 22 metrics used in the evaluation framework.
These evaluation metrics are extracted from the existing literature (Kaosar, 2008) (Painter, 1974) (Encyclopedia, 1 August 2014) (Buchanan, 1979) (Koch et al., 1997) (Reiner, 2008) (Gnoli, Merli, Pavan, Bernuzzi, & Priano, 2008) (Francu, 2007) (Chan, Intner, & Weihs, 2016). The metrics are: (i) structural complexity; (ii) notational brevity; (iii) predefined structure; (iv) rules complexity; (v) theoretical laws; (vi) mnemonics; (vii) hospitality; (viii) search complexity; (ix) usability; (x) precision and accuracy; (xi) multilinguality; (xii) interoperability; (xiii) semantic search; (xiv) bias in subject representation; (xv) enumerative structure; (xvi) faceted structure; (xvii) faceted search; (xviii) consistency; (xix) LOD datasets; (xx) Linked Open Vocabularies (LOV) support; (xxi) platform; and (xxii) warrants of classification. These metrics, the need for them, and their use in rating classification systems are discussed in the following paragraphs. In Table 2, the bibliographic systems are evaluated against these metrics: the indicator ✓ shows support for a metric, ✗ indicates that the system has no or minimal support for the metric, and N/A means not applicable. In addition, each classification system has been rated against these metrics (Table 3), and Figure 1 graphically demonstrates the resulting rankings and ratings of these classification systems.

9 http://www.acm.org/about/class
10 http://www.ieee.org/about/today/at_a_glance.html
11 https://www.dmoz.org/docs/en/about.html

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017
| Metric | CC | UDC | DDC | LCC | ACM | IEEE | DMOZ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Structural Complexity | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Notational Brevity | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | N/A |
| Predefined Structure | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Rules Complexity | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Theoretical Laws | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mnemonics | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Hospitality | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Search Complexity | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Usability | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Accuracy and Precision | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Multilinguality | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Interoperability | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Semantic Search | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Bias in Representation | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Enumerative Structure | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Faceted Structure | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Faceted Search | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Consistency | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| LOD Datasets | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| LOV Support | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Platform | N/A | UDC Consortium | OCLC | Library of Congress | ACM digital library | IEEE Xplore digital library | Open Directory Project |
| Warrants of Classification | Literary warrant (Giess, Wild, & McMahon, 2007) | Literary warrant (Perles, 1995) | Literary and scientific warrant (Giess et al., 2007) | Literary and scientific warrant (Giess et al., 2007) | Scientific research warrant | Scientific research warrant | N/A |

Table 2. Evaluation of Classification Schemes

Structural complexity means the difficulty of using a scheme's structure and notations to classify and describe a specific subject area. This metric helps in selecting a classification scheme that is easy to use for classifying a document collection, requiring short notations and simple rules. The notations and rules are complex in CC and UDC (Ranganathan, 1968); this complexity is due to the faceted structure of these classification schemes (Sukhmaneva, 1970). The structural complexity of CC is greater than that of UDC, so UDC comes second in complexity. Because of its enumerative structure, LCC stands third, being less complex than CC and UDC.
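The ✓/✗/N/A matrix of Table 2 lends itself to a compact machine-readable encoding. The sketch below is an invented encoding, not part of the article, and covers only a subset of the table's rows; it shows one way to hold the matrix and query which metrics a scheme supports.

```python
SCHEMES = ["CC", "UDC", "DDC", "LCC", "ACM", "IEEE", "DMOZ"]

# A subset of Table 2's rows, encoded per scheme in the column order
# above: Y for ✓, N for ✗, and - for N/A.
TABLE2 = {
    "Structural Complexity": "YYNNNNN",
    "Notational Brevity":    "NNYYYY-",
    "Predefined Structure":  "YNYYYYY",
    "Theoretical Laws":      "YYYYNNN",
    "Hospitality":           "YYYYYYY",
    "LOV Support":           "NNNNYNN",
}

def supported_metrics(scheme):
    """Metrics marked ✓ for the given scheme in the encoded subset."""
    col = SCHEMES.index(scheme)
    return [m for m, row in TABLE2.items() if row[col] == "Y"]
```

For example, `supported_metrics("ACM")` includes "LOV Support", which the four library schemes lack; such an encoding makes Table 2 queryable rather than merely readable.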
DDC is the simplest in this list because it is based on an enumerative classification structure and on the principle of dividing the universe of knowledge into defined classes. IEEE is more complex than ACM, whereas DMOZ is the least complex system. A classification system with greater structural complexity is ranked lower; therefore, based on this metric, the classification systems can be ranked as DMOZ, ACM, IEEE, DDC, LCC, UDC, and CC. Notational brevity means how briefly the notations describe the holdings, using a minimum number of symbols with minimal cognitive load. DDC uses well-organized short notations whose mnemonic value is also high (Comaromi & Satija, 1983) (Hyman, 1980). LCC has notational brevity (Chan et al., 2016). UDC uses lengthy notations compared to DDC (Kaosar, 2008), and CC also uses lengthy and complex notations (Chatterjee, 2016). ACM notations are shorter than IEEE's, whereas DMOZ does not use any notations at all. Using this metric, these classification systems can be ranked as ACM, IEEE, DDC, LCC, UDC, and CC, with DMOZ last because it uses no notational symbols at all. A predefined structure means that the classification scheme follows a rigid, pre-assumed subject categorization along with classification class marks. In this regard, UDC and LCC are enumerative and impose a subjective viewpoint of classification by following a predefined structure (Goh, Giess, McMahon, & Liu, 2009). Being faceted, CC arranges basic concepts into a few predefined categories (Satija & Martínez-Ávila, 2015). DDC also has a predefined hierarchical structure of classification (Press, 2002) (Jonassen, 2004). Among these schemes, CC has the least predefined structure because of its use of facets; UDC is both enumerative and analytico-synthetic. LCC is enumerative but possesses weaker predefined rules for its structural design.
Because of its rigid enumerative hierarchies and predefined class structure, DDC ranks high in this respect, and DMOZ has the most rigid predefined structure compared to IEEE and ACM. The classification system with the most rigid and predefined structure is ranked lower; therefore, the ranking could be CC, ACM, IEEE, UDC, DDC, LCC, and DMOZ. Rules complexity determines the difficulty of applying classification rules to knowledge artifacts. CC presents a complex set of rules and classification theory, which is comparatively difficult to implement and understand (Tennis, 2011). LCC is also complex ("Library of Congress Subject Headings: Pre- vs. Post-Coordination and Related Issues," March 15, 2007) in implementing Library of Congress Subject Headings (LCSH) in pre-coordinated subject strings. DDC's rules and principles are comprehensive and complete (Press, 2002) and easier than those of CC and LCC. UDC is also easy to understand and implement (Piros, 2014). ACM, IEEE, and DMOZ are simple to use and understand and therefore bear no such complexity. A classification system with greater complexity is ranked lower; based on this metric, ACM, IEEE, and DMOZ are on top with similar rankings, followed by UDC, DDC, LCC, and CC. Theoretical laws are considered as a metric to analyze the foundations of classification systems: whether or not they are based on theoretical laws and principles of classification. UDC combines the enumerative and faceted approaches gathered from DDC and CC (Kaosar, 2008). The synthetic principle of UDC contributes to its widespread use, but it is not enough at the intellectual level for making relations between subject facets (Kyle & Vickery, 1961). UDC lacks standard rules for forming facets, but there are rules for its structural representation (McIlwaine, 1997).
Therefore, the structural and synthetic rules are good enough for its applicability, but they should be refined further at the intellectual level. The theoretical laws of CC are based on the faceted approach to managing knowledge artifacts. CC has sound rules and principles, which include different postulates, laws, principles, and canons (Batley, 2014) (Arashanapalai Neelameghan & Parthasarathy, 1997). On the other hand, LCC has weaker theoretical foundations; there also exist some intellectual and structural limitations due to its enumerative structure (San Segundo Manuel, 2008). DDC has a hierarchical and enumerative structure based on the knowledge philosophy of hierarchical division (Hjorland, 1999). Because of its strong theoretical foundations, CC is at the top of this list; DDC is second because of its universal theory of knowledge division; UDC is third for exploiting the theories of DDC and CC; LCC is fourth for its comparatively weak theory of classification; whereas ACM, IEEE, and DMOZ present no or very limited theoretical laws or philosophical rules of classification. Support for mnemonics enables human classifiers to easily memorize the symbols and notations of a classification scheme. Systematic and literal mnemonics are used in UDC (Satija, 2013) (Kaosar, 2008); its mnemonics are increased through mnemonic devices, which are described through the canons of mnemonics (Kaula, 1965). LCC uses literal mnemonics (Satija, 2013), whereas DDC uses systematic and literal mnemonics, although its systematic mnemonics are not consistent (Satija, 2013). There are several seminal mnemonics in CC (Rahman & Ranganathan, 1962); these mnemonic devices increase mnemonics in CC, but the formation and length of its notations affect this mnemonic quality.
ACM has greater support for mnemonics than IEEE, whereas DMOZ is a collection of web pages under specific categories. Based on this metric, the rankings could be DDC, UDC, LCC, ACM, IEEE, and CC, whereas DMOZ uses no mnemonic devices or notations at all. Hospitality means the ability of a classification scheme to incorporate new knowledge areas expressed in different multilingual contexts. Hospitality is present in UDC (Kaosar, 2008). CC is also hospitable to new subjects (De Grolier, 1962). LCC is hospitable for expressing new subjects and knowledge areas (Satija, 2013), and DDC is hospitable to new subject areas (Satija, 2013). By this metric, a classification scheme with a faceted approach is naturally more hospitable than others. Therefore, CC is the most hospitable and tops this list, followed by UDC. DDC is third for following the enumerative approach, and LCC is fourth because of its pure enumerative structure. IEEE and ACM are fifth, covering a short span of knowledge areas with a faceted structure and efficient search. DMOZ covers only web pages in already-specified categories and is therefore seventh. Search complexity measures the difficulty of searching for artifacts using a classification scheme; it indicates how well a classification scheme supports searching for a specific document. Search complexity is minimal in UDC because of its analytico-synthetic and enumerative nature (Kaosar, 2008), which can contribute to search applications both on the Web and in-house, e.g., the Online Public Access Catalog (OPAC). The theory and philosophy of CC is the trendsetter for knowledge management and resource discovery and access; however, according to Raghavan (2016), searching through CC is comparatively weaker than through other bibliographic classification schemes.
According to Chan (2000), LCC and LCSH have the potential to provide ease in searching because of a richer vocabulary with greater subject coverage, synonym and homograph capabilities, a pre-coordinated system, browsing capability in a multi-faceted structure, multilingual support, and MARC format support with semantic interoperability. However, they are limited in easing the search and retrieval process by the complexity of their syntax and application rules, a lack of training for personnel, and overly lengthy and complex search strings. DDC and LCC are aggregated in the Classify12 project initiated by OCLC; with the Classify application, the search experience of catalogers and patrons becomes much easier. Using this metric, DDC stands at the top with less complexity than LCC, UDC, and CC, and IEEE is more complex than ACM and DMOZ. The classification scheme with less search complexity is ranked higher. Therefore, ACM and IEEE, DDC, and LCC stand first with the least search complexity, followed by UDC and CC. DMOZ stands last with greater search complexity, having loose boundaries of categorization. Usability analyzes the difficulty of using a classification scheme for classifying and searching documents. This metric reflects ease of learning and effective usage; usability measures user satisfaction, user understanding of the system, and precision with minimal recall in a shorter amount of time (Singapore, 2016). OCLC has introduced structural changes to improve usability and simplify classification tasks ("Dewey Services: Dewey Decimal Classification System,"). The Classify13 project aims at finding books through a web interface that is easy to use and understand, using DDC and LCC. UDC is extensively used in web-based search and retrieval applications (Kaosar, 2008).

12 http://www.oclc.org/research/themes/data-science/classify.html
This classification scheme is used in several institutions' OPAC systems ("Library OPACs containing UDC codes,"). The UDC notations support usability (Slavic-Overfield, 2005); however, the user interface of these OPAC search systems could be further improved (Slavic, 2006) (Pollitt, 1998) (Schallier, 2005). CC is the source of inspiration and a standardized model for the usability of the faceted structure of bibliographic classification in electronic and web-based environments (Thelwall, 2009). In (Rosenfeld & Morville, 2002), the philosophy and methodology of CC are considered at an abstract and theoretical level. This assessment of CC leads us to the argument that the faceted structure supports precise retrieval, but at a considerably higher cognitive cost at the user end compared to DDC and LCC with their simple enumerative structures. The Library of Congress uses LCC in its catalog14 and Classification Web15 applications, which exploit LCSH and LCC in a user-friendly manner. Looking at the usability of these classification schemes, DDC ranks at the top for its easy enumerative structure and notational simplicity, along with easy-to-use web applications. LCC is second because of its enumerative structure and adoption in web applications. Being both enumerative and faceted, UDC stands third. CC, being a purely faceted scheme with complex notations and rules, is ranked fourth. IEEE and ACM are faceted and easy to use, and therefore share the first position with DDC. DMOZ, with its loose boundaries of categorization, is the least usable, with limited browsing and search. The accuracy and precision metric measures how accurately and precisely a classification system can identify the exact locations of holdings in the given knowledge space. UDC shows accuracy and precision in finding a required knowledge artifact (Kaosar, 2008).
The accuracy and precision of CC are compromised, as its lengthy notations introduce complexity in searching for and discovering documents (Satija, 2015). LCC and DDC were studied for accuracy and precision using a prototype model (Gnoli, Pusterla, Bendiscioli, & Recinella, 2016) for automatic text classification of electronic documents using classification metadata of library holdings from LCC and DDC datasets. It was observed that, for precision, there is a need to increase DDC and LCC bibliographic data on the Web, introduce search capabilities for bibliographic data at the micro level of a document, and increase the efficiency of user interfaces for navigation using a DDC-based browsing structure (Joorabchi & Mahdi, 2009) (Joorabchi & Mahdi, 2011). Therefore, CC, because of its pure faceted approach, has high-level precision in search and resource discovery. UDC stands second for being both enumerative and analytico-synthetic. DDC is third, as OCLC maintains and updates its structure regularly along with state-of-the-art search applications; LCC shares the third position with DDC, being regularly updated and maintained by the Library of Congress for precision in its search application. IEEE and ACM also show great precision in their search and retrieval, and therefore share the third position with DDC and LCC. DMOZ consists of manually created and updated categories of web pages, with limited keyword search and very low precision.

13 http://classify.oclc.org/classify2/
14 https://catalog.loc.gov/vwebv/searchBasic
15 https://www.loc.gov/cds/classweb/classwebfeatures.html
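The accuracy and precision measurements discussed above reduce to simple ratios over predicted versus true class assignments. A minimal sketch, with invented labels standing in for an automatic classifier's output:

```python
def accuracy(true, pred):
    """Fraction of all documents assigned their correct class."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def precision(true, pred, cls):
    """Fraction of documents assigned `cls` that really belong to it."""
    assigned = [t for t, p in zip(true, pred) if p == cls]
    return sum(t == cls for t in assigned) / len(assigned) if assigned else 0.0

# Toy ground truth and classifier output (invented for illustration)
true = ["DDC", "DDC", "LCC", "LCC", "DDC"]
pred = ["DDC", "LCC", "LCC", "LCC", "DDC"]
acc = accuracy(true, pred)           # 4 of 5 assignments correct
p_lcc = precision(true, pred, "LCC") # 2 of the 3 LCC assignments correct
```

The prototype studies cited above report exactly these kinds of ratios when judging how well LCC- and DDC-trained classifiers place electronic documents.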
In the evaluation framework, multilinguality means the ability to classify and describe knowledge artifacts written and expressed in a variety of natural languages, and the availability of a classification scheme in different natural languages. DMOZ supports 72 different languages and therefore stays at the top. UDC is multilingual, supporting French, Portuguese, Spanish, and Russian (Slavic, 2008) (Koch & Day, 1997), and has been translated into multiple languages ("Universal Decimal Classification summary," 2017). LCC supports works in 19 language subclasses ("Library of Congress Classification Outline: Class P - Language and Literature,"), including German, Slavic, Oriental, and Romance languages. Translations of DDC help localize the scheme for different languages of the world (Vizine-Goetz, 2009); DDC is translated into 30 different languages but covers different languages in only seven classes, i.e., class numbers 420 to 490 ("Dewey Decimal Classification summaries,"). CC shows minimal multilingual support because of its subcontinental origin (A Neelameghan & Lalitha, 2013; Raghavan, 2016). ACM and IEEE are available in English only and therefore show no multilinguality at all. Using this metric, DMOZ is first, followed by UDC, DDC, LCC, and then CC. Consistency measures the level of uniformity with which a classification system classifies subjects. According to Batty (1967), CC showed no consistency in its earlier stages, but with the addition of canons of consistency it has gradually become consistent. LCC seems less consistent in expressing different subject areas (Madge, 2011). DDC and LCC were found wanting in defining and classifying religious holdings, especially Jewish content; these schemes also show bias towards different religious and regional content (Maddaford & Briefing). Although DDC is somewhat inconsistent, it can still classify complex subjects (Gnoli et al., 2016).
UDC also shows inconsistency, which can be sorted out by introducing specific UDC classes into databases in online systems (Kaosar, 2008). DDC shows comparatively great consistency in classifying new subjects with constant uniformity; CC is ranked second because of its introduction of canons of consistency. LCC and UDC are ranked third. Being limited to scientific research articles, IEEE and ACM are fourth, and DMOZ stands fifth due to its loose boundaries of categorization. Interoperability determines how well a given classification scheme can express its classification artifacts in combination with other schemes. UDC is interoperable (Koch & Day, 1997) and supports integration with other systems. CC, because of its subcontinental origin, shows limited interoperability (A Neelameghan & Lalitha, 2013) (Raghavan, 2016). LCC shows interoperability by being mappable to DDC (Vizine-Goetz, 2009), and the interoperability and multilinguality of DDC enable it to be mapped to other classification schemes (Vizine-Goetz, 2009). The IEEE, ACM, and DMOZ datasets are interoperable with other web applications. Based on this metric, DDC, LCC, UDC, ACM, and IEEE stand first because of their interoperability, data-harvesting protocols, and ontologies in the digital environment. DMOZ stands second because of its limited interoperability. CC provides only a philosophical and theoretical model, and we found no practical web-based application, so it is not included in this list. By enabling semantic search, a classification scheme can proactively respond to information seekers using its faceted structure. UDC, because of its semantic structure (Slavic, 2008), has semantic search capability.
The classification theory and philosophy of CC provide the basis for classification ontology development (Panigrahi & Prasad, 2005), which demonstrates its capability for semantic search and inference. LCC supports semantic search through LOD support, semantically enabled LCSH, and authority control files ("LC Linked Data Service: Authorities and Vocabularies,") (Harper & Tillett, 2007). DDC also contains semantic features (Green, 2015) that can be utilized in semantic search applications; therefore, it can be concluded that semantic search is also supported by DDC. This metric can best be analyzed in the digital environment, especially by examining these bibliographic classifications' ontologies. LCC can be ranked first because of its expressive ontology and efficient semantic search application. DDC is second because of its efficient search but limited usage of its ontology. ACM is third because of its expressive ontology and efficient search but limited coverage of the scientific domain. IEEE is fourth because of its faceted semantic search. UDC comes fifth because of its ontological presence but limited usage. CC has no application in the digital environment that could demonstrate its capability for semantic search, although it provides the semantic-level basis for all bibliographic classification systems. DMOZ lacks semantic search, being based only on keywords. Bias in subject representation means an inclination for or against certain subjects, resulting in unfair treatment, partial neglect, or complete omission of a subject. DDC and LCC are biased in representing certain knowledge and regional information, e.g., an Anglo-American bias (Tomren, 2003), while UDC is biased towards European culture (Fandino, 2008). CC is biased towards certain knowledge areas (Satija & Singh, 2010). A classification system with the least bias is ranked higher.
Therefore, DMOZ is ranked highest for showing no or minimal bias; CC is ranked second with less acute bias, followed by DDC, which shows comparatively less bias towards religious and regional subjects. LCC comes fourth, followed by IEEE and ACM, which show greater bias towards certain domains. Enumerative structure exhibits rigid hierarchies. LCC is enumerative (Goh et al., 2009; Perles, 1995) (Bryant, October 4, 1993). UDC is nearly enumerative as well as faceted (Kaosar, 2008) (Bryant, October 4, 1993), and DDC is both analytico-synthetic and enumerative (Hallows, 2014). CC is faceted (Chatterjee, 2016; Dawson, Brown, & Broughton, 2006). Comparing these systems, LCC fully supports an enumerative structure, followed by DDC, whereas UDC is nearly enumerative and CC shows no enumerative structure at all. The trend is towards semantic and faceted structures, so an enumerative structure is not a desirable characteristic of a classification system, and a system with an enumerative nature is ranked lower. Based on this metric, CC and DMOZ are the least enumerative and therefore ranked highest, followed by IEEE and ACM in second position, then UDC third, with DDC and LCC last. The faceted structure means a semantically interlinked structure of categories that can be merged and combined to generate an expression for existing or new concepts (Svenonius, 2000). CC is faceted (Chatterjee, 2016; Dawson et al., 2006). UDC is analytico-synthetic (Kaosar, 2008) and follows the faceted method of CC, using different connecting symbols in mixed notations and subject facets including time and space (Chatterjee, 2016). IEEE and ACM possess faceted structures. DMOZ has only a hierarchical structure with predefined categories.
Based on this metric, we rank CC first, UDC second, and ACM and IEEE third, while DDC and LCC have enumerative structures and therefore cannot be included in this list. Faceted search means navigating or browsing through the faceted structure of a faceted classification scheme. Faceted search is also applied by selecting different ranges and choices from the facets a faceted system offers to find the required content. It differs from search complexity in that it looks at the search patterns and criteria a classification scheme supports, whether in OPACs or web applications. The theory and philosophy of CC support faceted search and browsing economically (Kong, 2016); however, to the best of our knowledge, no real-world application demonstrates its usefulness. UDC is based on the faceted approach, which supports faceted search (Tunkelang, 2009). LCC supports faceted search with the help of LCSH (McGrath, 2007) and also provides faceted search through the Faceted Application of Subject Terminology (FAST) application ("Faceted Application of Subject Terminology," 2017). DDC provides faceted search through the OCLC Classify16 application. Using this metric, DDC is ranked first, because it adopts the faceted approach alongside its native enumerative nature and has state-of-the-art web-based search applications developed by OCLC. LCC is second because of its web-based search applications and its adoption of a comparatively restricted faceted approach. IEEE, providing an extensive choice of search patterns, stands third. ACM has a poly-hierarchical and multi-faceted classification structure along with a robust search mechanism, and is therefore fourth in this list.

16 http://classify.oclc.org/classify2/
There are very few faceted search applications of UDC, so it stands fifth. DMOZ has a hierarchical structure in which the required element can be accessed through keyword search; it therefore provides no faceted search. CC has no search applications that could confirm its support for faceted search. The LOD datasets metric means the availability of a classification system's datasets on the LOD cloud. Among our chosen systems, UDC, LCC, DDC, IEEE, ACM, and DMOZ have datasets in the LOD cloud, whereas CC has no such datasets. The definitions of classes and properties are gathered in Linked Open Vocabularies (LOV), which are used for describing different types of objects in the LOD cloud; these definitions provide vocabularies for linking the linked data (Foundation, 2017). CC, UDC, DDC, LCC, IEEE, and DMOZ have no LOV, whereas ACM has LOV vocabularies. The "platform" metric in the evaluation framework considers the applicability of a given classification system in real-world web applications and other digital environments. In this regard, UDC is supported by the UDC Consortium, DDC by OCLC, LCC by the Library of Congress, ACM by the ACM digital library, IEEE by the IEEE Xplore digital library, and DMOZ by the Open Directory Project. To the best of our knowledge, CC has not been used by any online application. The warrants of classification work as authoritative acts for classificationists performing the cognitive practice of designing the classes and concepts in a classification system and their structural properties, and then placing subjects in the specified classes (Beghtol, 1986).

Table 3. Ranking and Average Ranking of Classification Schemes
Metric order (M1–M20): structural complexity; notational brevity; predefined structure; rules complexity; theoretical laws; mnemonics; hospitality; search complexity; usability; precision and accuracy; multilinguality; interoperability; semantic search; bias; enumerative structure; faceted structure; faceted search; consistency; LOD datasets; LOV support.

| Scheme | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 | M11 | M12 | M13 | M14 | M15 | M16 | M17 | M18 | M19 | M20 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CC | 1 | 2 | 7 | 1 | 5 | 2 | 6 | 2 | 2 | 4 | 2 | 1 | 7 | 5 | 4 | 4 | 1 | 4 | 1 | 1 | 3.1 |
| UDC | 2 | 3 | 4 | 4 | 3 | 6 | 5 | 3 | 3 | 3 | 5 | 3 | 3 | 3 | 2 | 3 | 2 | 5 | 2 | 1 | 3.25 |
| DDC | 4 | 5 | 3 | 3 | 4 | 7 | 4 | 4 | 5 | 2 | 4 | 3 | 6 | 4 | 1 | 1 | 6 | 3 | 2 | 1 | 3.6 |
| LCC | 3 | 4 | 2 | 2 | 2 | 5 | 3 | 4 | 4 | 2 | 3 | 3 | 2 | 1 | 1 | 1 | 5 | 3 | 2 | 1 | 2.65 |
| ACM | 6 | 7 | 6 | 5 | 1 | 4 | 2 | 4 | 5 | 2 | 1 | 3 | 5 | 2 | 3 | 2 | 3 | 2 | 2 | 1 | 3.3 |
| IEEE | 5 | 6 | 5 | 5 | 1 | 3 | 2 | 4 | 5 | 2 | 1 | 3 | 4 | 2 | 3 | 2 | 4 | 2 | 2 | 2 | 3.15 |
| DMOZ | 7 | 1 | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 2 | 1 | 6 | 4 | 1 | 1 | 1 | 2 | 1 | 2.25 |

CC and UDC use literary warrant; DDC and LCC use literary and scientific warrants. ACM and IEEE use a scientific research warrant, while DMOZ exhibits no warrant of classification. In the paragraphs above, we compared and evaluated the selected classification systems using the evaluation metrics (shown in Table 2) and discussed how these systems can be ranked on each metric. To give a holistic view of this comparison and evaluation, we introduce ranking levels ranging from 1 (low ranking, not applicable, or not available) to 7 (high ranking) indicating how a classification scheme stands among its counterparts; for a given metric, multiple systems may share the same ranking level. By assigning these ranking levels, Table 3 compares the systems on 20 metrics, excluding platform and warrants of classification.
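The average-ranking column of Table 3 is simply the mean of the 20 per-metric ranking levels. As a cross-check, the sketch below (not code from the article) recomputes the averages from the ranking levels of Table 3, in the table's metric order:

```python
# Per-metric ranking levels (1 = low, 7 = high) transcribed from Table 3.
RANKS = {
    "CC":   [1, 2, 7, 1, 5, 2, 6, 2, 2, 4, 2, 1, 7, 5, 4, 4, 1, 4, 1, 1],
    "UDC":  [2, 3, 4, 4, 3, 6, 5, 3, 3, 3, 5, 3, 3, 3, 2, 3, 2, 5, 2, 1],
    "DDC":  [4, 5, 3, 3, 4, 7, 4, 4, 5, 2, 4, 3, 6, 4, 1, 1, 6, 3, 2, 1],
    "LCC":  [3, 4, 2, 2, 2, 5, 3, 4, 4, 2, 3, 3, 2, 1, 1, 1, 5, 3, 2, 1],
    "ACM":  [6, 7, 6, 5, 1, 4, 2, 4, 5, 2, 1, 3, 5, 2, 3, 2, 3, 2, 2, 1],
    "IEEE": [5, 6, 5, 5, 1, 3, 2, 4, 5, 2, 1, 3, 4, 2, 3, 2, 4, 2, 2, 2],
    "DMOZ": [7, 1, 1, 5, 1, 1, 1, 1, 1, 1, 6, 2, 1, 6, 4, 1, 1, 1, 2, 1],
}

def average_ranking(scheme):
    """Mean of a scheme's 20 ranking levels, rounded to two decimals."""
    levels = RANKS[scheme]
    return round(sum(levels) / len(levels), 2)
```

Running `average_ranking` over all seven schemes reproduces the table's average column (e.g., 3.6 for DDC and 3.25 for UDC), confirming that the reported averages follow directly from the per-metric levels.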
Table 3 also reports the average ranking of these classification systems, showing DDC at the top with an average ranking of 3.6, followed by ACM (3.3) and UDC (3.25). It can be concluded that DDC and UDC are among the best classification schemes for describing printed as well as digital collections, whereas ACM is best for classifying digital collections in the computer science domain. However, the ACM classification system can be extended to include other domains as well. Figure 1 illustrates this comparison and evaluation graphically.

Figure 1. Comparison and Ranking of Classification Systems

Table 4 presents the state-of-the-art bibliographic classification ontologies, including the Bibliographic ontology, LCC ontology, DDC ontology, UDC ontology, and DMOZ ontology. Some of these ontologies were designed for specific target applications (e.g., the ACM ontology for the ACM digital library and the LCC ontology for the Library of Congress), whereas others have multiple usage scenarios and have been used by several applications.
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 64
An example of such a general-purpose bibliographic classification ontology is the Bibliographic Ontology [17], which is used by several bibliographic services and digital libraries, e.g., the digital object identifier (DOI) system, Zotero, and the Library of Congress Classification Number (LCCN) permalink service (Giasson, 2012). This evaluation framework compares these ontologies based on their size (in terms of number of classes), usage in state-of-the-art applications, LOD support, the availability of datasets on datahub [18], and LOV support. Looking at Table 4, the ACM ontology shows the most comprehensiveness in terms of number of classes, triples, and LOV support.

Classification and Categorization Ontologies | No. of classes | Applications | LOD datasets | LOD dataset triples | LOV support
Bibliographic ontology [19] | 69 | Library of Congress and BibBase | Yes | 200,000 | Yes
LCC ontology [20] | 40+ | Library of Congress | Yes | Not given | No
DDC ontology [21] | 20+ | OCLC | Yes | 402,288 | No
UDC ontology [22] | 2,600 | UDC [23] | Yes | 69,000 | No
ACM ontology [24] | 1,469 | ACM | Yes | 12,402,336 | Yes
IEEE LOM metadata ontology (Casali, Deco, Romano, & Tomé, 2013) | 9 | IEEE Xplore digital library [25] | Yes | 91,564 | Yes
DMOZ ontology [26] | Not given | Open Directory Project | Yes | Not given | No
Table 4. Comparison of classification and categorization ontologies

18 https://datahub.io
19 http://purl.org/ontology/bibo
20 http://id.loc.gov/
21 http://dewey.info/
22 http://udcdata.info/
23 http://udcdata.info/
24 http://dl.acm.org/ccs/ccs.cfm
25 http://ieee.rkbexplorer.com/id/
26 https://www.dmoz.org/rdf.html

Issues & Challenges in Classification Research

Although bibliographic classification has been practiced since the advent of books and the inception of library and information science, further research and development efforts are required to meet the classification needs of the digital age. In particular, with the arrival of digital holdings, researchers face several issues and challenges. For example, automatic text classification categorizes resources using ordinary metrics such as TF-IDF, and classification in its true sense is yet to be achieved (Yi, 2006). To handle this issue, text classification has also been carried out through semantic indexing, but the required accuracy and precision have yet to be achieved. Research on semantic and structural relationships among different parts of a text corpus is still in its infancy, and these relationships have not been exploited to their fullest so that they can be used in text classification in more meaningful ways.
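As an illustration of the "ordinary metrics" mentioned above, the following stdlib-only sketch computes TF-IDF weights: a term's weight grows with its frequency in a document and shrinks with the number of documents it appears in. The three toy documents are invented for the example. Note that TF-IDF captures term distribution, not meaning, which is why classification "in its true sense" remains out of reach for such metrics.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return per-document TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    "classification of library resources".split(),
    "automatic classification of text".split(),
    "semantic indexing of text corpora".split(),
]
w = tfidf(docs)
# "library" is unique to the first document, so it outweighs the shared term
# "classification"; "of" occurs everywhere, so its weight is zero.
```

The weights say nothing about what "library" means, only how it is distributed, which is precisely the limitation the paragraph above describes.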
Other challenges in text classification include handling the huge volumes of data that result from applying a classification scheme, dynamism in classification, and structural dissimilarity among classification schemes even though they agree on subject as the primary characteristic. The bias in DDC and LCC also needs to be resolved. Several revisions and proposals have been put forward to address the problem of systematic knowledge organization and searching through natural-language terms (Miksa, 2007). There are also various issues regarding structural updates, search and retrieval criteria, and visualization (Slavic-Overfield, 2005). Two main challenges face the application of bibliographic classification principles to classifying the Web. First, the principles of bibliographic classification were formulated for printed documents, yet they should also be applicable to digital collections; addressing this challenge requires applying and modifying bibliographic classification principles in digital environments. Second, hidden hierarchies and concepts need to be exploited so that resources can be better classified by the principles of bibliographic classification for precise discovery, search, and retrieval (J. Mai, 2004). The classification of any object depends on predefined criteria and principles, and this dependence must be addressed if classification is to find a place in this age of search engines. It can be addressed by modifying the conventional principles of classification to consider the purpose of classification and the domain of the objects. Here the Semantic Web and ontologies can play a vital role in bibliographic classification by providing classification that is independent of the predefined theories of bibliographic classification (Hjørland, 2012). Heterogeneity conflicts, which arise from inconsistencies and structural divergences, are a further challenge for semantic interoperability.
Semantic interoperability can be brought into bibliographic records, both within a bibliographic system and across systems, through phases of interlinking, evaluation, analysis, remodeling and conversion, and restructuring of the bibliographic data (Tallerås, 2013). Bibliographic data is multi-format, multi-topical, multilingual, and multi-targeted. To tackle these issues, bibliographic data must be made mutually interoperable so that it can be interlinked, searched, and presented in a harmonized way across the boundaries of datasets and data silos. The interoperability problem arises at the syntactic level in making character sets, notations, data formats, and records consistent across different systems. It also arises at the semantic level because of differences in data interpretation, vocabularies, and precision levels in data encoding. Bibliographic data is published, collected, and maintained by multiple organizations, each following its own established standards and best practices in Web 2.0 (Hyvönen, 2012). Given these problems, the transition of this data from the syntactic Web to the Semantic Web is a challenge: it requires bringing uniformity to records generated by diverse sources and encoded in multiple bibliographic systems, achieving interoperability across bibliographic systems, and visualizing bibliographic data as needed in different contexts. Addressing these problems requires coordination and collaboration between bibliographic data publishers and the technical developers of web applications (Hyvönen, 2012). There is a variety of metadata standards and schemas for defining and managing metadata and bibliographic data and for supporting resource discovery, search and retrieval, preservation, mapping, crosswalking, integrity, accuracy, and authenticity.
But for these tasks to be handled with simplicity, semantic richness, and accuracy, a universal, all-in-one metadata format and schema is needed (Ramesh, Vivekavardhan, & Bharathi, 2015) to get out of this jungle of standards (Gartner, 2016). This would relieve metadata publishers and managers and make the work more economical in terms of time, management, and search and retrieval. Three main tasks were set in the Semantic Publishing Challenge 2015: (i) extracting data on workshops' quality indicators; (ii) extracting data on affiliations, citations, and funding; and (iii) interlinking. Several challenges were faced while fulfilling these tasks, which were addressed through a proposed solution composed of a text-mining pipeline, LODeXporter, and Named Entity Recognition (NER) for extracting named entities from text and linking them to resources on the LOD cloud (Sateli & Witte, 2015). Peroni (2012) addresses three main issues of semantic publishing: the lack of universal metadata schemas for document publishing according to a publishing vocabulary, the lack of efficient user interfaces based on the models and theories of semantic publishing, and the need for a tool that semantically links and describes document text. These issues point to an urgent need for comprehensive ontologies for the document publishing domain. Ferrara and Salini (2012) posed ten challenges concerning the multiple dimensions of bibliographic data analysis.
These challenges are: (i) analyzing bibliographic data in a multidimensional pattern; (ii) discovering and integrating data coming from diverse sources; (iii) detecting multiple references to the same item and cleaning, normalizing, and disambiguating bibliographic data records; (iv) analyzing the multidimensional nature of bibliographic data through multivariate analysis for aggregating the data; (v) comparing different elements of bibliographic data and ranking them accordingly; (vi) aggregating indexes of different natures with respect to different parameters, dimensions, and elements of bibliographic data; (vii) dealing with multiple indexes for the same item with different values coming from different sources; (viii) extracting and indexing textual information from a text corpus in support of text mining; (ix) analyzing textual data topic-wise, describing these topics for research and learning, and tracing different trends; and (x) combining multidimensional information to find trends in bibliographic data collections. Bibliographic classification systems are also being incorporated into LOD. Dewey.info [27] is a prototype platform for DDC data on the Web, designed to link its dataset into the linked data cloud. It provides summaries of the top three levels of the DDC 22nd edition classification in 11 languages, encoded in RDF/SKOS, with actionable URIs for every class, RDF representations for machines and XHTML+RDFa for humans, serializations available in RDF/XML, Turtle, and JSON, and a SPARQL endpoint (OCLC, 2011; Mitchell & Panzer, 2013). However, this version of DDC on the LOD cloud is still at an early stage, both in its subject coverage and in its adoption for generating document metadata.
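The Dewey.info design just described (SKOS concepts with actionable URIs, multilingual labels, and multiple RDF serializations) can be illustrated with a small sketch that emits N-Triples by hand. The class URI, scheme URI, and labels below are hypothetical stand-ins, not the actual dewey.info identifiers; only the RDF and SKOS property names are real.

```python
# Illustrative sketch: one DDC-style class expressed as SKOS data in N-Triples.
# URIs under example.org are hypothetical, not real dewey.info URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

def triple(s, p, o):
    """Serialize one triple; tuple objects are literals: (value, lang) or (value, None)."""
    if isinstance(o, tuple):
        value, lang = o
        obj = f'"{value}"@{lang}' if lang else f'"{value}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."

cls = "http://example.org/ddc/class/020"            # hypothetical actionable class URI
scheme = "http://example.org/ddc/scheme/edition22"  # hypothetical concept scheme URI
triples = [
    triple(cls, RDF_TYPE, SKOS + "Concept"),
    triple(cls, SKOS + "notation", ("020", None)),
    triple(cls, SKOS + "prefLabel", ("Library and information sciences", "en")),
    triple(cls, SKOS + "prefLabel", ("Bibliotheks- und Informationswissenschaft", "de")),
    triple(cls, SKOS + "inScheme", scheme),
]
print("\n".join(triples))
```

A dump of such triples can be loaded into any RDF store and queried over a SPARQL endpoint, which is the access pattern Dewey.info exposes.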
The Library of Congress Linked Data service provides access to commonly used standards and vocabularies developed by the Library of Congress, including data values, controlled vocabularies, and preservation vocabularies. The service provides access to LCSH, LC name authority files, LCC [28], LC children's subject headings, LC genre/form terms, the thesaurus for graphic materials, MARC relators, MARC countries, MARC geographic areas, MARC languages, ISO 639-1, ISO 639-2, and ISO 639-5 languages, the extended date/time format, preservation events, preservation level roles, and cryptographic hash functions. The authorities and vocabularies currently included are listed on the Linked Data service (Library of Congress, 2014). However, it lacks vocabularies supporting PREMIS, MARC, MODS, METS, and MIX. As presented in Section 2, several ontologies have been developed for describing and sharing knowledge about bibliographic classification. However, the available ontologies are limited in several ways: they are not complete clones of the classification schemes they represent, and they are not mature enough in terms of metadata collection. In addition, these ontologies still cannot break the cross-scheme metadata collection barriers, i.e., they are not interoperable enough to harvest metadata across bibliographic ontology systems. Therefore, further initiatives are required to develop mature bibliographic ontologies that fully clone bibliographic schemes in practical use and have strong theoretical grounding. These ontologies must be interoperable and must share metadata collections with other bibliographic ontologies. In this way, a general ontology-based bibliographic classification system could be built in the future by fusing new and existing bibliographic ontologies for better management of knowledge artifacts.
27 https://datahub.io/dataset/dewey_decimal_classification

CONCLUSIONS

With the arrival of digital collections, new challenges of preservation, curation, and resource discovery and access (retrieval) have emerged that need proper attention, and classification schemes and ontologies can play a significant role in meeting them. By comparing and evaluating the available bibliographic classification and categorization systems, we conclude that DDC is currently the best classification system, followed by ACM and UDC. The bibliographic classification ontologies are limited in one way or another: some, like the UDC and ACM ontologies, are comprehensive but lack support for LOD and LOV, while others support these latter aspects but lack comprehensiveness. In view of the available bibliographic classification ontologies and their limitations, we recommend that a universal bibliographic classification ontology be developed by drawing classes from the available ontologies and providing support in terms of the availability of datasets, interoperability, LOD, and linked data vocabularies. To develop a more meaningful classification system that is equally applicable to digital environments, it is necessary to consider book structural semantics such as the table of contents, headings, chapters, sections, subsections, figures, algorithms, mathematical equations, and quotations, as well as the logical connections in the contents (Khusro & Ullah, 2016; I. Ullah & Khusro, 2016), and information about the book itself, i.e., the bibliographic details of the holdings. To meet the former requirement, a comprehensive ontology like BookOnt (A. Ullah, Ullah, Khusro, & Ali, 2016) could be used, which can be mapped to any bibliographic ontology such as the Bibliographic Ontology [29].
However, as the evaluation frameworks suggest, DDC, UDC, and the ACM Classification System should be exploited in designing such a general-purpose classification system.

REFERENCES

The 2012 ACM Computing Classification System. Retrieved March 20, 2017, from http://www.acm.org/about/class/2012
About Universal Decimal Classification (UDC). Retrieved March 21, 2017, from http://www.udcc.org/index.php/site/page?view=about
Albrechtsen, H. (2000). Who wants yesterday's classifications? Information science perspectives on classification schemes in common information spaces. In K. Schmidt (Ed.), Papers. Technical University of Denmark, Center for Tele-Information.
Batley, S. (2014). Classification in theory and practice. Oxford: Chandos Publishing.
Batty, C. D. (1967). An introduction to colon classification. Archon Books.
Beghtol, C. (1986). Semantic validity: Concepts of warrant in bibliographic classification systems. Library Resources & Technical Services, 30(2), 109-125.
29 http://bibliontology.com/#
Beghtol, C. (1998). Knowledge domains: Multidisciplinarity and bibliographic classification systems. Knowledge Organization, 25(1-2), 1-12.
Beghtol, C. (2001). Relationships in classificatory structure and meaning. In Relationships in the organization of knowledge (pp. 99-113). Springer.
Berners-Lee, T. (2006). Linked data. Design Issues. Retrieved March 21, 2017, from https://www.w3.org/DesignIssues/LinkedData.html
Betts, T., Milosavljevic, M., & Oberlander, J. (2007). The utility of information extraction in the classification of books. In Advances in information retrieval (pp. 295-306). Springer.
Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data: The story so far. Semantic Services, Interoperability and Web Applications: Emerging Concepts, 205-227.
Boykin, J. (2016). Assessing DMOZ: A quality review.
Retrieved March 14, 2016, from https://www.seochat.com/c/a/search-engine-news/assessing-dmoz-a-quality-review/
Bryant, B. (1993, October 4). 'Numbers you can count on': Dewey Decimal Classification is maintained at LC. Library of Congress Information Bulletin, 52(18). http://www.loc.gov/loc/lcib/93/9318/count.html
Buchanan, B. (1979). Theory of library classification.
Campbell, D. G. (2002). Centripetal and centrifugal forces in bibliographic classification research. Paper presented at the ASIS SIG/CR Classification Research Workshop.
Casali, A., Deco, C., Romano, A., & Tomé, G. (2013). An assistant for loading learning object metadata: An ontology based approach.
Chan, L. M. (2000). Exploiting LCSH, LCC, and DDC to retrieve networked resources: Issues and challenges.
Chan, L. M., Intner, S. S., & Weihs, J. (2016). Guide to the Library of Congress classification. ABC-CLIO.
Chapman, J. W., Reynolds, D., & Shreeves, S. A. (2009). Repository metadata: Approaches and challenges. Cataloging & Classification Quarterly, 47(3-4), 309-325.
Chatterjee, A. (2016). Universal Decimal Classification and Colon Classification: Their mutual impact. Annals of Library and Information Studies (ALIS), 62(4), 226-230.
Cliff, P. (2008). JISC-Repositories: Subject classification thread summary.
Comaromi, J. P., & Satija, M. P. (1983). Brevity of notation in Dewey decimal classification. Metropolitan.
Dawson, A., Brown, D., & Broughton, V. (2006). The need for a faceted classification as the basis of all methods of information retrieval. Paper presented at the Aslib Proceedings.
De Grolier, E. (1962). A study of general categories applicable to classification and coding in documentation.
Dewey Decimal Classification summaries. Retrieved March 21, 2017, from https://www.oclc.org/en/dewey/features/summaries.html
Dewey Services: Dewey Decimal Classification System.
Retrieved March 20, 2017, from https://www.oclc.org/content/dam/oclc/services/brochures/211422usb_dewey_services.pdf
Dorji, T. C., Atlam, E.-s., Yata, S., Fuketa, M., Morita, K., & Aoe, J.-i. (2011). Extraction, selection and ranking of Field Association (FA) terms from domain-specific corpora for building a comprehensive FA terms dictionary. Knowledge and Information Systems, 27(1), 141-161. doi:10.1007/s10115-010-0296-x
Dousa, T. M. (2009). Evolutionary order in the classification theories of C. A. Cutter & E. C. Richardson: Its nature and limits.
Encyclopedia, N. W. (2014, August 1). Library classification. Retrieved 2017, from http://www.newworldencyclopedia.org/entry/Library_classification
Faceted Application of Subject Terminology. (2017). Retrieved March 21, 2017, from http://www.oclc.org/research/themes/data-science/fast.html
Fandino, M. (2008). UDC or DDC: A note about the suitable choice for the National Library of Liechtenstein. Extensions and Corrections to the UDC.
Ferrara, A., & Salini, S. (2012). Ten challenges in modeling bibliographic data for bibliometric analysis. Scientometrics, 93(3), 765-785.
Foundation, O. K. (2017). About LOV. Retrieved from http://lov.okfn.org/dataset/lov/about
Francu, V. (2007). Multilingual access to information using an intermediate language (Doctoral dissertation in Language and Literature, University of Antwerp).
Gartner, R. (2016). Metadata. Springer.
Giasson, B. D. A. F. (2012). Projects using BIBO. Retrieved from http://www.bibliontology.com/projects.html
Giess, M. D., Wild, P., & McMahon, C. (2007). The use of faceted classification in the organisation of engineering design documents. Paper presented at the Proceedings of the International Conference on Engineering Design 2007.
Gilchrist, A. (2015). Reflections on knowledge, communication and knowledge organization. Knowledge Organization, 42(6), 456-469.
Giunchiglia, F., Marchese, M., & Zaihrayeu, I. (2007).
Encoding classifications into lightweight ontologies. Journal on Data Semantics VIII (pp. 57-81). Springer.
Gnoli, C., Merli, G., Pavan, G., Bernuzzi, E., & Priano, M. (2008). Freely faceted classification for a web-based bibliographic archive: The BioAcoustic Reference Database.
Gnoli, C., Pusterla, L., Bendiscioli, A., & Recinella, C. (2016). Classification for collections mapping and query expansion.
Goh, Y. M., Giess, M., McMahon, C., & Liu, Y. (2009). From faceted classification to knowledge discovery of semi-structured text records. In Foundations of computational intelligence, Volume 6 (pp. 151-169). Springer.
Green, R. (2015, October 29-30). Relational aspects of subject authority control: The contributions of classificatory structure. Paper presented at the Proceedings of the International UDC Seminar 2015: Classification & Authority Control: Expanding Resource Discovery, Lisbon.
Hallows, K. M. (2014). It's all enumerative: Reconsidering Library of Congress Classification in US law libraries. Law Library Journal, 106, 85.
Harper, C. A., & Tillett, B. B. (2007). Library of Congress controlled vocabularies and their application to the Semantic Web. Cataloging & Classification Quarterly, 43(3-4), 47-68.
Hjørland, B. (1999). The DDC, the universe of knowledge, and the post-modern library. Journal of the Association for Information Science and Technology, 50(5), 475.
Hjørland, B. (2007). Semantics and knowledge organization. Annual Review of Information Science and Technology, 41(1), 367-405.
Hjørland, B. (2008). Core classification theory: A reply to Szostak. Journal of Documentation, 64(3), 333-342.
Hjørland, B. (2012). Is classification necessary after Google? Journal of Documentation, 68(3), 299-317.
Hjørland, B. (2013). Theories of knowledge organization—theories of knowledge: Keynote March 19, 2013, 13th Meeting of the German ISKO in Potsdam.
Knowledge Organization, 40(3), 169-181.
Hjørland, B. (2016). Subject (of documents). Knowledge Organization, 44(1), 55-64.
Hyman, R. J. (1980). Shelf classification research: Past, present--future? Occasional Papers (University of Illinois at Urbana-Champaign, Graduate School of Library Science), no. 146.
Hyvönen, E. (2012). Publishing and using cultural heritage linked data on the semantic web. Synthesis Lectures on the Semantic Web: Theory and Technology, 2(1), 1-159.
Jacob, E. K. (2004). Classification and categorization: A difference that makes a difference. Library Trends, 52(3), 515.
Jonassen, D. H. (2004). Handbook of research on educational communications and technology. Taylor & Francis.
Jones, K. S. (1970). Some thoughts on classification for retrieval. Journal of Documentation, 26(2), 89-101.
Joorabchi, A., & Mahdi, A. E. (2009). Leveraging the legacy of conventional libraries for organizing digital libraries. Paper presented at the International Conference on Theory and Practice of Digital Libraries.
Joorabchi, A., & Mahdi, A. E. (2011). An unsupervised approach to automatic classification of scientific literature utilizing bibliographic metadata. Journal of Information Science, 37(5), 499-514. doi:10.1177/0165551511417785
Kaosar, A. (2008). Merit & demerit of using Universal Decimal Classification on the Internet.
Kaula, P. (1965). Colon Classification: Genesis and development. Library Science Today. Ranganathan's Festschrift, 1, 87-93.
Khusro, S., & Ullah, I. (2016). Towards a semantic book search engine. Paper presented at the 2016 International Conference on Open Source Systems & Technologies (ICOSST'16), Lahore, Pakistan.
Koch, T., & Day, M. (1997). DESIRE - Development of a European Service for Information on Research and Education.
Koch, T., Day, M., Brümmer, A., Hiom, D., Peereboom, M., Poulter, A., & Worsfold, E. (1997).
The role of classification schemes in Internet resource description and discovery. Work Package, 3.
Kong, W. (2016). Extending faceted search to the open-domain web. University of Massachusetts Amherst.
Koshman, S. (1993). Categorization and classification revisited: A review of concept in library science and cognitive psychology. Current Studies in Librarianship, Spring/Fall, 26.
Kwaśnik, B. H., & Rubin, V. L. (2003). Stretching conceptual structures in classifications across languages and cultures. Cataloging & Classification Quarterly, 37(1-2), 33-47.
Kyle, B., & Vickery, B. C. (1961). The Universal Decimal Classification: Present position and future developments. Unesco.
LC Linked Data Service: Authorities and vocabularies. Retrieved February 28, 2017, from http://id.loc.gov
Lee, H.-L. (2012). Epistemic foundation of bibliographic classification in early China: A Ru classicist perspective. Journal of Documentation, 68(3), 378-401.
Library of Congress Classification. (2014, October 1). Retrieved March 20, 2017, from https://www.loc.gov/catdir/cpso/lcc.html
Library of Congress Classification Outline: Class P - Language and Literature [Press release]. Retrieved from https://www.loc.gov/aba/cataloging/classification/lcco/lcco_p.pdf
Library of Congress Subject Headings: Pre- vs. post-coordination and related issues. (2007, March 15). Report for Beacher Wiggins, Director, Acquisitions & Bibliographic Access Directorate, Library Services, Library of Congress (pp. 49). Cataloging Policy and Support Office.
Library OPACs containing UDC codes. Retrieved March 21, 2017, from http://www.udcc.org/index.php/site/page?view=opacs
Linked data. Retrieved from https://www.w3.org/standards/semanticweb/data
Losee, R. M. (1993). Seven fundamental questions for the science of library classification. Knowledge Organization, 20, 65-65.
Maddaford, S., & Briefing, C.
Library of Congress Classification System.
Madge, O.-L. (2011). Evidence based library and information practice. Studii de Biblioteconomie şi Ştiinţa Informării (15), 107-112.
Mai, J.-E. (2003). The future of general classification. Cataloging & Classification Quarterly, 37(1-2), 3-12.
Mai, J.-E. (2004). Classification in context: Relativity, reality, and representation. Knowledge Organization, 31(1), 39-48.
Mai, J.-E. (2005). Analysis in indexing: Document and domain centered approaches. Information Processing & Management, 41(3), 599-611. doi:10.1016/j.ipm.2003.12.004
Mai, J.-E. (2009). The boundaries of classification.
Mai, J.-E. (2010). Classification in a social world: Bias and trust. Journal of Documentation, 66(5), 627-642.
Mai, J.-E. (2011). The modernity of classification. Journal of Documentation, 67(4), 710-730.
Mai, J. (2004). Classification of the Web: Challenges and inquiries. Knowledge Organization, 31(2), 92.
Mancuso, J. (1994). General purpose vs special purpose couplings. Paper presented at the 23rd Turbomachinery Symposium, Dallas, TX.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1). Cambridge University Press.
Maron, M. E., Kuhns, J. L., & Ray, L. C. (1959). Probabilistic indexing: A statistical approach to the library problem. Paper presented at the 14th National Meeting of the Association for Computing Machinery, Cambridge, Massachusetts.
McGrath, K. (2007). Facet-based search and navigation with LCSH: Problems and opportunities. Code4Lib Journal, 1.
McIlwaine, I. C. (1997). The Universal Decimal Classification: Some factors concerning its origins, development, and influence. Journal of the American Society for Information Science (1986-1998), 48(4), 331.
Miksa, S. D. (2007). The challenges of change: A review of cataloging and classification literature, 2003-2004.
Library Resources & Technical Services, 51(1), 51.
Neelameghan, A., & Lalitha, S. (2013). Multilingual thesaurus and interoperability. DESIDOC Journal of Library & Information Technology, 33(4).
Neelameghan, A., & Parthasarathy, S. (1997). S. R. Ranganathan's postulates and normative principles: Applications in specialized databases design, indexing and retrieval. Sarada Ranganathan Endowment for Library Science.
Nizamani, S., Memon, N., & Wiil, U. K. (2011). Cluster based text classification model. In Counterterrorism and open source intelligence (pp. 265-283). Springer.
Painter, A. F. (1974). Classification: Theory and practice. Drexel Library Quarterly, 10(4), n4.
Panigrahi, P., & Prasad, A. (2005). Inference engine for devices of Colon Classification in AI-based automated classification system.
Perles, B. (1995). Faceted classifications and thesauri. Retrieved from Howard Besser's Web website: http://besser.tsoa.nyu.edu/impact/f95/Papers-projects/Papers/perles.html
Peroni, S. (2012). Semantic publishing: Issues, solutions and new trends in scholarly publishing within the Semantic Web era. alma.
Piros, A. (2014). A different approach to Universal Decimal Classification in a mechanized retrieval system. Paper presented at the Proceedings of the 9th International Conference on Applied Informatics, Eger, Hungary.
Pollitt, A. S. (1998). The key role of classification and indexing in view-based searching. Technical report, University of Huddersfield, UK. http://www.ifla.org/IV/ifla63/63polst.pdf
Press, O. F. (2002). Introduction to the Dewey decimal classification.
Raghavan, K. (2016). The Colon Classification: A few considerations on its future. Annals of Library and Information Studies (ALIS), 62(4), 231-238.
Rahman, A., & Ranganathan, T. (1962). Seminal mnemonics. Annals of Library Science, 9, 53-67.
Ramesh, P., Vivekavardhan, J., & Bharathi, K. (2015). Metadata diversity, interoperability and resource discovery issues and challenges. DESIDOC Journal of Library & Information Technology, 35(3).
Ranganathan, S. R. (1968). Choice of scheme for classification. Library Science with a Slant to Documentation, 5(1), 1-69.
Reiner, U. (2008). Automatic analysis of Dewey decimal classification notations. In Data analysis, machine learning and applications (pp. 697-704). Springer.
Rodriguez, R. D. (1984). Hulme's concept of literary warrant. Cataloging & Classification Quarterly, 5(1), 17-26.
Rosenfeld, L., & Morville, P. (2002). Information architecture for the world wide web. O'Reilly Media, Inc.
San Segundo Manuel, R. (2008). Some arguments against the suitability of Library of Congress Classification for Spanish libraries. Extensions and Corrections to the UDC.
Sateli, B., & Witte, R. (2015). Automatic construction of a semantic knowledge base from CEUR workshop proceedings. Paper presented at the Semantic Web Evaluation Challenge.
Satija, M. P. (2013). The theory and practice of the Dewey decimal classification system. Elsevier.
Satija, M. P. (2015). Save the national heritage: Revise the Colon Classification.
Satija, M. P., & Martínez-Ávila, D. (2015). Features, functions and components of a library classification system in the LIS tradition for the e-environment. Journal of Information Science Theory and Practice, 3(4), 62-77.
Satija, M. P., & Singh, J. (2010). Colon Classification (CC). In Encyclopedia of library and information sciences (Vol. 2, pp. 1158-1168).
Schallier, W. (2005). Subject retrieval in OPACs: A study of three interfaces. Paper presented at the 7th ISKO-Spain Conference: The Human Dimension of Knowledge Organization, Barcelona.
Singapore, N. L. o. (2016). Usability on the web.
Retrieved from http://www.nlb.gov.sg/resourceguides/usability-on-the-web/
Slavic-Overfield, A. (2005). Classification management and use in a networked environment: The case of the Universal Decimal Classification. University of London.
Slavic, A. (2006). Interface to classification: Some objectives and options.
Slavic, A. (2008). Use of the Universal Decimal Classification: A world-wide survey. Journal of Documentation, 64(2), 211-228.
Smiraglia, R. P., & Van den Heuvel, C. (2011). Idea collider: From a theory of knowledge organization to a theory of knowledge interaction. Bulletin of the American Society for Information Science and Technology, 37(4), 43-47.
Subject classification schemes. (2015). Retrieved from http://www.ifla.org/best-practice-for-national-bibliographic-agencies-in-a-digital-age/node/9042
Sukhmaneva, E. (1970). The problems of notation and faceted classification. 17(3-4), 112-116.
Svenonius, E. (2000). The intellectual foundation of information organization. MIT Press.
Tallerås, K. (2013). From many records to one graph: Heterogeneity conflicts in the linked data restructuring cycle. Information Research: An International Electronic Journal, 18(3), n3.
Tennis, J. T. (2008). Epistemology, theory, and methodology in knowledge organization: Toward a classification, metatheory, and research framework.
Tennis, J. T. (2011). Ranganathan's layers of classification theory and the FASDA model of classification.
Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. Synthesis Lectures on Information Concepts, Retrieval, and Services.
Tomren, H. (2003). Classification, bias, and American Indian materials. Unpublished work, San Jose State University, San Jose, California.
Tunkelang, D. (2009). Faceted search. Synthesis Lectures on Information Concepts, Retrieval, and Services, 1(1), 1-80.
Ullah, A., Ullah, I., Khusro, S., & Ali, S. (2016, December 19-21).
BookOnt: A Comprehensive Book Structural Ontology for Book Search and Retrieval. Paper presented at the 2016 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan. Ullah, I., & Khusro, S. (2016). In Search of a Semantic Book Search Engine on the Web: Are We There Yet? Artificial Intelligence Perspectives in Intelligent Systems (pp. 347-357): Springer. Universal Decimal Classification summary. (2017). from http://www.udcsummary.info/php/index.php?id=67277&lang=en# Vizine-Goetz, J. S. M. D. (2009). The Dewey Decimal Classification. Encyclopedia of Library and Information Science. Wang, J. (2009). An extensive study on automated Dewey Decimal Classification. Journal of the American Society for Information Science and Technology, 60(11), 2269-2286. Wijewickrema, C. M., & Gamage, R. (2013). An ontology based fully automatic document classification system using an existing semi-automatic system. Xin, R. S., Hassanzadeh, O., Fritz, C., Sohrabi, S., & Miller, R. J. (2013). Publishing bibliographic data on the Semantic Web using BibBase. Semantic Web, 4(1), 15-22. Yelton, A. (2011). A Simple Scheme for Book Classification Using Wikipedia. Information Technology and Libraries, 30(1), 7-15. BIBLIOGRAPHIC CLASSIFICATION IN THE DIGITAL AGE | ULLAH, KHUSRO, AND ULLAH | doi:10.6017/ital.v36i3.8930 77 Yi, K. (2006). Challenges in automated classification using library classification schemes. Paper presented at the Proceedings of world library and information congress: 72nd ifla general conference and council. Zhu, Z. (2011). Improving Search Engines via Classification. University of London.
Lessons Learned: A Primo Usability Study

Kelsey Brett, Ashley Lierman, and Cherie Turner

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016

ABSTRACT

The University of Houston Libraries implemented Primo as the primary search option on the library website in May 2014. In May 2015, the Libraries released a redesigned interface to improve user experience with the tool. The Libraries took a user-centered approach to redesigning the Primo interface, conducting a "think-aloud" usability test to gather user feedback and identify needed improvements. This article describes the method and findings from the usability study, the changes that were made to the Primo interface as a result, and implications for discovery-system vendor relations and library instruction.

INTRODUCTION

Index-based discovery systems have become commonplace in academic libraries over the past several years, and academic libraries have invested a great deal of time and money into implementing them. Frequently, discovery platforms serve as the primary access point to library resources, and in some libraries they have even replaced traditional online public access catalogs. Because of the prominence of these systems in academic libraries and the important function that they serve, libraries have a vested interest in presenting users with a positive and seamless experience while using a discovery system to find and access library information. Libraries commonly conduct user testing on their discovery systems, make local customizations when possible, and sometimes even change products to present the most user-friendly experience possible. University of Houston Libraries has adopted new discovery technologies as they became available in an effort to provide simplified discovery and access to library resources. As a first step, the Libraries implemented Innovative Interfaces' Encore, a federated search tool, in 2007.
When index-based discovery systems became available, the Libraries saw them as a way to provide an improved and intuitive search experience. In 2010, the Libraries implemented Serials Solutions' Summon. After three years and a thorough process of evaluating priorities and investigating alternatives, the Libraries decided to move to Ex Libris' Primo, which was implemented in May 2014. The Libraries' intention was to continually assess and customize Primo to improve functionality and user experience. The Libraries conducted research and performed user testing, and in May 2015 a redesigned Primo search results page was released. One of the activities that informed the Primo redesign was a "think-aloud" usability test that required users to complete a set of two tasks using Primo. This article will present the method and results of the testing as well as the customizations that were made to the discovery system as a result. It will also discuss some broader implications for library discovery and its effect on information literacy instruction.

Kelsey Brett (krbrett@ua.edu) is Discovery Systems Librarian, Ashley Lierman (arlierman@uh.edu) is Instructional Design Librarian, and Cherie Turner (ckturner2@uh.edu) is Chemical Sciences Librarian, University of Houston Libraries, Houston, Texas.

LESSONS LEARNED: A PRIMO USABILITY STUDY | BRETT, LIERMAN, AND TURNER doi: 10.6017/ital.v35i1.8965

LITERATURE REVIEW

There is a substantial body of literature discussing usability testing of discovery systems. In the interest of brevity, we will focus solely on studies and overviews involving Primo implementations, from which several patterns have emerged.
Multiple studies have indicated that users' responses to the system are generally positive; even in testing of very early versions by a development partner, users responded positively overall.1 Interestingly, some studies found that in many cases users rated Primo positively in post-testing surveys even when their task completion rate in the testing had been low.2 Multiple studies also found evidence that, although users may struggle with Primo initially, the system is learnable over time. Comeaux found that the time it took users to use facets or locate resources decreased significantly with each task they performed,3 while other studies saw the use of facets per task increase for each user over the course of the testing.4 User reactions to facets and other post-limiting functions in Primo were divided. In one of the earliest studies, Sadeh found that users responded positively to facets,5 and some authors found users came to use them heavily while searching,6 while others found that facets were generally underused.7 Multiple studies found that users tended to repeat their searches with slightly different terms rather than use post-limiting options.8 Thomsett-Scott and Reese, in a survey of the literature on discovery tools, reported evidence of a trend that users reacted more positively to post-limiting in earlier discovery studies,9 while the broader literature shows more negative reactions in more recent studies. This could indicate that shifts in the software, user expectations, or both may have decreased users' interest in these options. A few specific types of usability problems seem common across tests of Primo and other discovery systems. Across a large number of studies, it has been found that users—especially undergraduate students—struggle to understand library and academic terminology used in discovery.
Some terminology changes were made after users had difficulty in the earliest usability tests of Primo,10 but users continued to struggle with terms like hold and recall in item records.11 Users also failed to understand the labels of limiters12 and to recognize the internal names of repositories and collections.13 Literature reviews on discovery systems have found terminology to be a common stumbling block for searchers across a wide number of individual studies.14 Similarly, users often struggle to understand the scope of options available to them when searching and the holdings information in item records. Users failed in multiple tests to distinguish between the article level and the journal level,15 could not interpret bibliographic information sufficiently to determine that they had found the desired item,16 and chose incorrect options for scoping their searches.17 Many studies found that users were unable to distinguish between multiple editions of a held item when all item types or editions were listed in the record.18 In other cases, users had difficulty interpreting locations and holdings information for physical items.19 Among the needs and desires expressed by and for Primo users in the literature, two in particular stand out. First, many users expressed a desire for more advanced search options; some wanted more complexity in certain facets and the ability to search within results,20 while other users simply wanted an advanced search option to be available.21 Secondly, a large number of studies indicated that instruction on Primo or other discovery systems was needed for users to search effectively.
In some cases, this was the conclusion of the researchers conducting the study,22 while in other cases users themselves either suggested or requested instruction on the system.23 It is also worth noting that it has been questioned whether usability testing as a whole is a sufficient mechanism for evaluating discovery-system functionality. Prommann and Zhang found that usability testing has focused almost exclusively on the technical functioning of the software and has not adequately revealed the ability of discovery systems like Primo to successfully complete users' desired tasks.24 They proposed hierarchical task analysis (HTA) as an alternative, to examine users' most frequent desires and the capacity of discovery systems to meet them. Prommann and Zhang acknowledged, however, that because HTA is completed by an expert on the system rather than by an actual user, some of the valuable information derived from usability testing (including terms and functions that users do not understand, however well-designed) is lost in the process; they concluded that a combination of the two methods of testing is ideal to retain the best of both.

BACKGROUND

At the University of Houston Libraries, the Resource Discovery Systems department (RDS) is responsible for the maintenance and development of Primo. However, it is important to RDS to gather feedback and foster buy-in from stakeholders in the Library before making changes to the system. To that end, RDS works with two committees to assess the system and make recommendations for its improvement. The Discovery Usability Group and the Discovery Advisory Group include members from public services, technical services, and systems; each member brings a unique perspective on discovery. The Discovery Usability Group is charged with assessing the discovery system through a variety of methods including usability testing, focus groups, and user interviews.
The Discovery Advisory Group reviews results of user testing and makes recommendations for improvement. All changes to the discovery system are reviewed by both groups before they are released for public use.

In fall 2014, several months after the Primo implementation, the Discovery Usability Group conducted a focus group with student workers from the library's information desk (a dual reference and circulation desk) to solicit feedback about the functionality of Primo and suggestions for its improvement. In the meantime, the Discovery Advisory Group was testing Primo and evaluating Primo sites at peer and aspirational institutions. The groups used the information collected through the focus group and research on Primo to make recommendations for improvement. RDS has access to a Primo development sandbox, and many of the recommended changes were made in the sandbox environment and reviewed by the two groups prior to public release.

Changes to the search box can be seen in figure 1. Rarely used tabs were replaced with a drop-down menu to the right of the search box to allow users to limit to "everything," "books+," or "digital library." To increase visibility, links to "Advanced Search" and "Browse Search" were made larger and more spacing was added.

Figure 1. Search Box in Live Site (Above) and Development Sandbox (Below) at Time of Testing

Changes were also made to create a cleaner and less cluttered search results page (see figure 2). More white space was added, and the links (or tabs) to "View Online," "Request," "Details," etc., were redesigned and renamed for clarity. For example, the "View Online" link was renamed "Preview Online" because it opens a box within the search results page that displays the item. The groups believed "Preview Online" more accurately represents what the link does.
Figure 2. Search Results in Live Site (Above) and Development Sandbox (Below) at Time of Testing

The facets were also redesigned to look cleaner and larger to attract users' attention (see figure 3).

Figure 3. Facets in Live Site and Development Sandbox at Time of Testing

Both groups were happy with the changes to the Primo development sandbox but wanted to test the effect of the changes on user search behavior before updating the live site. The Discovery Usability Group conducted a usability test within the development sandbox. The goal of the test was to find out if users could effectively complete common research tasks using Primo. With that goal in mind, the group developed a usability test and conducted it during the spring semester of 2015.

METHODOLOGY

The Discovery Usability Group developed a usability test using a "think-aloud" methodology, where users were asked to verbalize their thought process as they completed research tasks through Primo. Four tasks were designed to mirror tasks that users are likely to complete for class assignments or for general research. To minimize the testing time, each participant completed two tasks, with the facilitators alternating between two sets of tasks from one participant to the next.

Test 1

Task 1: You are trying to find an article that was cited in a paper you read recently. You have the following citation: Clapp, E., & Edwards, L. (2013). Expanding our vision for the arts in education. Harvard Educational Review, 83(1), 5–14. Please find this article using OneSearch [the public-facing name given to the Libraries' Primo implementation].

Task 2: You are doing a research project on the effects of video games on early childhood development.
Find a peer-reviewed article on this topic, using OneSearch.

Test 2

Task 1: Recently your friend recommended the book The Lighthouse by P. D. James. Use OneSearch to find out if you can check this book out from the library.

Task 2: You are writing a paper about the drug cartels' influence on Mexico's relationship with the United States. Find a newspaper article on this topic, using OneSearch.

Two facilitators set up a table with a laptop in the front entrance of the library. They alternated between the facilitator and note-taker roles. Another group member took on the role of "caller" and recruited library patrons to participate in the study. The caller set up a table visible to those passing by with library-branded T-shirts and umbrellas to incentivize participation. The caller explained what would be expected of the potential participant and went over the informed-consent document. After signing the form, the participant performed two tasks. After the test the participant received a library T-shirt or umbrella, and snacks. The facilitators used Morae Usability Software to record the screen and audio of each test. Participants were asked for permission to record their sessions, but could opt out. During the three-hour testing period, fifteen library patrons participated in the study, and fourteen sessions were recorded. Of the fifteen participants, thirteen were undergraduate students (four freshmen, one sophomore, seven juniors, and two seniors), one was a graduate student, and one was a post-baccalaureate student. The majority of the participants were from the sciences, along with two students from the College of Business and two from the School of Communications. There were no participants from the humanities. The facilitators took notes on a rubric (see table 1) that simplified the processes of coding and reviewing the recordings.
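Session notes of this kind are straightforward to tally programmatically. As an illustration only (the study's facilitators coded their notes by hand; this script is a hypothetical sketch, not part of their workflow), time-on-task values recorded in the "1m 54s" format used in the results tables below can be parsed and aggregated like so, here using the eight times recorded for Test 1, Task 1:

```python
import re
from statistics import mean

def to_seconds(stamp: str) -> int:
    """Convert a time-on-task string such as '1m 54s' or '59s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", stamp.strip())
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return minutes * 60 + seconds

# Times on task recorded for Test 1, Task 1; all eight participants completed the task
times = ["1m 54s", "4m 13s", "1m 26s", "1m 17s", "1m 26s", "1m 43s", "1m 27s", "1m 5s"]
completed = [True] * 8

secs = [to_seconds(t) for t in times]
print(f"mean time on task: {mean(secs):.0f}s")                    # prints "mean time on task: 109s"
print(f"completion rate: {sum(completed) / len(completed):.0%}")  # prints "completion rate: 100%"
```

Summaries like these make it easy to compare tasks (for example, the roughly one-minute median for the known-item search against the longer times for the peer-review task) without replaying every recording.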
After the usability testing, the facilitators reviewed the notes and recordings, coded them for common themes and breakdowns, and prepared a report of their findings and design recommendations. The facilitators sent the report, along with audio and screen recordings, to the Discovery Advisory Group, who reviewed them along with RDS. The Discovery Advisory Group made additional design recommendations, and RDS used the information and recommendations to implement additional customizations to the Primo development sandbox.

Preliminary Questions
ASK: What is your affiliation with the University of Houston? Year? Major?
ASK: How often do you use the library website? For what purpose(s)?

Task 1
Describe the steps the participant took to complete the task (S/U)
ASK: How did you feel about this task? What was simple? What was difficult?
ASK: Is there anything that would make completing this task easier?

Task 2
Describe the steps the participant took to complete the task (S/U)
ASK: How did you feel about this task? What was simple? What was difficult?
ASK: Is there anything that would make completing this task easier?

Follow-up Question
ASK: What can we do to improve the overall experience using OneSearch?

Table 1. Task Completion Rubric for Test 1

RESULTS

Test 1, Task 1

You are trying to find an article that was cited in a paper you read recently. You have the following citation: Clapp, E., & Edwards, L. (2013). Expanding our vision for the arts in education. Harvard Educational Review, 83(1), 5–14. Please find this article using OneSearch.

Participant  Time on Task  Task Completion
1            1m 54s        Y
2            4m 13s        Y
3            1m 26s        Y
4            1m 17s        Y
5            1m 26s        Y (required assistance)
6            1m 43s        Y
7            1m 27s        Y
8            1m 5s         Y

Table 2. Results for Test 1, Task 1

All eight participants successfully completed this task, although sophistication and efficiency varied between participants.
Some searched by the authors' last names, which was not specific enough to return the item in question. Four participants attempted to use advanced search or the drop-down menu to the right of the search box to pre-filter their results. Two participants viewed the options in the drop-down menu, which were "everything," "books+," and "digital library," and left it on the default "everything" search. When prompted, the participants explained that they were expecting the drop-down to contain title and/or author limiters. Similarly, participants expected an author limiter in the advanced search. The citation format seemed to confuse participants, and they tended to search for the piece of information that was listed first—the authors—rather than the most unique piece of information—the title. If the first search did not return the correct item in the first few results, the participant would modify their search by searching for a different element of the citation or adding another element of the citation to the initial search until the item they were looking for appeared as one of the first few results. Participant 5 thought they had successfully completed the task, but the facilitator had to point out that the item they chose did not meet the citation exactly, and on the second try they found the correct item. Participant 2 worked on the task for more than four minutes, significantly longer than the other seven participants. They immediately navigated to advanced search and filled out several fields in the advanced search form with the elements of the citation. If the search did not return their item, they added more elements until they finally found it. Simply searching the title in the citation would have returned the item as the first search result.
Filling out the advanced search form with all of the information from the citation does not necessarily increase a user's chances of finding the item in a discovery system, though it might do so when searching in an online catalog or subject database. The Discovery Advisory and Usability Groups made two recommendations to address some of the identified issues: include an author search option in the advanced search, and add an "articles+" option to the drop-down menu on the basic search. RDS implemented both recommendations. The Discovery Usability Group identified confusion around citations as a common breakdown during this task. The groups recommended providing instructional information about searching for known items to address this breakdown; however, RDS is still working on an effective method to provide this information in a simple and visible way.

Test 1, Task 2

You are doing a research project on the effects of video games on early childhood development. Find a peer-reviewed article on this topic, using OneSearch.

Participant  Time on Task  Task Completion
1            3m 44s        Y
2            2m 21s        Y
3            5m 23s        Y (required assistance)
4            2m 5s         Y
5            3m 32s        Y
6            2m 45s        Y
7            3m 8s         Y
8            3m 1s         Y (required assistance)

Table 3. Results for Test 1, Task 2

All eight participants successfully found an article on this topic, but were less successful in determining whether the article was peer-reviewed. Only one participant used the "Peer-reviewed Journals" facet without being prompted. Three users noticed the "[Peer-reviewed Journal]" note in the record information for search results, and used it to determine if the article was peer-reviewed. One participant went to the full text of an article, said it "seemed" like it was peer-reviewed, and considered the task complete. The resource type facets were more heavily used during this task than the "Peer-reviewed Journals" facet, despite its being promoted to the top of the list of facets.
Two participants used the "Articles" facet, and two participants used the "Reviews" facet, thinking it limited to peer-reviewed articles. Participants 3 and 8 needed help from the facilitator to determine whether a source was peer-reviewed. There was an overall misunderstanding of what peer-reviewed means, which affected participants' confidence in completing the task. The design recommendations based on this task included changing the "Peer-reviewed Journals" facet to "Peer-reviewed Articles" or simply "Peer-reviewed." RDS changed the facet to "Peer-reviewed Articles" to help alleviate confusion. Additionally, the groups recommended emphasizing the "[Peer-reviewed Journal]" designations within the search results and providing a method for limiting to peer-reviewed materials before conducting a search. Customization limitations of the system have prevented RDS from implementing these design recommendations yet. A way to address the breakdowns caused by misunderstanding terminology also has yet to be identified. It was disheartening that participants did not use the "Peer-reviewed Journals" facet despite its being purposefully emphasized on the search results page.

Test 2, Task 1

Recently your friend recommended the book The Lighthouse by P. D. James. Use OneSearch to find out if you can check this book out from the library.

Participant  Time on Task  Task Completion
1            1m 7s         Y
2            56s           Y
3            No recording  Y
4            2m 21s        Y
5            1m 8s         Y
6            2m 14s        Y
7            1m 15s        Y

Table 4. Results for Test 2, Task 1

All seven participants were able to find this book using Primo, but had difficulty determining what to do once they found it. For this task every participant searched by title and found the book as the first search result. Four users limited to "books+" before searching using the drop-down menu, while the other three remained in the default "everything" search.
Only one participant used the locations tab within the search results to determine availability; the others clicked the title and went to the item's catalog record. All participants were able to determine that the book was available in the library, but there was an overall lack of understanding about how to use the information in the catalog to check out a book. Participant 1 said that they would write down the call number, take it to the information desk, and ask how to find it, which was the most sophisticated response of all seven participants. Participant 4 spent nearly two minutes clicking through links in the OPAC expecting to find a "Check Out" button and only stopped when the facilitator stepped in. A recommended design change based on this task was to have call numbers in Primo and the online catalog link to a stacks guide or map. This is a feature that may be developed in the future, but technical limitations prevented RDS from implementing it in time for the release of the redesigned search interface. Like the previous tasks, some of the breakdowns occurred because of a lack of understanding of library services. Users easily figured out that there was a copy of the book in the library, but had little sense of what to do next. None of the participants successfully located the stacks guide or the request feature that would put the item on hold for them. Steps should be taken to direct users to these features more effectively.

Test 2, Task 2

You are writing a paper about the drug cartels' influence on Mexico's relationship with the United States. Find a newspaper article on this topic, using OneSearch.

Participant  Time on Task  Task Completion
1            4m 45s        Y (required assistance)
2            59s           Y
3            No recording  N
4            7m 47s        Y
5            2m 52s        Y
6            1m 33s        Y
7            1m 30s        Y

Table 5. Results for Test 2, Task 2

This task was difficult for participants.
Two users limited their search initially to "digital library" using the drop-down menu, thinking it would be a place to find newspaper articles; their searches returned zero results. Only two users used the "Newspaper Articles" facet without being prompted, and users did not seem to readily distinguish newspaper articles as a resource type. Participants did not notice the resource type icons without being prompted. Several participants needed to be reminded that the task was to find a newspaper article, and not any other type of article. With guidance, most participants were able to complete the task. Participant 4 remained on the task for almost eight minutes because of their dissatisfaction with the relevancy of the results to the prompt. Interestingly, they found the "Newspaper Articles" facet and reapplied it after each modified search, suggesting that they learned to use system features as they went. One of the recommendations based on this task was to remove "digital library" as an option in the drop-down menu on the basic search. It was evident that "digital library" did not have the same meaning to end users as it does to internal users. This recommendation was easily implemented. Another recommendation was to emphasize the resource type icons within the search results, but we have not determined a way to do so effectively. One suggestion from the Discovery Usability Group was to exclude newspaper articles from the search results as a default, but no consensus was reached on this issue.

LIMITATIONS

The Discovery Usability Group identified limitations to the usability test that should be noted. Testing was done in a high-traffic portion of the library's lobby, which is used as study space by a broad range of students. Participants were recruited from this study space, and we chose not to screen participants. The fifteen participants in the study did not constitute a representative sample.
Almost all participants were undergraduate students, and no humanities majors participated. The outcomes might have been different if our participants had included more experienced researchers or students from a broader range of disciplines. However, adding screening questions or choosing a more neutral location would have limited the number of participants who could complete our testing. Another limitation was that the participants started the usability test within the Primo interface. Because Primo is integrated into the Libraries' website, users would typically begin searching the system from within the library homepage. The goals of the study required testing of our Primo development sandbox, which was not yet available to the public and therefore could not be accessed in the same way. This gave participants some additional options from the initial search pages that are not usually available through the main search interface. While testing an active version of the interface would be preferable, one of our goals was to understand how our modifications affected user behavior, so testing the unmodified version was not an acceptable substitute. Additionally, the usability study presented tasks out of context and did not replicate a true user searching experience. Despite the limitations, we learned valuable lessons from the participants in this study.

DISCUSSION

Users successfully completed the tasks in this usability study. Unfortunately, they did not take advantage of many of the features that can make such tasks easier—particularly facets. This was especially apparent when we asked users to find a peer-reviewed journal article (Test 1, Task 2). Primo has a facet that will limit a search to only peer-reviewed journal articles, and only one out of eight participants used this facet during this task.
Participants appreciated the pre-search filtering options, and requested more of them (such as an author search), while post-search facets were underutilized. Similarly, participants almost uniformly ignored the links, or tabs, within the search results, which would provide users with more information, a preview of the full-text, and additional features such as an email function. Users bypassed these options and clicked on the title instead. The Discovery Usability Group theorized that users clicked on the title of the item because that behavior would be successful in a more familiar search interface like Google. The team customized the configuration so that a title click would open either the full-text of electronic items or the catalog record for physical items to accommodate users’ instinctive search behaviors. The tabs, though a prominent feature of the discovery system, have proved to have little value for users. Throughout the implementation of discovery systems in academic libraries, both research studies and anecdotal evidence have suggested that users do not find end-user features like facets valuable; however, discovery system vendors have made no apparent attempt to reimagine the possibilities for search refinements. Indeed, most of the findings in this study will present few surprises to anyone familiar with the discovery usability literature, which is itself concerning. As our literature review has shown, many of the same general usability issues have repeated throughout studies of Primo since 2008, and most are very similar to usability issues in other, competitor discovery systems. This raises some concerns about the pace of innovation in the discovery field, and whether discovery vendors are genuinely taking into account the research findings about the needs of our users as they refine their products. 
In a recent article, David Nelson and Linda Turney identified many issues with discovery facets in their current form that may be barriers to usage, particularly labeling and library jargon; we join them in urging vendors and libraries to collaborate more closely for deep analysis of actual facet usage by users, and to address those factors that have negatively affected facets’ value.25

During our usability study, a common barrier to the successful completion of a task was not the technology itself but a lack of understanding of the task. Participants had difficulty deciphering a citation, which may have led to their tendency to search for a journal article by author rather than by title. Many participants struggled with using call numbers and with how to find and check out books in the library. Peer review also proved to be a difficult or unfamiliar concept for many; when looking for peer-reviewed articles, some participants clicked on the “Reviews” facet, which limited their searches to an inappropriate resource type. Additionally, participants did not differentiate between journal articles and newspaper articles, which may indicate a broader inability to differentiate between scholarly and nonscholarly resources. This effect may be exaggerated by the high percentage of science students who participated, as these students may not have frequent need for newspaper articles.

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016 21

All of these challenges, however, are indicative of a deeper problem with terminology. Regardless of how simple it is to limit a search to peer-reviewed articles, a user who does not understand what peer review means cannot complete the task with confidence or certainty. Librarians struggle with presenting understandable language and avoiding library terminology; as we discovered, academic language, like “peer-reviewed” and “citation,” presents a similar problem. These are not issues that can be resolved with a technological solution.
Rather, we join previous authors in suggesting that instruction may be a reasonable way to address many usability issues in Primo. From our findings and from those in the wider literature, we conclude that general instruction in information literacy is a prerequisite for effective use of this or any research tool, particularly for undergraduates. Nichols et al. “recommend studying how to effectively provide instruction on Primo searching and results interpretation,”26 but instruction on the use of a single tool is of limited utility to students in their academic lives. Instead, libraries could bolster information literacy instruction on key concepts around the production and storage of information, scholarly communications, and differences in information types. Teaching these concepts effectively should help to alleviate the most common user issues, including understanding terminology and different types of information, as well as helping students to understand key elements of research in general. This is a particularly important point for librarians working as advocates for information literacy instruction, especially in cases where administrators or faculty may feel that more advanced tools, like discovery systems, should make instruction obsolete.

CONCLUSION

Several changes were made to the Primo interface in response to breakdowns identified during the usability study. Resource Discovery Systems (RDS) first implemented the changes in the Primo development sandbox. After the Discovery Usability and Advisory Groups agreed on the changes, they were made available on the live site (see figure 4). The redesigned search results page became available to the general public between the spring and summer academic sessions of 2015. In addition to the changes that were made because of the usability study, RDS made changes to the look and feel to make the search results interface more aesthetically pleasing and more in line with the University of Houston brand.
Before (live site): Figure 4. Primo Interface before Usability Testing
During (development sandbox): Figure 5. Primo Interface during Usability Testing
After (live site): Figure 6. Primo Interface after Usability Testing

Many larger assertions of this study, encompassing implications for instruction and our needs from discovery vendors, will require further study to address. The authors intend to continue to investigate these issues as additional usability testing is conducted and to use the data to support future vendor relations and instructional curriculum development discussions.

REFERENCES

1. Tamar Sadeh, “User Experience in the Library: A Case Study,” New Library World 109, no. 1/2 (2008): 7–24, doi:10.1108/03074800810845976.

2. Aaron Nichols et al., “Kicking the Tires: A Usability Study of the Primo Discovery Tool,” Journal of Web Librarianship 8, no. 2 (2014): 172–95, doi:10.1080/19322909.2014.903133; Scott Hanrath and Miloche Kottman, “Use and Usability of a Discovery Tool in an Academic Library,” Journal of Web Librarianship 9, no. 1 (2015): 1–21, doi:10.1080/19322909.2014.983259.

3. David J. Comeaux, “Usability Testing of a Web-Scale Discovery System at an Academic Library,” College & Undergraduate Libraries 19, no. 2–4 (2012): 189–206, doi:10.1080/10691316.2012.695671.

4. Kylie Jarrett, “FindIt@Flinders: User Experiences of the Primo Discovery Search Solution,” Australian Academic & Research Libraries 43, no. 4 (2012): 278–300; Nichols et al., “Kicking the Tires.”

5. Sadeh, “User Experience in the Library.”

6.
Jarrett, “FindIt@Flinders”; Nichols et al., “Kicking the Tires.”

7. Xi Niu, Tao Zhang, and Hsin-liang Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library,” Libraries Faculty and Staff Scholarship and Research 30, no. 5 (2014), doi:10.1080/10447318.2013.873281; Hanrath and Kottman, “Use and Usability of a Discovery Tool in an Academic Library.”

8. Rice Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces,” Library Trends 61, no. 1 (2012): 186–207, doi:10.1353/lib.2012.0029; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.”

9. Beth Thomsett-Scott and Patricia E. Reese, “Academic Libraries and Discovery Tools: A Survey of the Literature,” College & Undergraduate Libraries 19, no. 2–4 (2012): 123–43, doi:10.1080/10691316.2012.697009.

10. Sadeh, “User Experience in the Library.”

11. Comeaux, “Usability Testing of a Web-Scale Discovery System at an Academic Library.”

12. Jessica Mahoney and Susan Leach-Murray, “Implementation of a Discovery Layer: The Franklin College Experience,” College & Undergraduate Libraries 19, no. 2–4 (2012): 327–43, doi:10.1080/10691316.2012.693435.

13. Joy Marie Perrin et al., “Usability Testing for Greater Impact: A Primo Case Study,” Information Technology & Libraries 33, no. 4 (2014): 57–67.

14. Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces”; Thomsett-Scott and Reese, “Academic Libraries and Discovery Tools.”

15. Jarrett, “FindIt@Flinders”; Mahoney and Leach-Murray, “Implementation of a Discovery Layer.”

16. Jarrett, “FindIt@Flinders”; Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Nichols et al., “Kicking the Tires.”

17. Jarrett, “FindIt@Flinders”; Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Perrin et al., “Usability Testing for Greater Impact: A Primo Case Study.”

18.
Jarrett, “FindIt@Flinders”; Nichols et al., “Kicking the Tires”; Hanrath and Kottman, “Use and Usability of a Discovery Tool in an Academic Library”; Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces.”

19. Comeaux, “Usability Testing of a Web-Scale Discovery System at an Academic Library”; Thomsett-Scott and Reese, “Academic Libraries and Discovery Tools.”

20. Jarrett, “FindIt@Flinders.”

21. Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Perrin et al., “Usability Testing for Greater Impact.”

22. Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Nichols et al., “Kicking the Tires”; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.”

23. Thomsett-Scott and Reese, “Academic Libraries and Discovery Tools.”

24. Tao Zhang and Merlen Prommann, “Applying Hierarchical Task Analysis Method to Discovery Layer Evaluation,” Information Technology & Libraries 34, no. 1 (2015): 77–105, doi:10.6017/ital.v34i1.5600.

25. David Nelson and Linda Turney, “What’s in a Word? Rethinking Facet Headings in a Discovery Service,” Information Technology & Libraries 34, no. 2 (2015): 76–91, doi:10.6017/ital.v34i2.5629.

26. Nichols et al., “Kicking the Tires,” 184.
Hitting the Road Towards a Greater Digital Destination: Evaluating and Testing DAMS at University of Houston Libraries

Annie Wu, Santi Thompson, Rachel Vacek, Sean Watkins, and Andrew Weidner

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016 5

ABSTRACT

Since 2009, tens of thousands of rare and unique items have been made available online for research through the University of Houston (UH) Digital Library. Six years later, the UH Libraries’ new digital initiatives call for a more dynamic digital repository infrastructure that is extensible, scalable, and interoperable. The UH Libraries’ mission and the mandate of its strategic directions drive the pursuit of seamless access and expanded digital collections. To answer the calls for technological change, the UH Libraries administration appointed a Digital Asset Management System (DAMS) Implementation Task Force to explore, evaluate, test, recommend, and implement a more robust digital asset management system. This article focuses on the task force’s DAMS selection activities: needs assessment, systems evaluation, and systems testing. The authors also describe the task force’s DAMS recommendation based on analysis of the evaluation and testing data, a comparison of the advantages and disadvantages of each system, and system cost. Finally, the authors outline their DAMS implementation strategy, comprised of a phased rollout with the following stages: system installation, data migration, and interface development.

INTRODUCTION

Since the launch of the University of Houston Digital Library (UHDL) in 2009, the UH Libraries have made tens of thousands of rare and unique items available online for research using CONTENTdm. As we began to explore and expand into new digital initiatives, we realized that the UH Libraries’ digital aspirations require a more dynamic, flexible, scalable, and interoperable digital asset management system that can manage larger amounts of materials in a variety of formats.
We plan to implement a new digital repository infrastructure that accommodates creative workflows and allows for the configuration of additional functionalities such as digital exhibits, data mining, cross-linking, geospatial visualization, and multimedia presentation. The new system will be designed with linked data in mind and will allow us to publish our digital collections as linked open data within the larger semantic web environment.

Annie Wu (awu@uh.edu) is Head of Metadata and Digitization Services, Santi Thompson (sathompson3@uh.edu) is Head of Repository Services, Rachel Vacek (evacek@uh.edu) is Head of Web Services, Sean Watkins (slwatkins@uh.edu) is Web Projects Manager, and Andrew Weidner (ajweidner@uh.edu) is Metadata Services Coordinator, University of Houston Libraries.

HITTING THE ROAD TOWARDS A GREATER DIGITAL DESTINATION: EVALUATING AND TESTING DAMS AT UNIVERSITY OF HOUSTON LIBRARIES | WU ET AL. | doi:10.6017/ital.v35i2.9152 6

The UH Libraries Strategic Directions set forth a mandate for us to “work assiduously to expand our unique and comprehensive collections that support curricula and spotlight research. We will pursue seamless access and expand digital collections to increase national recognition.”1 To fulfill the UH Libraries’ mission and the mandate of our Strategic Directions, the UH Libraries administration appointed a Digital Asset Management System (DAMS) Implementation Task Force to explore, evaluate, test, recommend, and implement a more robust digital asset management system that would provide multiple modes of access to the UH Libraries’ unique collections and accommodate digital object production at a larger scale. The collaborative task force comprises librarians from four departments: Metadata and Digitization Services (MDS), Web Services, Digital Repository Services, and Special Collections.
The core charge of the task force is to:

• Perform a needs assessment and build criteria and policies based on evaluation of the current system and requirements for the new DAMS
• Research and explore DAMS on the market and identify the top three systems for beta testing in a development environment
• Generate preliminary recommendations from stakeholders’ comments and feedback
• Coordinate installation of the new DAMS and finish data migration
• Communicate the task force work to UH Libraries colleagues

LITERATURE REVIEW

Libraries have maintained DAMS for the publication of digitized surrogates of rare and unique materials for over two decades. During that time, information professionals have developed evaluation strategies for testing, comparing, and evaluating library DAMS software. Reviewing these models and associated case studies provided insight into common practices for selecting systems and informed how the UH Libraries DAMS Implementation Task Force conducted its evaluation process.

One of the first publications of its kind, “A Checklist for Evaluating Open Source Digital Library Software” by Dion Hoe-Lian Goh et al., presents a comprehensive list of criteria for library DAMS evaluation.2 The researchers developed twelve broad categories for testing (e.g., content management, metadata, and preservation) and generated a scoring system based on the assignment of a weight and a numeric value to each criterion.3 While the checklist was created to assist with the evaluation process, the authors note that an institution’s selection decision should be guided primarily by defining the scope of their digital library, the content being curated using the software, and the uses of the material.4 Through their efforts, the authors created a rubric that can be utilized by other organizations when selecting a DAMS. Subsequent research projects have expanded upon the checklist evaluation model.
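The weighted-checklist scoring approach that Goh et al. describe can be sketched in a few lines of code. The criteria, weights, and ratings below are hypothetical placeholders, not values from the checklist itself; the point is only the mechanic of multiplying each criterion's rating by its weight and summing the results so that candidate systems become directly comparable.

```python
# Illustrative sketch of weighted-checklist scoring (hypothetical data).
# Each criterion carries a weight (importance); each candidate system gets a
# rating per criterion; a system's total is the weighted sum of its ratings.

def weighted_total(ratings, weights):
    """Weighted sum of criterion ratings; every rated criterion needs a weight."""
    return sum(weights[criterion] * value for criterion, value in ratings.items())

# Hypothetical example: two systems rated 0-3 on three criteria.
weights = {"content management": 3, "metadata": 2, "preservation": 1}
system_a = {"content management": 3, "metadata": 2, "preservation": 1}
system_b = {"content management": 2, "metadata": 3, "preservation": 3}

ranked = sorted(
    ["A", "B"],
    key=lambda name: weighted_total({"A": system_a, "B": system_b}[name], weights),
    reverse=True,
)
print(weighted_total(system_a, weights))  # 14
print(weighted_total(system_b, weights))  # 15
print(ranked)  # ['B', 'A']
```

Adjusting the weights is where an institution encodes its priorities: a repository focused on preservation would weight that criterion higher and could arrive at a different ranking from the same ratings.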
In “Choosing Software for a Digital Library,” Jody DeRidder outlines major issues that librarians should address when choosing DAMS software, including many of the hardware, technological, and metadata concerns that Goh et al. identified.5 Additionally, she emphasizes the need to account for personnel and service requirements through a variety of activities: usability testing and cost estimation; a formal needs assessment to guide the evaluation process; and a tiered testing approach, which calls upon evaluators to winnow the number of systems under consideration.6 By considering the needs of stakeholders, from users to library administrators, DeRidder’s contributions inform a more comprehensive DAMS evaluation process.

In addition to creating evaluation criteria, the literature on DAMS selection has also produced case studies that reflect real-world scenarios and identify use cases that help determine user needs and desires. In “Evaluation of Digital Repository Software at the National Library of Medicine,” Jennifer L. Marill and Edward C. Luczak discuss the process that the National Library of Medicine (NLM) used to compare ten DAMS, both proprietary and open source.7 Echoing Goh et al. and DeRidder, Marill and Luczak created broad categories for testing and developed a scoring system for comparing DAMS.8 Additionally, Marill and Luczak enriched the evaluation process by implementing two testing phases: “initial testing of ten systems” and “in-depth testing of three systems.”9 This method allowed NLM to conduct extensive research on the most promising systems for their needs before selecting a DAMS to implement. The tiered approach appealed to the task force, and influenced how it conducted the evaluation process, because it balances efficiency and comprehensiveness.

In another case study, Dora Wagner and Kent Gerber describe the collaborative process of selecting a DAMS across a consortium.
In their article “Building a Shared Digital Collection: The Experience of the Cooperating Libraries in Consortium,”10 the authors emphasize additional criteria that are important for collaborating institutions: the ability to brand consortial products for local audiences; the flexibility to incorporate differing workflows for local administrators; and the shared responsibility of system maintenance and costs.11 While the UH Libraries will not be managing a shared repository DAMS, the task force appreciated the article’s emphasis on maximizing customizations to improve the user experience.

In “Evaluation and Usage Scenarios of Open Source Digital Library and Collection Management Tools,” Georgios Gkoumas and Fotis Lazarinis describe how they tested multiple open-source systems against typical library functions—such as acquisitions, cataloging, digital libraries, and digital preservation—to identify typical use cases for libraries.12 Some of the use cases formulated by the researchers address digital platforms, including features related to supporting a diverse array of metadata schema and using a simple web interface for the management of digital assets.13 These use cases mirror local feature and functionality requests incorporated into the UH Libraries’ evaluation criteria.

In “Digital Libraries: Comparison of 10 Software,” Mathieu Andro, Emmanuelle Asselin, and Marc Maisonneuve discuss a rubric they developed to compare six open-source platforms (Invenio, Greenstone, Omeka, EPrints, ORI-OAI, and DSpace) and four proprietary platforms (Mnesys, DigiTool, YooLib, and CONTENTdm) around six core areas: document management, metadata, engine, interoperability, user management, and Web 2.0.
14 The authors note that each solution is “of good quality” and that institutions should consider a variety of factors when selecting a DAMS, including the “type of documents you will want to upload” and the “political criteria (open source or proprietary software)” desired by the institution.15 This article provided the UH Libraries with additional factors to include in their evaluation criteria.

Finally, Heather Gilbert and Tyler Mobley’s article “Breaking Up with CONTENTdm: Why and How One Institution Took the Leap to Open Source” provides a case study for a new trend: selecting a DAMS for migration from an existing system to a new one.16 The researchers cite several reasons for their need to select a new DAMS, primarily their current system’s limitations with searching and displaying content in the digital library.17 They evaluated alternatives and selected a suite of open-source tools, including Fedora, Drupal, and Blacklight, which combine to make up their new DAMS.18 Gilbert and Mobley also reflect on the migration process and identify several hurdles they had to overcome, such as customizing the open-source tools to meet their localized needs and confronting inconsistent metadata quality.19 Gilbert and Mobley’s article most closely matches the scenario faced by the UH Libraries.

Our study adds to the limited literature on evaluating and selecting DAMS for migration in several ways. It demonstrates another model that other institutions can adapt to meet their specific needs. It identifies new factors for other institutions to take into account before or during their own migration process. Finally, it adds to the body of evidence for a growing movement of libraries migrating from proprietary to open-source DAMS.

DAMS EVALUATION AND ANALYSIS METHODOLOGY

Needs Assessment

The DAMS Implementation Task Force fulfilled the first part of its charge by conducting a needs assessment.
The goal of the needs assessment was to collect the key requirements of stakeholders, identify future features of the new DAMS, and gather data in order to craft criteria for evaluation and testing in the next phase of its work. The task force employed several techniques for information gathering during the needs assessment phase:

• Identified stakeholders and held internal focus group interviews to identify system requirement needs and gaps
• Reviewed scholarly literature on DAMS evaluation and migration
• Researched peer/aspirational institutions
• Reviewed national standards around DAMS
• Determined both the current and projected use of the UHDL
• Identified UHDL materials and users

Task force members took detailed notes during each focus group interview session. The literature research on DAMS evaluation helped the task force to find articles with comprehensive DAMS evaluation criteria. The NISO criteria for core types of entities in digital library collections were also listed and applied to the evaluation after reviewing the NISO Framework of Guidance for Building Good Digital Collections.20 More than forty peer and aspirational institutions’ digital repositories were benchmarked to identify website names, platform architecture, documentation, and user and system features. The task force analyzed the rich data gathered from the needs assessment activities and built the DAMS evaluation criteria that prepared the task force for the next phase of evaluation.

Evaluation, Testing, and Recommendation

The task force began its evaluation process by identifying twelve potential DAMS for consideration that were ultimately narrowed down to three systems for in-depth testing. Using data from focus group interviews, literature reviews, and DAMS best practices, the group generated a list of benchmark criteria.
These broad evaluation criteria covered features in categories of system functionality, content management, metadata, user interface, and search support. Members of the task force researched DAMS documentation, product information, and related literature to score each system against the evaluation criteria. Table 1 contains the scores of the initial evaluation. From this process, five systems emerged with the highest scores:

● Fedora (and, closely associated, Fedora/Hydra and Fedora/Islandora)
● Collective Access
● DSpace
● Rosetta
● CONTENTdm

The task force eliminated Collective Access from the final systems for testing because of its limited functionality: it is designed around archival content only and is not widely deployed. The task force decided not to test CONTENTdm because of the system’s known functionality, which we identified through firsthand experience. After the initial elimination process, Fedora (including Fedora/Hydra and Fedora/Islandora), DSpace, and Rosetta remained for in-depth testing.

DAMS                Evaluation Score*
Fedora              27
Fedora/Hydra        26
Fedora/Islandora    26
Collective Access   24
DSpace              24
Rosetta             20
CONTENTdm           20
Trinity (iBase)     19
Preservica          16
Luna Imaging        15
RODA†               6
Invenio‡            5

Table 1. Evaluation scores of twelve DAMS using broad evaluation criteria

The task force then created detailed evaluation and testing criteria by drawing from the same sources used previously: focus groups, literature review, and best practices.
While the broad evaluation focused on high-level functions, the detailed evaluation and testing criteria for the final three systems closely analyzed the specific features of each DAMS in eight categories:

● System Environment and Function
● Administrative Access
● Content Ingest and Management
● Metadata
● Content Access
● Discoverability
● Report and Inquiry Capabilities
● System Support

* Total possible score: 29.
† Removed from evaluation because the system does not support Dublin Core metadata.
‡ Removed from evaluation because the system does not support Dublin Core metadata.

Prior to the in-depth testing of the final three systems, the task force researched timelines for system setup. Rosetta’s timeline for system setup proved to be prohibitive. Consequently, the task force eliminated Rosetta from the testing pool and moved forward with Fedora and DSpace.

To conduct the detailed evaluation, the task force scored the specific features under each category utilizing systems testing and documentation. A score ranging from zero to three (0 = None, 1 = Low, 2 = Moderate, 3 = High) was assigned to each feature evaluated. After evaluating all features, the scores were tallied for each category. Our testing revealed that Fedora outperformed DSpace in over half of the testing sections: Content Ingest and Management, Metadata, Content Access, Discoverability, and Report and Inquiry Capabilities. See table 2 for the tallied scores in each testing section.

Testing Sections                   DSpace Score   Fedora Score   Possible Score
System Environment and Testing     21             21             36
Administrative Access              15             12             18
Content Ingest and Management      59             96             123
Metadata                           32             43             51
Content Access                     14             18             18
Discoverability                    46             84             114
Report and Inquiry Capabilities    6              15             21
System Support                     12             11             12
TOTAL SCORE:                       205            300            393

Table 2.
Scores of top two DAMS from testing using detailed evaluation criteria

After review of the testing results, the task force conducted a facilitated activity to summarize the advantages and disadvantages of each system. Based on this comparison, the DAMS Task Force recommended that the UH Libraries implement a Fedora/Hydra repository architecture with the following course of action:

● Adapt the UHDL user interface to Fedora and re-evaluate it for possible improvements
● Develop an administrative content management interface with the Hydra framework
● Migrate all UHDL content to a Fedora repository

Fedora/Hydra advantages:
● Open source
● Large development community
● Linked data ready
● Modular design through API
● Scalable, sustainable, and extensible
● Batch import/export of metadata
● Handles any file format

Fedora/Hydra disadvantages:
● Steep learning curve
● Long setup time
● Requires additional tools for discovery
● No standard model for multi-file objects

Table 3. Fedora/Hydra advantages and disadvantages

The primary advantages of a DAMS based on Fedora/Hydra are: a large and active development community; a scalable and modular system that can grow quickly to accommodate large-scale digitization; and a repository architecture based on linked data technologies. This last advantage, in particular, is unique among all systems evaluated and will give the UH Libraries the ability to publish our collections as linked open data. Fedora 4 conforms to the World Wide Web Consortium (W3C) recommendation for Linked Data Platforms.21 The main disadvantage of a Fedora/Hydra system is the steep learning curve associated with designing metadata models and developing a customized software suite, which translates to a longer implementation time compared to off-the-shelf products.
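The tallying step behind table 2 can be sketched as follows. The feature names and ratings here are hypothetical stand-ins, not the task force's actual data; the sketch only shows the mechanic of mapping each feature's None/Low/Moderate/High rating to 0-3 and summing by category.

```python
# Illustrative sketch (hypothetical data): tallying 0-3 feature ratings into
# per-category totals, as in the detailed evaluation summarized in table 2.
SCALE = {"none": 0, "low": 1, "moderate": 2, "high": 3}

def category_totals(feature_ratings):
    """Sum 0-3 feature ratings by category; returns {category: total}."""
    totals = {}
    for (category, _feature), rating in feature_ratings.items():
        totals[category] = totals.get(category, 0) + SCALE[rating]
    return totals

# Hypothetical ratings keyed by (category, feature).
fedora = {
    ("Metadata", "batch edit"): "high",
    ("Metadata", "custom schemas"): "high",
    ("Discoverability", "faceted search"): "moderate",
}
print(category_totals(fedora))  # {'Metadata': 6, 'Discoverability': 2}
```

Running the same tally over each candidate system's ratings yields directly comparable per-category scores, which is how a table like table 2 is assembled.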
The UH Libraries must allocate an appropriate amount of time and resources for planning, implementation, and staff training. The long-term return on investment for this path will be a highly skilled technical staff with the ability to maintain and customize an open-source, standards-based repository architecture that can be expanded to support other UH Libraries content such as geospatial data, research data, and institutional repository materials.

DSpace advantages:
● Open source
● Easy installation / ready out of the box
● Existing familiarity through Texas Digital Library
● User group / profile controls
● Metadata quality module
● Batch import of objects

DSpace disadvantages:
● Flat file and metadata structure
● Limited reporting capabilities
● Limited metadata features
● Does not support linked data
● Limited API
● Not scalable / extensible
● Poor user interface

Table 4. DSpace advantages and disadvantages

The main advantages of DSpace are ease of installation, familiarity of workflows, and additional functionality not found in CONTENTdm.22 Installation and migration to a DSpace system would be relatively fast, and staff could quickly transition to new workflows because they are similar to those in CONTENTdm. DSpace also supports authentication and user roles that could be used to limit content to the UH community only. Commercial add-on modules, although expensive, could be purchased to provide more sophisticated content management tools than are currently available with CONTENTdm.

The disadvantages of a DSpace system are the same long-term, systemic problems found in the current CONTENTdm repository. DSpace uses a flat metadata structure, has a limited API, does not scale well, and is not customizable to the UH Libraries’ needs. Consultations with peers indicated that both CONTENTdm and DSpace institutions are exploring the more robust capabilities of Fedora-based systems.
Migration of the digital collections in CONTENTdm to a DSpace repository would provide few, if any, long-term benefits to the UH Libraries. Of all the systems considered, implementation of a Fedora/Hydra repository aligns most clearly with the UH Libraries Strategic Directions of attaining national recognition and improving access to our unique collections. The Fedora and Hydra communities are very active, with project management overseen by DuraSpace and Hydra, respectively.23,24 Over the long term, a repository based on Fedora/Hydra will give the UH Libraries a low-cost, scalable, flexible, and interoperable platform for providing online access to our unique collections.

Cost Considerations

To balance the current digital collections production schedule with the demands of a timely implementation and migration, the task force identified the following investments as cost effective for Fedora/Hydra and DSpace, respectively:

Fedora/Hydra:
● Metadata Librarian: annual salary
  • manages daily Metadata Unit operations during implementation
  • streamlines the migration process

DSpace:
● Metadata Librarian: annual salary
  • manages daily Metadata Unit operations during implementation
  • streamlines the migration process
● @Mire Modules: $41,500
  • Content Delivery (3): $13,500
  • Metadata Quality: $10,000
  • Image Conversion Suite: $9,000
  • Content & Usage Analysis: $9,000
  • These modules require one-time fees to @Mire that recur when upgrading to a new version of DSpace

Table 5. Start-up costs associated with Fedora/Hydra and DSpace

The task force determined that an investment in one librarian’s salary is the most cost-effective course of action.
The new Metadata Librarian will manage daily operations of the Metadata Unit in Metadata & Digitization Services while the Metadata Services Coordinator, in close collaboration with the Web Projects Manager, leads the DAMS implementation process. In contrast to Fedora, migration to DSpace would require a substantial investment in third-party software modules from @Mire to deliver the best possible content management environment and user experience.

IMPLEMENTATION STRATEGIES

The implementation of the new DAMS will occur in a phased rollout comprising the following stages: System Installation, Data Migration, and Interface Development. MDS and Web Services will perform the majority of the work, in consultation with key stakeholders from Special Collections and other units. Throughout this process, the DAMS Implementation Task Force will consult with the Digital Preservation Task Force* to coordinate the preservation and access systems.

Phase One: System Installation
● Set up production and server environment
● Rewrite UHDL front-end application for Fedora/Solr
● Create metadata models
● Coordinate workflows with Digital Preservation Task Force
● Begin development of administrative Hydra head for content management

Phase Two: Data Migration
● Formulate content migration strategy and schedule
● Migrate test collections and document exceptions
● Conduct the data migration
● Create preservation metadata for migrated data
● Continue development of the Hydra administrative interface

Phase Three: Interface Development
● Reevaluate front-end user interface
● Rewrite UHDL front end as a Hydra head OR update current front end
● Establish inter-departmental production workflows
● Refine administrative Hydra head for content management

Table 6.
Overview of DAMS phased implementation

Phase One: System Installation

During the first phase of DAMS implementation, Web Services and MDS will work closely together to install an open-source repository software stack based on Fedora, rewrite the current PHP front-end interface to provide public access to the data in the new system, and create metadata content models for the UHDL based on the Portland Common Data Model,25 in consultation with the Coordinator of Digital Projects from Special Collections and other key stakeholders. The DAMS Task Force will consult with the Digital Preservation Task Force† to determine how closely the preservation and access systems will be integrated and at what points. The two groups will also jointly outline a DAMS migration strategy that aligns with the preservation system. Web Services and MDS will collaborate on research and development of an administrative interface, based on the Hydra framework, for day-to-day management of UHDL content.

* An appointed task force to create a digital preservation policy and identify strategies, actions, and tools needed to sustain long-term access to digital assets maintained by UH Libraries.
† A working team at UH Libraries that enforces the digital preservation policy and maintains the digital preservation system.

Phase Two: Data Migration

In the second phase, MDS will migrate legacy content from CONTENTdm to the new system and work with Web Services, Special Collections, and the Architecture and Art Library to resolve any technical, metadata, or content problems that arise. The second phase will begin with the development of a strategy for completing the work in a timely fashion, followed by migration of representative sample collections to the new system to test and refine its capabilities.
After testing is complete, all legacy content will be migrated from CONTENTdm to Fedora, and preservation metadata for migrated collections will be created and archived. Development work on the Hydra administrative interface will also continue. After the data migration is complete, all new collections will be ingested into Fedora/Hydra, and the current CONTENTdm installation will be retired.

Phase Three: Interface Development

In the final phase, Web Services will reevaluate the current front-end user interface (UI) for the UHDL by conducting user tests to better understand how and why users are visiting the UHDL. Web Services will also analyze web and system analytics and gather feedback from Special Collections and other stakeholders. Depending on the outcome of this research, Web Services may create a new UI based on the Hydra framework or choose to update the current front-end application with modifications or new features. Web Services and MDS will also continue to develop or adopt tools for the management of UHDL content and work with Special Collections and the branch libraries to establish production workflows in the new system. Continued development work on the front-end and administrative interfaces, for the life of the new Digital Asset Management System, is both expected and desirable as we maintain and improve the UHDL infrastructure and contribute to the open source software community in line with the UH Libraries Strategic Directions.

Ongoing: Assessment, Enhancement, Training, and Documenting

Throughout the transition process, MDS and Web Services will undergo extensive training in workshops and conferences to build the skills necessary for developing and maintaining the new system. They will also establish and document workflows to ensure the long-term viability of the system.
Regular consultation with Special Collections, the branch libraries, and other stakeholders will be conducted to ensure that the new system satisfies the requirements of colleagues and patrons. Ongoing activities will include:

● Assessing service impact of new system
● User testing on UI
● Regular system enhancements
● Establishing new workflows
● Creating and maintaining documentation
● Training: conferences, webinars, workshops, etc.

CONCLUSION

Transitioning from CONTENTdm to a Fedora/Hydra repository will place the UH Libraries in a position to sustainably grow the amount of content in the UH Digital Library and customize the UHDL interfaces for a better user experience. Adopting a linked data platform will make it easier for the UH Libraries to publish our data for the semantic web. In addition, the Fedora/Hydra architecture can be adapted to support a wide range of UH Libraries projects, including a geospatial data portal, a research data repository, and a self-deposit institutional repository. Over the long term, the return on investment for implementing an open-source repository architecture based on industry-standard software will be: improved visibility of our unique collections on the web; expanded opportunities for aggregating our collections with high-profile repositories such as the Digital Public Library of America; and increased national recognition for our digital projects and staff expertise.

REFERENCES

1. “The University of Houston Libraries Strategic Directions, 2013–2016,” accessed July 22, 2015, http://info.lib.uh.edu/sites/default/files/docs/strategic-directions/2013-2016-libraries-strategic-directions-final.pdf.
2. Dion Hoe-Lian Goh et al., “A Checklist for Evaluating Open Source Digital Library Software,” Online Information Review 30, no. 4 (July 13, 2006): 360–79, doi:10.1108/14684520610686283.
3. Ibid., 366.
4. Ibid., 364.
5. Jody L.
DeRidder, “Choosing Software for a Digital Library,” Library Hi Tech News 24, no. 9 (2007): 19–21, doi:10.1108/07419050710874223.
6. Ibid., 21.
7. Jennifer L. Marill and Edward C. Luczak, “Evaluation of Digital Repository Software at the National Library of Medicine,” D-Lib Magazine 15, no. 5/6 (May 2009), doi:10.1045/may2009-marill.
8. Ibid.
9. Ibid.
10. Dora Wagner and Kent Gerber, “Building a Shared Digital Collection: The Experience of the Cooperating Libraries in Consortium,” College & Undergraduate Libraries 18, no. 2–3 (2011): 272–90, doi:10.1080/10691316.2011.577680.
11. Ibid., 280–84.
12. Georgios Gkoumas and Fotis Lazarinis, “Evaluation and Usage Scenarios of Open Source Digital Library and Collection Management Tools,” Program: Electronic Library and Information Systems 49, no. 3 (2015): 226–41, doi:10.1108/PROG-09-2014-0070.
13. Ibid., 238–39.
14. Mathieu Andro, Emmanuelle Asselin, and Marc Maisonneuve, “Digital Libraries: Comparison of 10 Software,” Library Collections, Acquisitions, & Technical Services 36, no. 3–4 (2012): 79–83, doi:10.1016/j.lcats.2012.05.002.
15. Ibid., 82.
16. Heather Gilbert and Tyler Mobley, “Breaking Up with CONTENTdm: Why and How One Institution Took the Leap to Open Source,” Code4Lib Journal, no. 20 (2013), http://journal.code4lib.org/articles/8327.
17. Ibid.
18. Ibid.
19. Ibid.
20.
NISO Framework Working Group with support from the Institute of Museum and Library Services, A Framework of Guidance for Building Good Digital Collections (Baltimore, MD: National Information Standards Organization (NISO), 2007).
21. “Linked Data Platform 1.0,” W3C, accessed July 22, 2015, http://www.w3.org/TR/ldp/.
22. “DSpace,” accessed July 22, 2015, http://www.dspace.org/.
23. “Fedora Repository Home,” accessed July 22, 2015, https://wiki.duraspace.org/display/FF/Fedora+Repository+Home.
24. “Hydra Project,” accessed July 22, 2015, http://projecthydra.org/.
Transitioning from XML to RDF: Considerations for an Effective Move Towards Linked Data and the Semantic Web

Juliet L. Hardesty

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016 51

INTRODUCTION

Metadata, particularly within the academic library setting, is often expressed in eXtensible Markup Language (XML) and managed with XML tools, technologies, and workflows. Software tools such as the Oxygen XML Editor and query languages such as XPath and XQuery have, over time, become capable of supporting that management. However, managing a library’s metadata now takes on a greater level of complexity as libraries increasingly adopt the Resource Description Framework (RDF). Semantic Web initiatives are surfacing in the library context with experiments in publishing metadata as Linked Data sets, BIBFRAME development using RDF, and software developments such as the Fedora 4 digital repository using RDF. Examples of transitions from XML into RDF make these challenges evident and show the need for communication and coordination between efforts to incorporate and implement RDF. This article outlines these challenges using different use cases from the literature and first-hand experience. The discussion that follows considers ways to progress from metadata formatted in XML to metadata expressed in RDF. The options explored are targeted not only to metadata practitioners considering this transition but also to programmers, librarians, and managers.

LITERATURE REVIEW AND CONCEPTS

As an initial example of the challenges faced when considering RDF, clarifying terminology is still a helpful activity. RDF focuses on sets of statements describing relationships and meaning. These statements consist of a subject, a predicate, and an object (e.g., an article, has an author, Jane Smith). These statement parts are also referred to as a resource, a property, and a property value.
Since there are three parts to RDF statements, they are referred to as triples. The predicate or property of an RDF statement defines the relationship between the subject and the object. RDF ontologies are sets of properties for a particular domain. For example, Darwin Core has an RDF ontology to express biological properties,1 and EBUCore has an RDF ontology to express properties about audiovisual materials.2

Juliet L. Hardesty (jlhardes@iu.edu) is Metadata Analyst at Indiana University Libraries, Bloomington, Indiana.

TRANSITIONING FROM XML TO RDF | HARDESTY doi: 10.6017/ital.v35i1.9182 52

Pulling apart the many issues involved in moving from XML to RDF is an exploration into the purpose of metadata, the tools available and their capabilities, and the various strategies that can be employed. Poupeau rightly states that XML provides structural logic in its hierarchical identification of elements and attributes, where RDF provides data logic declaring resources that relate to each other using properties.3 These properties are ideally all identified with single reference points (Uniform Resource Identifiers, or URIs) rather than a description encased in an encoding. A source of honest confusion, however, is that RDF can be expressed as XML. Lassila’s note regarding the Resource Description Framework specification from the World Wide Web Consortium (W3C) states, “RDF encourages the view of ‘metadata being data’ by using XML (eXtensible Markup Language) as its encoding syntax.”4 So even though RDF can use XML to express resources that relate to each other via properties, identified with single reference points (URIs), RDF is itself not an XML schema. RDF has an XML language (sometimes called, confusingly, RDF, and from here forward called RDF/XML).
Additionally, RDF Schema (RDFS) declares a schema or vocabulary as an extension of RDF/XML to express application-specific classes and properties.5 Simply speaking, RDF defines entities and their relationships using statements. There are various ways to make these statements, but the original way formulated by the W3C is using an XML language (RDF/XML) that can be extended by an additional XML schema (RDFS) to better define those relationships. Ideally, all parts of that relationship (the subject, predicate, object, or the resource, property, property value) are URIs pointing to an authority for that resource, that property, or that property value. An additional concept worth covering is serialization. This term describes how RDF data is expressed using various formatting languages. RDF/XML, N-Triples, Turtle, and JSON-LD are all examples of RDF serializations.6 Describing something as being in RDF really means the framework of subject, predicate, object is being used. Describing something as being expressed in RDF/XML or JSON-LD means that the RDF statements have been serialized into either of those formatting languages. Using “RDF” to refer not only to the framework used to describe something (RDF) but also to the serialization of that description (RDF/XML) can easily muddle the discussion. Other thoughts about the difference between XML and RDF, or about moving metadata from XML into RDF, point to the difference in perspective and the change in thinking that is required to manage such a move.
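The framework/serialization distinction above can be made concrete with a minimal sketch. The same abstract statement ("an article has an author Jane Smith") is held once as a triple and then emitted in two different serializations. The subject and object URIs below are invented for illustration; only the Dublin Core Terms predicate is a real property.

```python
# One abstract RDF statement: (subject, predicate, object), each part a URI.
# The example.org URIs are hypothetical; dcterms:creator is a real DC Terms property.
triple = (
    "http://example.org/article1",           # subject: the article
    "http://purl.org/dc/terms/creator",      # predicate: "has an author"
    "http://example.org/janeSmith",          # object: Jane Smith
)

def to_ntriples(t):
    """Serialize one triple as a single N-Triples line."""
    return "<{}> <{}> <{}> .".format(*t)

def to_turtle(t, prefixes):
    """Serialize the same triple in Turtle, compacting known namespaces to prefixes."""
    def qname(uri):
        for prefix, ns in prefixes.items():
            if uri.startswith(ns):
                return f"{prefix}:{uri[len(ns):]}"
        return f"<{uri}>"
    s, p, o = (qname(part) for part in t)
    return f"{s} {p} {o} ."

prefixes = {"dcterms": "http://purl.org/dc/terms/",
            "ex": "http://example.org/"}

print(to_ntriples(triple))
# <http://example.org/article1> <http://purl.org/dc/terms/creator> <http://example.org/janeSmith> .
print(to_turtle(triple, prefixes))
# ex:article1 dcterms:creator ex:janeSmith .
```

Both outputs carry the identical triple; only the concrete syntax differs, which is the sense in which RDF/XML, N-Triples, Turtle, and JSON-LD are interchangeable serializations of one framework.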
In an online discussion about RDF in relation to TEI (Text Encoding Initiative), Cummings talks about the need for both XML and RDF, using XML to encode text and RDF to extract that data and make it more useful.7 Yee, in her in-depth look at bibliographic data as part of the Semantic Web, points out that RDF is designed to encode knowledge, not information.8 The RDF Primer 1.0 also states “RDF directly represents only binary relationships.”9 XML describes what something is by encoding it with descriptive elements and attributes. RDF, on the other hand, constructs statements about something using direct references—a reference to the thing itself, a reference to the descriptor, and a reference to the descriptor’s value. As Farnel discussed in her 2015 Open Repositories presentation about the University of Alberta’s move to RDF, they learned they were moving from a records-based framework in XML to a things-based framework in RDF.10 What is pointed out here time and again is something else Farnel discussed—moving from XML to RDF is not simply a conversion between encoding formats; it is a translation between two different ways of organizing knowledge. It involves understanding the meaning of the metadata encoded in XML and representing that meaning with appropriate RDF statements. The tools most commonly employed for reworking XML into RDF are OpenRefine when accompanied by its RDF extension; a triplestore database such as OpenLink Virtuoso,11 Apache Fuseki,12 or Sesame13; Oxygen XML Editor14; and Protégé,15 an ontology editor.
OpenRefine is, according to the website, “a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.”16 The RDF extension, called RDF Refine, allows for importing existing vocabularies and reconciling against SPARQL endpoints (web services that accept SPARQL queries and return results).17,18 SPARQL is similar to SQL as a language for querying a database, but the syntax is specifically designed to allow for querying data formatted in triple statements instead of tables with columns.19 Triplestore databases such as OpenLink Virtuoso can store and index RDF statements for searching as a SPARQL endpoint, offering a way to retrieve information and visualize connections across a collection of triples. Oxygen XML Editor has proven helpful in formulating eXtensible Stylesheet Language (XSL) transformations to move metadata from a particular XML schema or format into RDF/XML or other serializations such as JSON-LD (JavaScript Object Notation for Linking Data).20 Protégé is a tool developed by Stanford University that supports the OWL 2 Web Ontology Language and has helped to convert XML schemas to RDF ontologies and establish ways to express XML metadata in RDF. These tools provide the technical means to take metadata expressed in XML and physically reformat it to metadata expressed in an RDF serialization. What that reformatting also encompasses, however, is a review of the information expressed in XML and a set of decisions as to how to express that information as RDF statements. Strategic approaches and ideas for handling data transformations into RDF have involved the XML schema or document type definition (DTD). 
These include Thuy, Lee, and Lee’s approach to map an XML schema (the XSD) to RDF, associating the schema’s simpleTypes with properties in RDF, defining the schema’s complexTypes as classes in RDF, and handling a hierarchy of XML schema elements with top levels as domains and lower-level elements and attributes as container classes or subproperties in those domains.21 Thuy et al. earlier worked on a method to transform XML to RDF by translating the DTD using RDFS (ELEMENTs in the DTD are RDF classes or subclasses, ATTLISTs are RDF properties, and ENTITIES—preset variables in the DTD—are called up for use in RDF as encountered).22 Similarly, Hacherouf, Bahloul, and Cruz translate an XML schema into an OWL ontology.23 Klein et al. point out that while ontologies serve to describe a domain, XML schemas are meant to provide constraints on documents or structure for data, so it can be advantageous to work out an RDF expression this way.24 Tim Berners-Lee puts it simply: “the same RDF tree results from many XML trees,” meaning the same single statement in RDF (an article has an author Jane Smith) can be expressed in many ways in XML and can vary on the basis of the source of the XML, any schemas involved, and the people creating the metadata.25 Transitioning from XML to RDF using the XML schema might serve to ensure all XML elements are replicated in RDF but does not necessarily establish the relationships meant by that XML encoding without additional evaluation. There is no single strategy that will always work to move XML metadata into RDF, even within the same set of tools (such as Fedora/Hydra) or the same area of concern (libraries, archives, or museums).

USE CASES FOR RDF

The following use cases explain approaches to transitioning to RDF taken from two differing perspectives. The first set describes efforts to express XML schemas or standards as RDF ontologies.
The second set describes efforts by various library or cultural-heritage digital collections to transform metadata records into RDF statements. They also show that strategies to transform XML to RDF cannot occur without a shift in view from structure to relationships and, likewise, from descriptive encoding to direct meaning.

Moving an XML Schema/Standard to an RDF Ontology

As a graduate student at Kent State University, Mixter took on converting the descriptive metadata standard VRA Core 4.0 from an XML schema to an RDF ontology.26 Using the VRA Data Standards Committee Guidelines to ensure all minimum fields were included,27 Mixter mapped VRA XML elements and attributes to the schema.org, FOAF, VoID, and DC Terms ontologies. This process is known as “cherry-picking,” or combining various ontologies that already exist to represent properties or relationships (the predicates in RDF statements) as RDF instead of creating new proprietary RDF properties. Using OWL and RDFS as metavocabularies in Protégé, this created an ontology that could “retain the granularity required to describe library, archive, or museum items” of VRA Core 4.0’s design in XML without being a straight conversion of VRA Core 4.0 from XML to RDF.28 The outcome was an XSLT stylesheet that was tested on VRA Core 4.0 XML records to produce that same information as RDF statements. One point that seemed to help in testing was the fact that all controlled vocabulary terms had reference identifiers in the XML (ready-made URIs). Something not discussed in the outcomes was that dates resulted in complex RDF (RDF statements that encompass additional RDF statements or blank nodes), and there was no discussion about this complexity or its effect on using those particular RDF statements.
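The cherry-picking approach can be sketched in miniature: predicates are drawn from several existing ontologies rather than minted in one new proprietary vocabulary. The element names and predicate choices below are invented examples for illustration, not Mixter's actual mapping; the predicate URIs themselves are real DC Terms, FOAF, and schema.org properties.

```python
# Toy sketch of "cherry-picking": each field is assigned a predicate from an
# already-existing ontology. These pairings are illustrative, not a published mapping.
CHERRY_PICKED = {
    "title":       "http://purl.org/dc/terms/title",    # DC Terms
    "creator":     "http://xmlns.com/foaf/0.1/maker",   # FOAF
    "dateCreated": "http://schema.org/dateCreated",     # schema.org
}

def record_to_triples(subject_uri, record):
    """Turn one flat record into triples using the cherry-picked predicates."""
    return [(subject_uri, CHERRY_PICKED[field], value)
            for field, value in record.items() if field in CHERRY_PICKED]

triples = record_to_triples(
    "http://example.org/works/42",  # hypothetical subject URI
    {"title": "Untitled Landscape", "creator": "Jane Smith"},
)
print(triples)
```

The appeal of the approach is visible even at this scale: no new predicate had to be defined, so any consumer that already understands DC Terms or FOAF can use the output directly.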
VRA Core 4.0 now has an RDF ontology in draft form, with Mixter as one of its authors.29 The OWL ontology still points to schema.org, FOAF, and VoID for equivalent classes and properties, but everything is now named within a VRA RDF ontology and namespace and translates to such when VRA Core 4.0 XML is transformed to RDF. Another case in the category of going from an XML standard to an RDF ontology is the development of the BIBFRAME model for bibliographic description from the Library of Congress. The BIBFRAME model is expressed as RDF. According to the BIBFRAME site, “in addition to being a replacement for MARC, BIBFRAME serves as a general model for expressing and connecting bibliographic data.”30 MARC has its own format of expression with numbered fields and subfields but can be expressed or serialized in XML and is often shared that way. The BIBFRAME model, while revamping the way a bibliographic record is described on the basis of work, instance, authority, and annotation, also provides tools to transform records from MARC/XML to the RDF statements of BIBFRAME.31 A single namespace serves the BIBFRAME model and is explained as a long-term strategy to ensure namespace persistence over the next forty-plus years.32 The transformations produced from Library of Congress MARC records and local MARC records contain complex hierarchical RDF statements, particularly when ascribing authority sources to names, subjects, and types of identifiers. As it is still a work in progress there are no tools making use of BIBFRAME records in RDF. An additional example is the work happening with PBCore, the public broadcasting metadata standard managed by the Corporation for Public Broadcasting.33 Public broadcasting stations and other institutions across the United States provide descriptive, technical, and structural metadata for audiovisual materials using this XML standard.
In Boston, WGBH’s use of PBCore coincides with its digital asset management system, HydraDAM, built on Fedora 3 and the Hydra technology stack (based on Blacklight, Solr, and the Fedora Digital Repository).34 Fedora 3 does not natively support RDF statements as properties on objects like Fedora 4. Building off an interest to move HydraDAM to Fedora 4 and leverage RDF for metadata about audiovisual collections, WGBH began exploring transitioning the PBCore XML metadata standard into an RDF ontology. EBUCore, the European Broadcasting Union’s metadata standard, is already expressed as an RDF ontology.35 A comparison between the XML standard of PBCore and the classes and properties expressed in EBUCore revealed that most PBCore elements were covered by the EBUCore ontology.36 Efforts are ongoing to offer PBCore 3.0 as an RDF ontology that uses EBUCore with the addition of a smaller set of properties along with a way to transform PBCore XML to PBCore 3.0 in RDF.37 The Hydra community, in an effort to help the transition from Fedora 3 with its XML binary files of descriptive metadata to Fedora 4 using RDF statements as properties on objects, is working on a recommendation and transformation to move descriptive metadata in MODS XML into RDF that is usable in Fedora 4.38 The MODS standard has a draft of an RDF ontology and a stylesheet transformation available,39 but the complex hierarchical RDF produced from this transformation is unmanageable with the current Fedora 4 architecture. 
The Hydra MODS and RDF Descriptive Metadata Subgroup is attempting to reflect the MODS elements in simple RDF statements that can be incorporated as properties on a Fedora 4 digital object.40 Led by Steven Anderson at the Boston Public Library, this group is moving through MODS element by element, asking the question, “If you had to express this MODS element from your metadata in RDF today, how would you do that?” Participating institutions are reviewing their MODS records and exploring the possible RDF predicates that could be used to represent the meaning of that information. Some are even considering how to construct those RDF statements so that MODS XML can be re-created as close to the original MODS as possible (this is called “round tripping”). There are still questions as to whether every single MODS element will be reflected in this transformation, how exactly Fedora 4 will make use of these descriptive RDF statements, and if the original MODS XML will need to be preserved as part of the digital object in Fedora, but this group is recognizing that moving from Fedora 3 to Fedora 4 requires a major shift in thinking about descriptive metadata. This transformation tool is an effort to help make that transition possible. The Avalon Media System is an open source system for managing and providing access to large collections of digital audio and video.41 It is built on Fedora 3 and the Hydra technology stack and uses MODS XML to store descriptive metadata. As development progresses and the available descriptive fields expand, maintaining the workflow to update XML records in Fedora and reindexing objects in the Hydra interface becomes increasingly complicated. Each time an update is made to descriptive information about an audiovisual item through the Avalon interface, the entire XML record for that object, stored as a binary text file, is rewritten in Fedora 3 and reindexed in Solr.
In considering advantages to using Fedora 4, it appears that descriptive metadata properties stored in RDF are easier to manage programmatically (updating content, adding new fields, more focused reindexing) because descriptive information would not be stored in a single binary file but as individual properties on the object.

Turning XML Metadata into RDF or Linked Data for Publishing, Search and Discovery, and Management

As Southwick describes the process, the library at the University of Nevada Las Vegas (UNLV) took a collection with descriptive records from CONTENTdm and published them as a single RDF Linked Open Data set.42 After cleaning up controlled vocabulary terms across collections and solidifying locally controlled vocabularies, they exported tab-delimited CSV records from CONTENTdm. These records were brought into OpenRefine with its RDF extension, where they reviewed the data and mapped to various properties within the Europeana Data Model (EDM). Controlled vocabulary terms were in text form and had to be reconciled against a SPARQL endpoint, either locally from downloaded data or from the controlled vocabulary service, to gather the URIs to use as the object or value in the RDF statement. OpenRefine was then used to create RDF files that were uploaded to a triplestore (first Mulgara, then OpenLink Virtuoso). This provided public access to the Linked Open Data set and a SPARQL endpoint for querying the data set. After publishing the data set, they experimented with PivotViewer from OpenLink Virtuoso and RelFinder to see what kinds of connections and relationships could be visualized from the data as Linked Open Data. The outlined steps are clear and the outcomes are described, but interestingly the data set itself no longer appears to be available online.43 Although the UNLV use case relies on CSV instead of XML as the data source, the tools and workflows enlisted to transform the data set into RDF Linked Open Data are still applicable.
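The core of a workflow like UNLV's (tabular export, then reconciliation of text terms to URIs, then triple generation) can be sketched in a few lines. This is a simplified stand-in for what OpenRefine's RDF extension does, not UNLV's actual pipeline; the vocabulary lookup, item URIs, and sample row are all invented, while the two DC Terms predicates are real properties.

```python
import csv
import io

# Hypothetical reconciliation table: a text term resolved to a vocabulary URI.
# In practice this lookup is performed against a SPARQL endpoint or local dump.
VOCAB = {"Las Vegas (Nev.)": "http://example.org/places/las-vegas"}

def rows_to_triples(fh):
    """Turn exported rows into triples, reconciling place terms to URIs where known."""
    triples = []
    for row in csv.DictReader(fh):
        s = "http://example.org/items/" + row["id"]   # invented item URI pattern
        triples.append((s, "http://purl.org/dc/terms/title", row["title"]))
        # URI if the term reconciles, literal string if it does not
        place = VOCAB.get(row["place"], row["place"])
        triples.append((s, "http://purl.org/dc/terms/spatial", place))
    return triples

sample = io.StringIO("id,title,place\n1,Street view,Las Vegas (Nev.)\n")
print(rows_to_triples(sample))
```

The fallback branch is the important design point: terms that fail reconciliation survive as literals rather than being dropped, mirroring the partially reconciled data sets these projects actually produce.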
OpenRefine can import XML just as it imports CSV, so this described case shows the tools that can be used and the decisions to be made in processing that data into RDF statements. In Oregon Digital,44 XML from Qualified Dublin Core, VRA Core, and MODS at two different institutions (University of Oregon and Oregon State University) was mapped as Linked Open Data and stored in a triplestore to be served up in a new web application using the Hydra technology stack.45 An inventory of metadata fields across all collections was first mapped to existing Linked Data terms, or properties (those with available URIs); then properties that were needed in the new web application but did not have available corresponding URIs were mapped to a newly devised local namespace for Oregon Digital. Any properties that were not used were kept in the original static XML file for the record as part of the digital object in Fedora. The focus here appears to be on mapping properties, without as much detail provided on whether the objects were kept as text or mapped to URI values where possible. From the sample record provided, the objects appear to be text and not URIs. The real power of this project is finding common properties to describe objects from diverse collections and institutions. What also comes out in the example mappings is the use of many different namespaces or ontologies (DC Terms, MARC Relators, but also MODS and MADS that produce complex RDF).
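Oregon Digital's two-step mapping (use an existing Linked Data property where one is available, otherwise mint a term in a local namespace) reduces to a simple fallback rule. The sketch below is illustrative: the local namespace and the unmapped field name are invented, while the two DC Terms property URIs are real.

```python
# Fallback mapping pattern: prefer an existing property URI, mint locally otherwise.
KNOWN_PROPERTIES = {
    "title":   "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}
LOCAL_NS = "http://example.org/ns/"  # hypothetical stand-in for a local namespace

def property_uri(field_name):
    """Return an existing property URI if one is mapped, else mint a local one."""
    return KNOWN_PROPERTIES.get(field_name, LOCAL_NS + field_name)

print(property_uri("title"))     # resolves to the existing DC Terms property
print(property_uri("fullText"))  # no known property, falls back to the local namespace
```

The trade-off the rule encodes is the one the article describes: locally minted URIs keep every field expressible, but only the fields mapped to shared ontologies gain interoperability with other Linked Data consumers.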
The University of Alberta also combined a variety of XML metadata from different sources into a new digital asset management system based on Fedora 4 and the Hydra technology stack, called the Education and Research Archive.46 Reporting on the experience at Open Repositories 2015, Farnel described the process as working in phases.47 Beginning with item types, languages, and licenses, then moving to place names and controlled subject terms, and finally person names and free-form subjects, they made multiple passes converting XML metadata into RDF statements and incorporating URIs whenever possible. They are combining all of this into a single data dictionary,48 making use of several RDF ontologies to cover the various metadata properties that are being described about objects and collections. The University of California at San Diego (UCSD) has developed a local data model using a mix of external (MADS, VRA Core, Darwin Core, PREMIS) and local ontologies. They published a data dictionary and are working on a substantially different revision as part of the metadata workflow they use to bring digital objects into their digital asset management system from a variety of source metadata formats, including XML.49 This allows metadata to be created from disparate source formats and makes it possible to bring them together as RDF for delivery, management, and preservation.

DISCUSSION

If metadata is in XML form and the desire is to express it as RDF, this is not merely a transformation from one XML schema to another. It is a change in the expression of that data and a change in its use. Having metadata in XML means information is encoded in a specific way that allows for interchange and sharing. Having metadata in RDF means making statements that have direct meaning and can be used independently. There are different perspectives involved in metadata when approaching RDF: those that manage metadata standards (the XML standard side) and those that have metadata encoded using those XML standards (the data management side). Depending on the desired outcomes, the needs of these two perspectives can conflict. When managing a metadata standard, the RDF transition tends to follow certain patterns:
There are different perspectives involved in metadata when approaching RDF: those that manage metadata standards (the XML standard side) and those that have metadata encoded using those XML standards (the data management side). Depending on the desired outcomes, the needs of these two perspectives can conflict. When managing a metadata standard, the RDF transition tends to follow certain patterns:

• Transform an XML standard into a new RDF ontology
  o Examples: Dublin Core (DC), Darwin Core (DWC), MODS, VRA Core
• Establish a move to RDF that incorporates another existing ontology
  o Examples: PBCore, Hydra community

From the data management side, the RDF transition means different patterns occur. These scenarios often start by reviewing the needed outcome, deciding how much metadata needs to be expressed in RDF, and determining what works best to get the metadata to that point. Cases include the following commonalities:

• Creating new search and discovery end-user applications
  o Examples: Oregon Digital, University of Alberta
• Publishing Linked Data sets
  o Examples: UNLV, University of Alberta
• Managing metadata using software that supports RDF
  o Examples: University of Alberta, UCSD, Hydra community

Conflicts occur when the needed outcome on the data management side is not supported by the RDF ontology transitions that have occurred for the XML standards being used. An example of this is how RDF is handled in Fedora 4. When RDF is complex (the object of one statement is another entire RDF statement), Fedora produces blank nodes as new objects within the repository. While not technically problematic, descriptive metadata with complex RDF can result in a situation where a digital object ends up referencing a blank node that then points to, for example, a subject or a genre.
This subject or genre has been created as its own object within the digital repository even though it is only meant to provide meaning for the digital object. MODS RDF produces this complexity and is thus not workable with Fedora 4. In contrast, other standards such as DC or DWC in RDF produce simple statements that Fedora 4 can apply to a digital object without any additional processing. Complications in transitioning from XML to RDF also occur when the original XML does not include URIs or authority-controlled sources. Converting this metadata to RDF can mean locally minting URIs or bringing data over as literals (strings of text) without using URIs at all. Ideally, the result is somewhere in the middle, with externally controlled vocabularies incorporated as much as possible and literals or locally minted URIs used only where absolutely necessary. Translating strings to authoritative sources is intensive work. If the XML standard cannot be expressed as a single RDF ontology, work is further complicated by the need to map XML elements to different RDF ontologies using logic that is often decided locally. While it is possible to transition XML to RDF, the process is not uniform and the pathway involves a lot of labor. This labor might be alleviated by a more user-centered approach on the part of XML standards bodies, one that considers the ways their standards will be used when translated into RDF (“users” in this context meaning the users of the standards, not the end users searching and discovering digital content). Triplestores can manage queries for complex RDF, but digital repository systems are not there yet. Those that support RDF for description of objects do so on the basis of simple property statements. A complex RDF ontology is going to be a challenge to support over time.
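The middle-ground approach described above (external vocabulary URIs where possible, locally minted URIs or literals otherwise) might look roughly like this. The reconciliation table, the local namespace, and the slug scheme are hypothetical stand-ins for real reconciliation against authority files such as LCSH or VIAF.

```python
# Sketch: choose the object form for an RDF statement when the source
# XML gives only a text string. The lookup table and local namespace
# below are invented for illustration, not a real reconciliation service.
import re

# Pretend reconciliation table: strings we could match to a controlled URI.
EXTERNAL_VOCAB = {
    "World War, 1939-1945": "http://id.loc.gov/authorities/subjects/sh85148273",
}
LOCAL_NS = "http://example.org/vocab/"

def slugify(text):
    """Mint a stable local identifier from a text string."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def object_for(value, mint_local=True):
    """Return (kind, object) where kind is 'uri' or 'literal'."""
    if value in EXTERNAL_VOCAB:           # best case: external controlled URI
        return ("uri", EXTERNAL_VOCAB[value])
    if mint_local:                        # fallback: locally minted URI
        return ("uri", LOCAL_NS + slugify(value))
    return ("literal", value)             # last resort: keep the string

print(object_for("World War, 1939-1945"))
print(object_for("Local history club"))
print(object_for("Local history club", mint_local=False))
```

The hard part in practice is populating the reconciliation table: matching free-text strings against authority files is exactly the "intensive work" the paragraph above refers to.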
Another way forward is for the data management side of the equation to focus efforts on showing, in an end-user search and discovery format, what is currently possible when XML is transitioned into RDF. Published Linked Data sets need to have interfaces for access and use, showing the value of what is currently available and any needs or gaps that remain. Libraries and cultural-heritage organizations engaged in this work should also openly share the processes that work and those that do not, so others contemplating this transformation can consider how to forge ahead themselves. Libraries and cultural-heritage organizations moving metadata from XML to RDF should provide feedback to XML standards bodies regarding the usefulness or complications of any RDF transitional help an XML standard might provide. Technologies for incorporating RDF into web applications and truly connecting triples across the web also require further work. Triplestores have so far been the main way to expose data sets but have not been incorporated into common library or cultural-heritage end-user search and discovery web applications. Additionally, triplestore use does not seem to extend to management or long-term storage of complete data about digital objects. There seems to be a decision either to reduce the data stored in a triplestore down to simple statements or to use the triplestore more like an isolated index or SPARQL endpoint only and manage the complete metadata record separately (in a static file containing text or in a separate database). That aligns triples in RDF more with relational database storage than with catalog records. Triple statements focus on relationships and not the complete unique details of the thing being described.
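The index-like role of a triplestore described above can be illustrated with a toy pattern matcher over simple statements, where `None` plays the role a variable plays in a SPARQL query. The data, identifiers, and API here are invented for illustration; a real system would query a SPARQL endpoint.

```python
# Toy illustration of triplestore-style querying: a list of simple
# (subject, predicate, object) statements queried by pattern, with None
# as a wildcard. This only illustrates the idea that such a store
# captures relationships, not the complete record for each object.
TRIPLES = [
    ("item:1", "dcterms:title", "River at flood stage"),
    ("item:1", "dcterms:subject", "subj:floods"),
    ("item:2", "dcterms:subject", "subj:floods"),
    ("item:2", "dcterms:title", "Flood control on the Willamette"),
]

def match(pattern, triples=TRIPLES):
    """Return all triples matching an (s, p, o) pattern; None matches anything."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# "Which items have the subject subj:floods?" is an index-style lookup:
hits = match((None, "dcterms:subject", "subj:floods"))
print([s for s, _, _ in hits])
```

Notice that answering the question yields only identifiers and relationships; the full descriptive record for `item:1` would still live somewhere else, which is the division of labor the paragraph above describes.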
Triplestores can handle complex hierarchical RDF graphs and provide responses on the basis of queries against those complexities,50 but triplestores do not appear to be taking over as either the main search and discovery mechanism for online digital resources or for digital object management. Software using RDF natively is also not currently widespread. A project such as the BIBFRAME Initiative that plans to incorporate RDF needs to make sure the complexity of its data model in RDF is manageable by any tools it produces and that it is possible for vendors and suppliers to encompass the data model in their software development.

CONCLUSION

The reasons for deciding metadata should transition to RDF are just as important as determining the best process for implementing that transition. Reasons for transitioning to RDF are conceptually based around making data more easily shareable and setting up data to have meaning and relationships as opposed to local static description that requires programmatic interpretation. The use cases outlined in this article show the reality does not quite yet match the concept. Transitioning an XML standard to RDF does not make that data more shareable or more easily understood unless there are end-user applications for using that data in RDF. Publishing Linked Data involves going through transitional steps, but the endpoint seems to be more of a byproduct. The real goal is going through the process of producing Linked Data to learn how that works. Self-contained projects that aim to express collections in RDF for the purpose of a new search and discovery interface are more successful in implementing RDF that has that new level of meaning and relationship. Beyond the borders of these projects, however, the data is not being shared or used. The use cases described above show some examples of what is happening now when transitioning from XML to RDF.
Approaches include XML standards converting to RDF expression as well as digital collections with metadata in XML that have an interest in producing that metadata as RDF. Software that incorporates RDF is still developing and maturing. Helping that process along by providing a pathway from XML to functionally usable RDF improves the chances of the Semantic Web becoming a real and useful thing. It is vital to understand that transitioning from XML to RDF requires a shift in perspective from replicating structures in XML to defining meaningful relationships in RDF. Metadata work is never easy, and for metadata to move from encoded strings of text to statements with semantic relationships requires coordination and communication. How best to achieve this coordination and communication is a topic worth engaging as the move to use RDF, produce Linked Data, and approach the Semantic Web continues.

BIBLIOGRAPHY

Berners-Lee, Tim. “Linked Data.” Linked Data - Design Issues, June 18, 2009. http://www.w3.org/DesignIssues/LinkedData.html.

———. “Why RDF Model Is Different from the XML Model.” Semantic Web, September 1998. http://www.w3.org/DesignIssues/RDF-XML.html.

Estlund, Karen, and Tom Johnson. “Link It or Don’t Use It: Transitioning Metadata to Linked Data in Hydra,” July 2013. http://ir.library.oregonstate.edu/xmlui/handle/1957/44856.

Farnel, Sharon. “Metadata at a Crossroads: Shifting ‘from Strings to Things’ for Hydra North.” Slideshow presented at Open Repositories, Indianapolis, Indiana, 2015. http://slideplayer.com/slide/5384520/.

Hacherouf, Mokhtaria, Safia Nait Bahloul, and Christophe Cruz. “Transforming XML Documents to OWL Ontologies: A Survey.” Journal of Information Science 41, no. 2 (April 1, 2015): 242–59. doi:10.1177/0165551514565972.

Klein, Michel, Dieter Fensel, Frank van Harmelen, and Ian Horrocks. “The Relation between Ontologies and XML Schemas.” In Linköping Electronic Articles in Computer and Information Science, 2001. doi:10.1.1.14.1037.
Lassila, Ora. “Introduction to RDF Metadata.” W3C, November 13, 1997. http://www.w3.org/TR/NOTE-rdf-simple-intro-971113.html.

Manola, Frank, and Eric Miller. “RDF Primer 1.0, Section 2.3 Structured Property Values and Blank Nodes.” W3C Recommendation, February 10, 2004. http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties.

Mixter, Jeff. “Using a Common Model: Mapping VRA Core 4.0 Into an RDF Ontology.” Journal of Library Metadata 14, no. 1 (January 2014): 1–23. doi:10.1080/19386389.2014.891890.

Poupeau, Gautier. “XML vs RDF: logique structurelle contre logique des données (XML vs RDF: Structural Logic versus Data Logic).” Les Petites Cases, August 29, 2010. http://www.lespetitescases.net/xml-vs-rdf.

“RDF and TEI XML,” October 13, 2010. https://listserv.brown.edu/archives/cgi-bin/wa?A2=ind1010&L=TEI-L&D=0&P=28928.

Southwick, Silvia B. “A Guide for Transforming Digital Collections Metadata into Linked Data Using Open Source Technologies.” Journal of Library Metadata 15, no. 1 (March 2015): 1–35. doi:10.1080/19386389.2015.1007009.

Thuy, Pham Thi Thu, Young-Koo Lee, and Sungyoung Lee. “A Semantic Approach for Transforming XML Data into RDF Ontology.” Wireless Personal Communications 73, no. 4 (2013): 1387–1402. doi:10.1007/s11277-013-1256-z.

Thuy, Pham Thi Thu, Young-Koo Lee, Sungyoung Lee, and Byeong-Soo Jeong. “Transforming Valid XML Documents into RDF via RDF Schema.” In Next Generation Web Services Practices, International Conference on, 0:35–40. Los Alamitos, CA: IEEE Computer Society, 2007. doi:10.1109/NWESP.2007.23.

“XML RDF.” W3Schools. Accessed September 30, 2015.
http://www.w3schools.com/xml/xml_rdf.asp.

Yee, Martha M. “Can Bibliographic Data Be Put Directly onto the Semantic Web?” Information Technology and Libraries 28, no. 2 (March 1, 2013): 55–80. doi:10.6017/ital.v28i2.3175.

NOTES

1. “Darwin Core,” Darwin Core Task Group, Biodiversity Information Standards, last modified May 5, 2015, http://rs.tdwg.org/dwc/.

2. “Metadata specifications,” European Broadcasting Union, https://tech.ebu.ch/MetadataEbuCore.

3. Gautier Poupeau, “XML vs RDF: logique structurelle contre logique des données (XML vs RDF: Structural Logic versus Data Logic),” Les Petites Cases (blog), August 29, 2010, http://www.lespetitescases.net/xml-vs-rdf.

4. Ora Lassila, “Introduction to RDF Metadata,” W3C, November 13, 1997, http://www.w3.org/TR/NOTE-rdf-simple-intro-971113.html.

5. “XML RDF,” W3Schools, accessed September 30, 2015, http://www.w3schools.com/xml/xml_rdf.asp.

6. See “Serialization formats” from Resource Description Framework on Wikipedia. “Resource Description Framework,” Wikipedia, March 18, 2016, https://en.wikipedia.org/wiki/Resource_Description_Framework#Serialization_formats.

7.
“RDF and TEI XML,” email thread on TEI-L@listserv.brown.edu, October 13–18, 2010, https://listserv.brown.edu/archives/cgi-bin/wa?A2=ind1010&L=TEI-L&D=0&P=28928.

8. Martha M. Yee, “Can Bibliographic Data Be Put Directly onto the Semantic Web?” Information Technology and Libraries 28, no. 2 (March 1, 2013): 57, doi:10.6017/ital.v28i2.3175.

9. Frank Manola and Eric Miller, “RDF Primer 1.0, Section 2.3 Structured Property Values and Blank Nodes,” W3C Recommendation, February 10, 2004, http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties.

10. Sharon Farnel, “Metadata at a Crossroads: Shifting ‘from Strings to Things’ for Hydra North” (slideshow presentation, Open Repositories, Indianapolis, Indiana, 2015), http://slideplayer.com/slide/5384520/.

11. http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/.

12. https://jena.apache.org/documentation/fuseki2/.

13. http://rdf4j.org.

14. http://www.oxygenxml.com.

15. http://protege.stanford.edu.

16. http://openrefine.org.

17. https://en.wikipedia.org/wiki/SPARQL.

18. http://refine.deri.ie.

19. https://jena.apache.org/tutorials/sparql.html.

20. http://json-ld.org.

21. Pham Thi Thu Thuy, Young-Koo Lee, and Sungyoung Lee, “A Semantic Approach for Transforming XML Data into RDF Ontology,” Wireless Personal Communications 73, no. 4 (2013): 1392–95, doi:10.1007/s11277-013-1256-z.

22. Pham Thi Thu Thuy et al., “Transforming Valid XML Documents into RDF via RDF Schema,” in Next Generation Web Services Practices, International Conference on, vol. 0 (Los Alamitos, CA: IEEE Computer Society, 2007), 37, doi:10.1109/NWESP.2007.23.
http://www.w3schools.com/xml/xml_rdf.asp https://en.wikipedia.org/wiki/Resource_Description_Framework#Serialization_formats https://listserv.brown.edu/archives/cgi-bin/wa?A2=ind1010&L=TEI-L&D=0&P=28928 http://dx.doi.org/10.6017/ital.v28i2.3175 http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties http://slideplayer.com/slide/5384520/ http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/ https://jena.apache.org/documentation/fuseki2/ http://rdf4j.org/ http://www.oxygenxml.com/ http://protege.stanford.edu/ http://openrefine.org/ https://en.wikipedia.org/wiki/SPARQL http://refine.deri.ie/ https://jena.apache.org/tutorials/sparql.html http://json-ld.org/ http://dx.doi.org/10.1007/s11277-013-1256-z http://dx.doi.org/10.1109/NWESP.2007.23 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016 63 23. See Mokhtaria Hacherouf, Safia Nait Bahloul, and Christophe Cruz, “Transforming XML Documents to OWL Ontologies: A Survey,” Journal of Information Science 41, no. 2 (April 1, 2015): 242–59, doi:10.1177/0165551514565972. 24. Michel Klein et al., “The Relation between Ontologies and XML Schemas,” section 5 in Linköping Electronic Articles in Computer and Information Science, 6 (2001), doi:10.1.1.108.7190. 25. Tim Berners-Lee, “Why RDF Model Is Different from the XML Model,” Semantic Web Road map, September 1998, http://www.w3.org/DesignIssues/RDF-XML.html. 26. See Jeff Mixter, “Using a Common Model: Mapping VRA Core 4.0 Into an RDF Ontology,” Journal of Library Metadata 14, no. 1 (January 2014): 1–23, doi:10.1080/19386389.2014.891890. 27. The document currently labeled “How to Convert Version 3.0 to Version 4.0” contains a recommendation for a minimum set of elements for “meaningful retrieval” in VRA Core: http://www.loc.gov/standards/vracore/convert_v3-v4.pdf. 28. Mixter, “Using a Common Model,” 2. 29. 
“VRA Core RDF Ontology Available for Review,” Visual Resources Association, October 7, 2015, http://vraweb.org/vra-core-rdf-ontology-available-for-review/.

30. “Bibliographic Framework Initiative,” Library of Congress, https://www.loc.gov/bibframe/.

31. See “MARC to BIBFRAME transformation tools” at “Tools,” BIBFRAME, http://bibframe.org/tools/.

32. “Why a single namespace for the BIBFRAME vocabulary?” Library of Congress, BIBFRAME Frequently Asked Questions, https://www.loc.gov/bibframe/faqs/#q06.

33. “PBCore 2.1,” Public Broadcasting Metadata Dictionary Project, http://pbcore.org.

34. “WGBH,” Hydra Community Partners, http://projecthydra.org/community-2-2/partners-and-more/wgbh/.

35. “Metadata specifications,” European Broadcasting Union, https://tech.ebu.ch/MetadataEbuCore.

36. See notes from PBCore Hackathon Part 2, which occurred in June 2015, showing an element-by-element analysis of PBCore against EBUCore. “PBCore Hackathon Part 2,” June 15, 2015, https://docs.google.com/document/d/1pWDfYIzHpfjCn5RWJ1fioweXg5RIrXuDxCWkBQ5BMlA/.

37. “Join us for the PBCore Sub-Committee Meeting at AMIA!” Public Broadcasting Metadata Dictionary Project Blog, November 11, 2015, http://pbcore.org/join-us-for-the-pbcore-sub-committee-meeting-at-amia/.
38. “MODS and RDF Descriptive Metadata Subgroup,” last modified March 19, 2016, https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup.

39. “MODS RDF Ontology,” Library of Congress, https://www.loc.gov/standards/mods/modsrdf/.

40. “MODS and RDF Descriptive Metadata Subgroup,” last modified March 19, 2016, https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup.

41. “Avalon Media System,” http://www.avalonmediasystem.org.

42. See Silvia B. Southwick, “A Guide for Transforming Digital Collections Metadata into Linked Data Using Open Source Technologies,” Journal of Library Metadata 15, no. 1 (March 2015): 1–35, doi:10.1080/19386389.2015.1007009.

43. The URL for information is a blog with no links to a data set (https://www.library.unlv.edu/linked-data), and the collection site seems to still be based on CONTENTdm (http://digital.library.unlv.edu/collections).

44. “Oregon Digital,” http://oregondigital.org.

45.
See Karen Estlund and Tom Johnson, “Link It or Don’t Use It: Transitioning Metadata to Linked Data in Hydra,” July 2013, http://ir.library.oregonstate.edu/xmlui/handle/1957/44856, accessed from ScholarsArchive@OSU.

46. “ERA: Education & Research Archive,” https://era.library.ualberta.ca.

47. Farnel, “Metadata at a Crossroads.”

48. https://docs.google.com/spreadsheets/d/1hSd6kf4ABm-m8VtYNyqfJGtiZG7bLJQ3fWRbF_nVoIw/edit#gid=1362636241.

49. The substantially revised data model is not available online yet, but the following shows some of the progress toward an RDF data model: “Overview of DAMs Metadata Workflow,” UC San Diego, May 21, 2014, https://tpot.ucsd.edu/metadata-services/mas/data-workflow.html; “DAMS4 Data Dictionary,” https://htmlpreview.github.io/?https://github.com/ucsdlib/dams/master/ontology/docs/data-dictionary.html, retrieved from GitHub.

50. See the Apache Jena SPARQL Tutorial for an example of complex RDF with sample queries against that complexity. “SPARQL Tutorial - Data Formats,” The Apache Software Foundation, https://jena.apache.org/tutorials/sparql_data.html.
Library Discovery Products: Discovering User Expectations through Failure Analysis

Irina Trapido

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 9

ABSTRACT

As the new generation of discovery systems evolves and gains maturity, it is important to continually focus on how users interact with these tools and what areas they find problematic. This study looks at user interactions within SearchWorks, a discovery system developed by Stanford University Libraries, with an emphasis on identifying and analyzing problematic and failed searches. Our findings indicate that users still experience difficulties conducting author and subject searches, could benefit from enhanced support for browsing, and expect their overall search experience to be more closely aligned with that on popular web destinations. The article also offers practical recommendations pertaining to the metadata, functionality, and scope of the search system that could help address some of the most common problems encountered by users.

INTRODUCTION

In recent years, rapid modernization of online catalogs has brought library discovery to the forefront of research efforts in the library community, giving libraries an opportunity to take a fresh look at such important issues as the scope of the library catalog, metadata creation practices, and the future of library discovery in general. While there is an abundance of studies looking at various aspects of planning, implementation, use, and acceptance of these new discovery environments, surprisingly little research focuses specifically on user failure. The present study aims to address this gap by identifying and analyzing potentially problematic or failed searches. It is hoped that focusing on common error patterns will help us gain a better understanding of users’ mental models, needs, and expectations that should be considered when designing discovery systems, creating metadata, and interacting with library patrons.
TERMINOLOGY

In this paper, we adopt a broad definition of discovery products as “tools and interfaces that a library implements to provide patrons the ability to search its collections and gain access to materials.”1 These products can be further subdivided into the following categories:

Irina Trapido (itrapido@stanford.edu) is Electronic Resources Librarian at Stanford University Libraries, Stanford, California.

• Online catalogs (OPACs)—patron-facing modules of an integrated library system.
• Discovery layers (also referred to as “discovery interfaces” or “next-generation library catalogs”)—new catalog interfaces, decoupled from the integrated library system and offering enhanced functionality, such as faceted navigation, relevance-ranked results, and the ability to incorporate content from institutional repositories and digital libraries.
• Web-scale discovery tools, which, in addition to providing all the interface features and functionality of next-generation catalogs, broaden the scope of discovery by systematically aggregating content from library catalogs, subscription databases, and institutional digital repositories into a central index.

LITERATURE REVIEW

To identify and investigate problems that end users experience in the course of their regular searching activities, we analyzed digital traces of user interactions with the system recorded in the system’s log files. This method, commonly referred to as transaction log analysis, has been a popular way of studying information-seeking in a digital environment since the first online search systems came into existence, allowing researchers to monitor system use and gain insight into the users’ search process.
Server logs have been used extensively to examine user interactions with web search engines, consistently showing that web searchers tend to engage in short search sessions, enter brief search statements, do not browse the results beyond the first page, and rarely resort to advanced searching.2 A similar picture has emerged from transaction log studies of library catalogs. Researchers have found that library users employ the same surface strategies: queries within library discovery tools are equally short and simply constructed;3 the majority of search sessions consist of only one or two actions.4 Patrons commonly accept the system’s default search settings and rarely take advantage of a rich set of search features traditionally offered by online catalogs, such as Boolean searching, index browsing, term truncation, and fielded searching.5 Although advanced searching in library discovery layers is uncommon, faceted navigation, a new feature introduced into library catalogs in the mid-2000s, quickly became an integral part of the users’ search process. Research has shown that facets in library discovery interfaces are used both in conjunction with text searching, as a search refinement tool, and as a way to browse the collection with no search term entered.6 A recent study that analyzed interaction patterns in a faceted library interface at North Carolina State University using log data and user experiments demonstrated that users of faceted interfaces tend to issue shorter queries, go through fewer iterations of query reformulation, and scan deeper along the result list than those who use nonfaceted search systems. The authors also concluded that facets increase search accuracy, especially for complex and open-ended tasks, and improve user satisfaction.7

Another traditional use of transaction logs has been to gauge the performance of library catalogs, mostly through measuring success and failure rates.
While the exact percentage of failed searches varied dramatically depending on the system’s search capabilities, interface design, the size of the underlying database, and, most importantly, on the researchers’ definition of an unsuccessful search, the conclusion was the same: the incidence of failure in library OPACs was extremely high.8 In addition to reporting error rates, these studies also looked at the distribution of errors by search type (title, author, or subject search) and categorized sources of searching failure. Most researchers agreed that typing errors and misspellings accounted for a significant portion of failed searches and were common across all search types.9 Subject searching, which remained the most problematic area, often failed because of a mismatch between the search terms chosen by the user and the controlled vocabulary contained in the library records, suggesting that users experienced considerable difficulties in formulating subject queries with Library of Congress Subject Headings.10 Other errors reported by researchers, such as the selection of the wrong search index or the inclusion of the initial article for title searches, were also caused by users’ lack of conceptual understanding of the search process and the system’s functions.11 These research findings were reinforced by multiple observational studies and user interviews, which showed that patrons found library catalogs “illogical,” “counter-intuitive,” and “intimidating,”12 and were unwilling to learn the intricacies of catalog searching.13 Instead, users expected simple, fast, and easy searching across the entire range of library collections, relevance-ranked results that exactly matched what users expected to find, and convenient and seamless transition from discovery to access.14 Today’s library discovery systems have come a long way: they offer one-stop search for a wide array of library resources, intuitive interfaces that require minimal training to be
searched effectively, facets to help users narrow down the result set, and much more.15 But are today’s patrons always successful in their searches? Usability studies of next-generation catalogs and, more recently, of web-scale discovery systems have pointed to patron difficulties associated with the use of certain facets, mostly because of terminological issues and inconsistencies in the underlying metadata.16 Researchers also reported that users had trouble interpreting and evaluating the results of their search17 and were confused as to what resources were covered by the search tool.18 Our study builds on this line of research by systematically analyzing real-life problematic searches as reported by library users and recorded in transaction logs.

BACKGROUND

Stanford University is a private, four-year or above research university offering undergraduate and graduate degrees in a wide range of disciplines to about sixteen thousand students. The study analyzed the use of SearchWorks, a discovery platform developed by Stanford University Libraries. SearchWorks features a single search box with a link to advanced search on every page, relevance-ranked results, faceted navigation, enhanced textual and visual content (summaries, tables of contents, book cover images, etc.), as well as “browse shelf” functionality. SearchWorks offers searching and browsing of catalog records and digital repository objects in a single interface; however, it does not allow article-level searching.
SearchWorks was developed on the basis of Blacklight (projectblacklight.org), an open-source application for searching and interacting with collections of digital objects.19 Thanks to Blacklight’s flexibility and extensibility, SearchWorks enables discovery across an increasingly diverse range of collections (MARC catalog records, archival materials, sound recordings, images, geospatial data, etc.) and allows new features and improvements to be added continuously (e.g., https://library.stanford.edu/blogs/stanford-libraries-blog/2014/09/searchworks-30-released).

STUDY OBJECTIVES

The goal of the present study was two-fold. First, we sought to determine how patrons interact with the discovery system, which features they use and with what frequency. Second, this study aimed to identify and analyze problems that users encounter in their search process.

METHOD

This study used data comprising four years of SearchWorks use, which was recorded in Apache Solr logs. The analysis was performed at the aggregate level; no attempts were made to identify individual searchers from the logs. At the preprocessing stage, we created and used a series of Perl scripts to clean and parse the data and extract only those transactions where the user entered a search query and/or selected at least one facet value. Page views of individual records were excluded from the analysis. The resulting output file contained the following parameters for each transaction: a time stamp, search mode used (basic or advanced), query terms, search index (“all fields,” “author,” “title,” “subject,” etc.), facets selected, and the number of results returned. The query stream was subsequently partitioned into task-based search sessions using a combination of syntactic features (word co-occurrence across multiple transactions) and temporal features (session time-outs: we used fifteen minutes of inactivity as a boundary between search sessions). The analysis was conducted over the following datasets:

Dataset 1.
Aggregate data of approximately 6 million search transactions conducted between February 13, 2011, and December 31, 2014. We performed quantitative analysis of this set to identify general patterns of system use.

Dataset 2. A sample of 5,101 search sessions containing 11,478 failed or potentially problematic interactions performed in the basic search mode, and 2,719 sessions containing 3,600 advanced searches, annotated with query intent and the potential cause of the problem. The searches were performed during eleven twenty-four-hour periods representing different years, academic quarters, times of the school year (beginning of the quarter, midterms, finals, breaks), and days of the week. This dataset was analyzed to identify common sources of user failure.

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016

Dataset 3. User feedback messages submitted to SearchWorks between January 2011 and December 2014 through the “Feedback” link, which appears on every SearchWorks page. While the majority of feedback messages were error and bug reports, this dataset also contained valuable information about how users employed various features of the discovery layer, what problems they encountered, and what features they felt would improve their search experience.

For the manual analysis of dataset 2, all searches within a search session were reconstructed in SearchWorks and, in some cases, also in external sources such as WorldCat, Google Scholar, and Google.
These reconstructed searches were subsequently assigned to one of the following categories: known-item searches (searches for a specific resource by title, by a combination of title and author, by a standard number such as ISSN or ISBN, or by call number), author searches (queries for a specific person or organization responsible for or contributing to a resource), topical searches, browse searches (searches for a subset of the library collection, e.g., “rock operas,” “graphic novels,” “DVDs”), invalid queries, and queries where the search intent could not be established.

To identify potentially problematic transactions, we employed the following heuristic: we selected all search sessions in which at least one transaction failed to retrieve any records, as well as sessions consisting predominantly of known-item or author searches in which the user repeated or reformulated the query three or more times within a five-minute time frame. We reasoned that while this search pattern could be part of the normal query-formulation process for topical searches, it could serve as an indicator of the user’s dissatisfaction with the results of the initial query for known-item and author searches.

We identified seventeen distinct types of problems, which we aggregated into five groups: input errors, absence of the resource from the collection, queries at the wrong level of granularity, erroneous or too-restrictive use of limiters, and mismatch between the search terms entered and the library metadata. Each search transaction in dataset 2 was manually reviewed and assigned to one or more of these error categories.

FINDINGS

Usage Patterns

Our analysis of the aggregate data suggests that keyword searching remains the primary interaction paradigm with the library discovery system, accounting for 76 percent of all searches.
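The session-flagging heuristic described under Method can be sketched in a few lines of code. The study’s own scripts were written in Perl; the Python below is an illustrative re-implementation, and the record keys (time, query, hits, intent) are assumptions, not the study’s actual data layout.

```python
from datetime import datetime, timedelta

def is_potentially_problematic(session):
    """Flag a search session for manual review, following the paper's
    heuristic: at least one zero-result transaction, OR three or more
    query (re)formulations within five minutes in a session consisting
    predominantly of known-item or author searches.

    `session` is a list of dicts with (assumed) keys:
    'time' (datetime), 'query' (str), 'hits' (int), 'intent' (str).
    """
    # Condition 1: at least one transaction retrieved no records.
    if any(t['hits'] == 0 for t in session):
        return True

    # Condition 2: predominantly known-item/author searches,
    # rapidly repeated or reformulated.
    lookups = [t for t in session if t['intent'] in ('known-item', 'author')]
    if len(lookups) * 2 > len(session):  # "predominantly"
        times = sorted(t['time'] for t in lookups)
        for i in range(len(times) - 2):
            # three or more queries inside a five-minute window
            if times[i + 2] - times[i] <= timedelta(minutes=5):
                return True
    return False
```

In the study itself, every session flagged this way was then reviewed and categorized manually; the heuristic only narrows the candidate pool.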
However, users also increasingly take advantage of facets, both for browsing and for refining their searches: facet use grew from 25 percent of searches in 2011 to 41 percent in 2014.

Although both the basic and the advanced search modes allow for “fielded” searches, in which the user can specify which element of the record to search (author, title, subject, etc.), searchers rarely made use of this feature, relying mostly on the system’s defaults (the “all fields” option in the basic search mode): users selected a specific search index in less than 25 percent of all basic searches. Advanced searching was infrequent and declining (from 11 percent of searches in 2011 to 4 percent in 2014).

Typically, users engaged in short sessions, with a mean session length of 1.5 queries. Search queries were brief: 2.9 terms per query on average. Single terms made up 23 percent of queries; 26 percent had two terms, and 19 percent had three.

Error Patterns

The breakdown of errors by category and search mode is shown in figure 1. In the following sections, we describe and analyze the different types of errors.

Figure 1. Breakdown of errors by category and search mode

Input Errors

Input errors accounted for the largest proportion of problematic searches in the basic search mode (29 percent) and for 5 percent of problems in advanced search. While the majority of such errors occurred at the level of individual words (misspellings or typographical errors), entire search statements could also be imprecise or erroneous (e.g., “Diary of an Economic Hit Man” instead of “Confessions of an Economic Hit Man,” and “Dostoevsky War and Peace” instead of “Tolstoy War and Peace”). It is noteworthy that in 46 percent of all search sessions containing problems of this type, users subsequently entered a corrected query.
However, if such errors occurred in a personal name, they were almost half as likely to be corrected.

Absence of the Item Sought from the Collection

Queries for materials that were not in the library’s collection accounted for about a quarter of all potentially problematic searches. In the advanced search mode, where the query is matched against a specific search field, such queries typically resulted in zero hits and can hardly be considered failures per se. In the default cross-field search, however, users were often faced with false hits and had to issue multiple, progressively more specific queries to ascertain that the desired resource was absent from the collection.

Queries at the Wrong Level of Granularity

A substantial number of user queries failed because they were posed at a level of specificity not supported by the catalog. Such queries accounted for the largest percentage of problematic advanced searches (63 percent), where they consisted almost exclusively of article-level searching: users either tried to locate a specific article (often by copying all or part of the citation from external sources) or conducted highly specific topical searches more suitable for a full-text database. In the basic search mode, the proportion of searches at the wrong granularity level was much lower, but still substantial (20 percent). In addition to searches for articles and narrowly defined subject searches, users also attempted to search for other types of more granular content, such as book chapters, individual papers in conference proceedings, poems, and songs.

Erroneous or Too-Restrictive Use of Limiters

Another common source of failure was the selection of the wrong search index or of a facet too restrictive to yield any results. The majority of these errors were purely mechanical: users failed to clear search refinements from a previous search or entered query terms into the wrong search field.
However, our analysis also revealed several conceptual errors, typically stemming from a misunderstanding of the meaning and purpose of certain limiters. For example, the “Online,” “Database,” and “Journal/Periodical” facets were often perceived as a possible route to article-level content. Even seemingly straightforward limiters such as “Date” caused confusion, especially when applied to serial publications: users attempted to employ this facet to drill down to a desired journal issue or article, most likely on the assumption that the system included article-level metadata.

Lack of Correspondence between the Users’ Search Terms and the Library Metadata

A significant number of problems in this group involved searches for non-English materials. When performed in English transliteration, such queries often failed because of users’ lack of familiarity with the transliteration rules established by the library community, whereas searches in vernacular scripts tended to produce incomplete or no results because not all bibliographic records in the database contained parallel non-Roman script fields.

Author and title searches often failed because of users’ tendency to enter abbreviated queries. For example, personal-name searches in which the user truncated the author’s first or middle name to an initial, while the bibliographic records contained the name only in its full form, were extremely likely to fail. Abbreviations were also used in searches for journals, conference proceedings, and occasionally even book titles (e.g., “AI: a modern approach” instead of “Artificial intelligence: a modern approach”). Such queries were successful only if the abbreviation used by the searcher was included in the bibliographic record as a variant title.
A somewhat related problem occurred when the title of a resource contained a numeral in its spelled-out form but the user entered it as a digit. Because these title variations are not always recorded as additional access points in the bibliographic records, the desired item either did not appear in the result set or was buried too deep to be discovered. Topical searches within the subject index were also prone to failure, mostly because patrons were unaware that such searches require precise terms from controlled vocabularies and resorted to natural-language searching instead.

User Feedback

Our analysis of user feedback revealed substantial differences in how various user groups approach the search system and which areas of it they find problematic. Students were often frustrated by the absence of spelling suggestions, which, as one user put it, “left the users wander [to?] in the dark” as to the cause of a search’s failure. This user group also found certain social features desirable: for example, one user suggested that ratings for books would help in choosing a good programming book. By contrast, faculty and researchers were more concerned about the lack of advanced features such as cross-reference searching and left-anchored browsing of the title, subject, and author indexes. There were, however, several areas that both groups found problematic: students and faculty alike saw the system’s inability to assist in selecting the correct form of an author’s name as a major barrier to effective author searching, and both converged on the need for more granular access to formats of audiovisual materials.

DISCUSSION

Scope of the Discovery System

The results of our analysis point to users’ lack of understanding of what is covered by the discovery layer.
Users are often unaware of the existence of separate specialized search interfaces for different categories of materials and assume that the library discovery layer offers Google-like searching across the entire range of library resource types. Moreover, they are confused by the multiple search modalities offered by the discovery layer: one common misconception in SearchWorks is that the advanced search will give access to additional content rather than offer a different way of searching the same catalog data. In addition to an expanded scope for discovery tools, there is also a growing expectation of greater depth of coverage. According to our data, searching in a discovery layer occurs at several levels: the entire resource (a book, journal title, or music recording), its smaller integral units (book chapters, journal articles, individual musical compositions, etc.), and the full text.

User Search Strategies

The search strategies employed by SearchWorks users are heavily influenced by their experiences with web search engines. Users tend to engage in brief search sessions and use short queries, which is consistent with general patterns of web searching. They rely on relevance ranking and are often reluctant to examine search results in any depth: if the desired item does not appear within the first few hits, users tend to rework their initial search statement (often with only a minimal change to the search terms) rather than scroll to the bottom of the results screen or look beyond the first page of results. Given these search patterns, it is crucial to fine-tune relevance-ranking algorithms so that the most relevant results appear not just on the first page but within the first few hits.
While this is typically the case for unique and specific queries, more general searches could benefit from a relevance-ranking algorithm that leverages the popularity of a resource as measured by its circulation statistics. Adding this dimension to relevance determination would help users make sense of the large result sets generated by broad topical queries (e.g., “quantum mechanics,” “linear algebra,” “microeconomics”) by ranking more popular or introductory materials above more specialized ones. It could also offer some guidance to a user trying to choose among different editions of the same resource, and it could improve the quality of author-search results by ranking works created by the author above critical and biographical materials.

Users’ query-formulation strategies are also modeled on Google, where making search terms as specific as possible is often the only way to increase the precision of a search. Faceted search systems, however, require a different approach: the user is expected to conduct a broad search and subsequently focus it by superimposing facets on the results. Qualifying the search up front through keywords rather than facets is not only ineffective but may actually lead to failure. For example, a common search pattern is to add the format of a resource as a search term (e.g., “Fortune magazine,” “Science journal,” “GRE e-book,” “Nicole Lopez dissertation,” “Woody Allen movies”); because format information is coded rather than spelled out in the bibliographic records, such queries either return zero hits or produce irrelevant results.
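One way a discovery layer might intercept such format-qualified queries before they fail is to recognize format words in the query and offer to apply them as a facet instead. The following Python sketch illustrates the idea; the term-to-facet mapping is hypothetical, not SearchWorks’ actual configuration.

```python
# Hypothetical mapping from query words to "Format" facet values.
# Both the terms and the facet labels are illustrative assumptions.
FORMAT_TERMS = {
    'magazine': 'Journal/Periodical',
    'journal': 'Journal/Periodical',
    'dissertation': 'Thesis',
    'thesis': 'Thesis',
    'movie': 'Video',
    'movies': 'Video',
    'e-book': 'Book',
    'ebook': 'Book',
}

def suggest_facet(query):
    """If the query embeds a format word (e.g. 'Fortune magazine'),
    return a (stripped_query, facet_value) pair so the format can be
    applied as a facet filter instead of a keyword."""
    kept, facet = [], None
    for word in query.split():
        value = FORMAT_TERMS.get(word.lower())
        if value and facet is None:
            facet = value          # move the format term into the facet
        else:
            kept.append(word)      # keep everything else as keywords
    return ' '.join(kept), facet
```

For instance, “Fortune magazine” would become the keyword query “Fortune” plus a “Journal/Periodical” facet filter, matching how the format is actually coded in the records.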
In a similar vein, making the query overly restrictive by including the year of publication, publisher, or edition information often causes empty retrievals, either because the library does not hold the edition specified by the user or because the query does not match the data in the bibliographic record. Our study thus lends further weight to claims that, even in today’s reality of sophisticated discovery environments and unmediated searching, library users can still benefit from learning search techniques specifically tailored to faceted interfaces.20

Error Tolerance

Input errors remain one of the major sources of failure in library discovery layers. Users have become increasingly reliant on the error-recovery features they find elsewhere on the web, such as “Did you mean . . . ” suggestions, automatic spelling corrections, and helpful suggestions on how to proceed when the initial search returns no hits. Perhaps even more crucial are error-prevention mechanisms such as query autocomplete, which helps users avoid spelling and typographical errors and provides interactive search assistance and instant feedback during query formulation. Our visual analysis of the logs from the most recent years revealed an interesting search pattern in which the user enters only the beginning of the search query and then increments it by one or two letters:

pr
pro
proq
proque
proques
proquest

Such search patterns indicate that users expect the system to offer query-expansion options, and they show the extent to which query autocomplete (currently missing from SearchWorks) has become an organic part of users’ search processes.

Topical Searching

While next-generation discovery systems represent a significant step toward more sophisticated topical discovery, a number of challenges remain.
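The incremental pattern above (pr → pro → proq → … → proquest) is precisely the behavior a prefix-based autocomplete is designed to serve. A minimal sketch, assuming an in-memory list of titles; a production discovery layer would instead use a dedicated suggester index.

```python
import bisect

def autocomplete(prefix, titles, limit=5):
    """Return up to `limit` titles that begin with `prefix`,
    case-insensitively. Sorted keys plus bisect keep each lookup
    logarithmic; the in-memory list is an illustrative assumption,
    not how a real suggester would store its vocabulary."""
    p = prefix.lower()
    pairs = sorted((t.lower(), t) for t in titles)
    keys = [k for k, _ in pairs]
    lo = bisect.bisect_left(keys, p)          # first candidate match
    # Matches are contiguous in sorted order, so scan from `lo`.
    return [orig for key, orig in pairs[lo:lo + limit] if key.startswith(p)]
```

Each keystroke simply re-runs the lookup with the longer prefix, which is exactly the feedback loop the log pattern suggests users were expecting.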
Apart from mechanical errors such as misspellings and wrong search-index selections, the majority of zero-hit topical searches were caused by a mismatch between the user’s query and the vocabulary in the system’s index. In many cases such queries were formulated too narrowly, reflecting users’ underlying belief that the discovery layer offers full-text searching across all of the library’s resources.

In addition to keyword searching, libraries have traditionally offered a more sophisticated and precise way of accessing subject information in the form of Library of Congress Subject Headings (LCSH). Our results indicate, however, that these tools remain largely underused: users took advantage of this feature in only 21 percent of all subject searches in our sample. We also found that 95 percent of LCSH usage came from clicks on subject-heading links within individual bibliographic records rather than from the “Subject” facet, corroborating the results of earlier studies.21

A whole range of measures could help patrons leverage the power of controlled-vocabulary searching. They include raising patrons’ familiarity with LCSH, integrating cross-references for authorized subject terms, enabling more sophisticated facet-based access to subject information by allowing users to manipulate facets independently, and exposing hierarchical and associative relationships among subject headings. Ideally, once the user has identified a helpful controlled-vocabulary term, it should be possible to expand, refine, or change the focus of a search through broader, narrower, and related terms in the LCSH hierarchy, as well as to discover various aspects of a topic through browse lists of topical subdivisions or via facets.

Known-Item Searching

Important as it is for the discovery layer to facilitate topical exploration, our data suggest that SearchWorks remains, first and foremost, a known-item lookup tool.
While a typical SearchWorks user rarely has problems with known-work searches, our analysis of clusters of closely related searches revealed several situations in which the known-item search experience could be improved. For example, when the desired resource is not in the library’s collection, the user is rarely left with an empty result set, because of automatic word stemming and cross-field searching. While this is a boon for exploratory searching, it becomes a problem when the user needs to confirm that the item sought is not in the library’s collection. Another common scenario arises when the query is too generic, imprecise, or simply erroneous, or when the search string entered by the user does not match the metadata in the bibliographic record, causing the most relevant resources to be pushed too far down the results list to be discoverable. Providing helpful “Did you mean . . . ” suggestions could help the user distinguish between these two scenarios. Another feature that would substantially benefit users struggling with noisy retrievals is highlighting of the user’s search terms in retrieved records. Displaying search matches could alleviate some of the concerns, repeatedly expressed in user feedback, over the lack of transparency as to why seemingly irrelevant results are retrieved, as well as expedite relevance assessment.

Author Searching

Author searching remains problematic because of a convergence of factors:

a. Misspellings. According to our data, typographical errors and misspellings are by far the most common problem in author searching. When such errors occur in personal names, they are much more difficult to identify than errors in a title and, in the absence of index-based spell-checking mechanisms, often require the use of external sources to be corrected.

b. Mismatch between the form and fullness of the name entered by the user and the form of the name in the bibliographic record. For example, a user’s search for “D. Reynolds” will retrieve records where “D” and “Reynolds” appear anywhere in the record (or anywhere in the author fields, if the user opts for a more focused “author” search), but will not bring up records where the author’s name is recorded as “Reynolds, David.”

c. Lack of cross-reference searching of the LC Name Authority File. If the user searches for a variant name represented by a cross-reference on an authority record, she might not be directed to the authorized form of the name.

d. Lack of name disambiguation, which is especially problematic when the search is for a common name. While the process of name authority control ensures the uniqueness of name headings, it does not necessarily provide information that would help users distinguish between authors. For instance, the user often has to know the author’s middle name or date of birth to choose the correct entry, as exemplified by the following choices in the “Author” facet resulting from the query “David Kelly”:

Kelly, David
Kelly, David (David D.)
Kelly, David (David Francis)
Kelly, David F.
Kelly, David H.
Kelly, David Patrick
Kelly, David St. Leger
Kelly, David T.
Kelly, David, 1929 July 11–
Kelly, David, 1929–
Kelly, David, 1929–2012
Kelly, David, 1938–
Kelly, David, 1948–
Kelly, David, 1950–
Kelly, David, 1959–

e. Errors and inaccuracies in the bibliographic records. Given the past practice of creating undifferentiated personal-name authority records, it is not uncommon for one name heading to cover different authors or contributors.
Conversely, situations in which a single person is identified by multiple headings (largely because some records still contain obsolete or variant forms of a personal name) are also prevalent and may become a significant barrier to effective retrieval, as they create multiple facet values for the same author or contributor.

f. Inability to perform an exhaustive search on the author’s name. A fielded “Author” search will miss records where the name does not appear in the “Author” fields but appears elsewhere in the bibliographic record.

g. Relevance ranking. Because search terms occurring in the title carry more weight than search terms in the “Author” fields, works about an author are ranked higher than works by the author.

Browsing

Like many other next-generation discovery systems, SearchWorks features faceted navigation, which facilitates both general-purpose browsing and more targeted searching. In SearchWorks, facets are displayed from the outset, providing a high-level overview of the collection and jumping-off points for further exploration. Rather than having to guess the entry vocabulary, the searcher can simply choose from the available facets and explore the entire collection along a specific dimension. However, findings from our manual analysis of the query stream suggest that facets as a browsing tool may not be used to their fullest potential: users often resort to keyword searching when faceted browsing would have been the better strategy. At least two factors contribute to this trend. The first is users’ lack of awareness of this interface feature: it is common for SearchWorks users to issue queries such as “dissertations,” “theses,” and “newspapers” instead of selecting the appropriate value of the “Format” facet. The second is that many of the facets that could be useful in the discovery process are not available as top-level browsing categories.
For example, users expect more granular faceting of audiovisual resources, including the ability to browse by content type (“computer games,” “video games”) and genre (“feature films,” “documentaries,” “TV series,” “romantic comedies”). Another category of resources commonly accessed by browsing is theses and dissertations. Users frequently try to browse dissertations by field or discipline (issuing searches such as “linguistics thesis,” “dissertations aeronautics,” “PhD thesis economics,” “biophysics thesis”), by program or department, and by level of study (undergraduate, master’s, doctoral), and could benefit from a set of facets dedicated to these categories. Browsing for books could be enhanced by additional faceting related to intellectual content, such as genre and literary form (e.g., “fantasy,” “graphic novels,” “autobiography,” “poetry”) and audience (e.g., “children’s books”). Users also want to be able to browse specific subsets of materials on the basis of their location (e.g., permanent reserves at the engineering library). Browsing for new acquisitions, with the option of limiting to a specific topic, is also a highly desirable feature.

While some browsing categories are common across all types of resources, others apply only to specific types of materials (e.g., music, cartographic/geospatial materials, audiovisual resources). For example, there is strong demand among music searchers for systematic browsing by specific musical instruments and their combinations. Ideally, the system should offer both an optimal set of initial browse options and intuitive, context-specific ways to progressively limit or expand the search.
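In a Solr-backed discovery layer such as one built on Blacklight, exposing browse dimensions like these amounts to declaring additional facet fields and filter queries in the search request. The Python sketch below builds such a request as a parameter dictionary; every field name is an assumption made for illustration, not SearchWorks’ actual schema.

```python
def browse_request(base_query='*:*', **filters):
    """Build a hypothetical Solr request exposing the browse
    categories discussed above as facets. The field names
    (format, genre, instrument, thesis_department, thesis_level)
    are illustrative placeholders only."""
    return {
        'q': base_query,            # '*:*' browses the whole collection
        'facet': 'true',
        'facet.field': ['format', 'genre', 'instrument',
                        'thesis_department', 'thesis_level'],
        # Each selected facet value becomes a filter query.
        'fq': [f'{field}:"{value}"' for field, value in filters.items()],
    }
```

A user drilling into doctoral dissertations in economics would then issue, e.g., `browse_request(thesis_level='Doctoral', thesis_department='Economics')`, progressively narrowing the set without ever typing entry vocabulary.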
Offering such browsing tools may require improvements in system design as well as significant data remediation and enhancement, because much of the metadata that could be used to create these browsing categories is scattered across multiple fixed and variable fields in the bibliographic records, inconsistently recorded, or not present at all.

One of the hallmarks of modern discovery tools has been their increased focus on facilitating serendipitous browsing. SearchWorks was one of the pioneers in offering a virtual “browse shelf” feature, which aims to emulate browsing the shelves of a physical library. Because this functionality relies on the classification number, however, it does not allow browsing of many other important groups of materials, such as multimedia resources, rare books, or archival resources. Call-number proximity is only one of many dimensions that could be leveraged to create more opportunities for serendipitous discovery. Other methods of associating related content might include recommendations based on subject similarity, authorship, keyword associations, forward and backward citations, and use.

Implications for Practice

Addressing the issues we identified would involve improvements in several areas:

• Scope. Our findings indicate that library users increasingly perceive the discovery interface as a portal to all of the library’s resources. Meeting this need goes far beyond offering the ability to search multiple content sources from a single search box: it is just as important to help users make sense of their search results and to provide easy and convenient ways to access the resources they have discovered. And whatever the scope of the library discovery layer is, it needs to be communicated to the user with maximum clarity.

• Functionality.
Users expect a robust and fault-tolerant search system with a rich suite of search-assistance features, such as index-based alternative spelling suggestions, result screens displaying keywords in context, and query-autocompletion mechanisms. These features, many of which have become deeply embedded in users’ search processes elsewhere on the web, could prevent or alleviate a substantial number of issues related to problematic queries (misspellings, typographical errors, imprecise queries, etc.), enable more efficient recovery from errors by guiding users to improved results, and facilitate discovery of foreign-language materials. Equally important is a continued focus on relevance-ranking algorithms, which ideally should move beyond simple keyword-matching techniques toward incorporating social data, leveraging the semantics of the query itself, and offering more intelligent and possibly more personalized results depending on the context of the search.

• Metadata. The quality of the user experience in discovery environments depends as much on the metadata as on the functionality of the discovery layer. It thus remains extremely important to ensure consistency, granularity, and uniformity of metadata, especially as libraries are increasingly faced with the problem of integrating heterogeneous pools of metadata into a single discovery tool.

CONCLUSIONS AND FUTURE DIRECTIONS

The analysis of transaction-log data and user feedback has helped us identify several common patterns of search failure, which in turn reveal important assumptions and expectations that users bring to library discovery.
These expectations pertain primarily to the system’s functionality: in addition to simple, intuitive, and visually appealing interfaces and relevance-ranked results, users expect a sophisticated search system that consistently produces relevant results even for incomplete, inaccurate, or erroneous queries. Users also expect a more centralized, comprehensive, and inclusive search environment that enables more in-depth discovery by offering article-level, chapter-level, and full-text searching. Finally, the results of this study underscore the continued need for a flexible and adaptive system that is easy to use for novices while offering advanced functionality and more control over the search process for “power” users; a system that provides targeted support for different types of information behavior (known-item lookup, author searching, topical exploration, browsing) and facilitates both general inquiry and very specialized searches (e.g., searches for music, cartographic and geospatial materials, digital collections of images, etc.).

Just like discovery itself, building discovery tools is a dynamic, complex, iterative process that requires intimate knowledge of ever-changing and evolving user needs and expectations. It is hoped that an ongoing focus on user problems and frustrations in the new discovery environments can complement other assessment methods by identifying unmet user needs, thus helping create a more holistic and nuanced picture of users’ search and discovery behaviors.

REFERENCES

1. Marshall Breeding, “Library Resource Discovery Products: Context, Library Perspectives, and Vendor Positions,” Library Technology Reports 50, no. 1 (2014): 5–58.

2. Craig Silverstein et al., “Analysis of a Very Large Web Search Engine Query Log,” SIGIR Forum 33, no. 1 (1999): 6–12; Bernard J.
Jansen, Amanda Spink, and Tefko Saracevic, “Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web,” Information Processing & Management 36, no. 2 (2000): 207–27, http://dx.doi.org/10.1016/S0306-4573(99)00056-4; Amanda Spink, Bernard J. Jansen, and H. Cenk Ozmultu, “Use of Query Reformulation and Relevance Feedback by Excite Users,” Internet Research 10, no. 4 (2000): 317–28; Amanda Spink et al., “Searching the Web: The Public and Their Queries,” Journal of the American Society for Information Science & Technology 52, no. 3 (2001): 226–34; Bernard J. Jansen and Amanda Spink, “An Analysis of Web Searching by European AlltheWeb.com Users,” Information Processing & Management 41, no. 2 (2005): 361–81, http://dx.doi.org/10.1016/S0306-4573(03)00067-0.

3. Cory Lown and Bradley Hemminger, “Extracting User Interaction Information from the Transaction Logs of a Faceted Navigation OPAC,” Code4Lib Journal, no. 7 (June 26, 2009), http://journal.code4lib.org/articles/1633; Eng Pwey Lau and Dion Ho-Lian Goh, “In Search of Query Patterns: A Case Study of a University OPAC,” Information Processing & Management 42, no. 5 (2006): 1316–29, http://dx.doi.org/10.1016/j.ipm.2006.02.003; Heather Moulaison, “OPAC Queries at a Medium-Sized Academic Library: A Transaction Log Analysis,” Library Resources & Technical Services 52, no. 4 (2008): 230–37.

4. William H. Mischo et al., “User Search Activities within an Academic Library Gateway: Implications for Web-Scale Discovery Systems,” in Planning and Implementing Resource Discovery Tools in Academic Libraries, edited by Mary Pagliero Popp and Diane Dallis, 153–73 (Hershey, PA: Information Science Reference, 2012); Xi Niu, Tao Zhang, and Hsin-liang Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library,” International Journal of Human-Computer Interaction 30, no.
5 (2014): 422–33, http://dx.doi.org/10.1080/10447318.2013.873281. 5. Eng Pwey Lau and Dion Ho-Lian Goh, “In Search of Query Patterns”; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.”. 6. Lown and Hemminger, “Extracting User Interaction; Kristin Antelman, Emily Lynema, and Andrew K. Pace, “Toward a Twenty-First Century Library Catalog,” Information Technology & Libraries 25, no. 3 (2006): 128–39; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.” 7. Xi Niu and Bradley Hemminger, “Analyzing the Interaction Patterns in a Faceted Search Interface,” Journal of the Association for Information Science & Technology 66, no. 5 (2015): 1030–47, http://dx.doi.org/10.1002/asi.23227. 8. Steven D. Zink, “Monitoring User Search Success through Transaction Log Analysis: The WolfPAC Example,” Reference Services Review 19, no. 1 (1991): 49–56; Deborah D. Blecic et al., “Using Transaction Log Analysis to Improve OPAC Retrieval Results,” College & Research Libraries 59, no. 1 (1998): 39–50; Holly Yu and Margo Young, “The Impact of Web Search http://dx.doi.org/10.1016/S0306-4573(99)00056-4 http://dx.doi.org/10.1016/S0306-4573(99)00056-4 http://dx.doi.org/10.1016/S0306-4573(03)00067-0 http://journal.code4lib.org/articles/1633 http://dx.doi.org/10.1016/j.ipm.2006.02.003 http://dx.doi.org/10.1080/10447318.2013.873281 http://dx.doi.org/10.1080/10447318.2013.873281 INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 25 Engines on Subject Searching in OPAC,” Information Technology & Libraries 23, no. 4 (2004): 168–80; Moulaison, “OPAC Queries at a Medium-Sized Academic Library.” 9. Thomas Peters, “When Smart People Fail,” Journal of Academic Librarianship 15, no. 5 (1989): 267–73; Zink, “Monitoring User Search Success through Transaction Log Analysis”; Rhonda H. 
Hunter, “Successes and Failures of Patrons Searching the Online Catalog at a Large Academic Library: A Transaction Log Analysis,” Reference Quarterly (Spring 1991): 395–402. 10. Karen Antell and Jie Huang, “Subject Searching Success: Transaction Logs, Patron Perceptions, and Implications for Library Instruction,” Reference & User Services Quarterly 48, no. 1 (2008): 68–76; Hunter, “Successes and Failures of Patrons Searching the Online Catalog at a Large Academic Library”; Peters, “When Smart People Fail.” 11. Peters, “When Smart People Fail.”; Moulaison, “OPAC Queries at a Medium-Sized Academic Library”; Blecic et al., “Using Transaction Log Analysis to Improve OPAC Retrieval Results.” 12. Lynn Silipigni Connaway, Debra Wilcox Johnson, and Susan E. Searing, “Online Catalogs from the Users’ Perspective: The Use of Focus Group Interviews,” College & Research Libraries 58, no. 5 (1997): 403–20, http://dx.doi.org/10.5860/crl.58.5.403. 13. Karl V. Fast and D. Grant Campbell, “‘I Still Like Google’: University Student Perceptions of Searching OPACs and the Web,” ASIST Proceedings 41 (2004): 138–46; Eric Novotny, “I Don’t Think I Click: A Protocol Analysis Study of Use of a Library Online Catalog in the Internet Age,” College & Research Libraries 65, no. 6 (2004): 525–37, http://dx.doi.org/10.5860/crl.65.6.525. 14. Xi Niu et al., “National Study of Information Seeking Behavior of Academic Researchers in the United States,” Journal of the American Society for Information Science & Technology 61, no. 5 (2010): 869–90, http://dx.doi.org/10.1002/asi.21307; Lynn Sillipigni Connaway, Timothy J. Dikey, and Marie L. Radford, “If It Is Too Inconvenient I’m Not Going after It: Convenience as a Critical Factor in Information-Seeking Behaviors,” Library & Information Science Research 33, no. 
3 (2011): 179–90; Karen Calhoun, Joanne Cantrell, Peggy Gallagher and Janet Hawk, Online Catalogs: What Users and Librarians Want: An OCLC Report (Dublin, OH: OCLC Online Computer Library Center, 2009). 15. F. William Chickering and Sharon Q. Young, “Evaluation and Comparison of Discovery Tools: An Update,” Information Technology & Libraries 33, no.2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471. 16. William Denton and Sarah J. Coysh, “Usability Testing of VuFind at an Academic Library,” Library Hi Tech 29, no. 2 (2011): 301–19, http://dx.doi.org/10.1108/07378831111138189; Jennifer Emanuel, “Usability of the VuFind Next-Generation Online Catalog,” Information Technology & Libraries 30, no. 1 (2011): 44–52; Erin Dorris Cassidy et al., “Student Searching http://dx.doi.org/10.5860/crl.58.5.403 http://dx.doi.org/10.5860/crl.65.6.525 http://dx.doi.org/10.1002/asi.21307 http://dx.doi.org/10.6017/ital.v33i2.3471 http://dx.doi.org/10.1108/07378831111138189 LIBRARY DISCOVERY PRODUCTS: DISCOVERING USER EXPECTATIONS THROUGH FAILURE ANALYSIS |IRINA TRAPIDO |doi:10.6017/ital.v35i2.9190 26 with EBSCO Discovery: A Usability Study,” Journal of Electronic Resources Librarianship 26, no. 1 (2014): 17–35, http://dx.doi.org/10.1080/1941126X.2014.877331. 17. Sarah C. Williams and Anita K. Foster, “Promise Fulfilled? An EBSCO Discovery Service Usability Study,” Journal of Web Librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; Rice Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces,” Library Trends 61, no. 1 (2012): 186– 207; Andrew D. Asher, Lynda M. Duke, and Suzanne Wilson, “Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources,” College & Research Libraries 74, no. 5 (2013): 464–88. 18. Jody Condit Fagan et al., “Usability Test Results for a Discovery Tool in an Academic Library,” Information Technology & Libraries 31, no. 
1 (2012): 83–112; Megan Johnson, “Usability Test Results for Encore in an Academic Library,” Information Technology & Libraries 32, no. 3 (2013): 59–85. 19. Elizabeth (Bess) Sadler, “Project Blacklight: A Next Generation Library Catalog at a First Generation University,” Library Hi Tech 27, no. 1 (2009): 57–67, http://dx.doi.org/10.1108/07378830910942919; Bess Sadler, “Stanford's SearchWorks: Unified Discovery for Collections?” in More Library Mashups: Exploring New Ways to Deliver Library Data, edited by Nicole C. Engard, 247–260 (London: Facet, 2015). 20. Andrew D. Asher, Lynda M. Duke and Suzanne Wilson, “Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources,” College & Research Libraries 74, no. 5 (2013): 464–88; Kelly Meadow and James Meadow, “Search Query Quality and Web-Scale Discovery: A Qualitative and Quantitative Analysis,” College & Undergraduate Libraries 19, no. 2–4 (2012): 163–75, http://dx.doi.org/10.1080/10691316.2012.693434. 21. Sarah C. Williams and Anita K. Foster, “Promise Fulfilled? An EBSCO Discovery Service Usability Study,” Journal of Web Librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; Kathleen Bauer and Alice Peterson-Hart, “Does Faceted Display in a Library Catalog Increase Use of Subject Headings?” Library Hi Tech 30, no. 2 (2012), 347–58, http://dx.doi.org/10.1108/07378831211240003. http://dx.doi.org/10.1080/1941126X.2014.877331 http://dx.doi.org/10.1080/19322909.2011.597590 http://dx.doi.org/10.1108/07378830910942919 http://dx.doi.org/10.1080/10691316.2012.693434 http://dx.doi.org/10.1080/19322909.2011.597590 http://dx.doi.org/10.1108/07378831211240003 ABSTRACT INTRODUCTION REFERENCES
Critical Success Factors for Integrated Library System Implementation in Academic Libraries: A Qualitative Study
Shea-Tinn Yeh and Zhiping Walter
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016

ABSTRACT
Integrated library systems (ILSs) support the entire business operations of an academic library, from acquiring and processing library resources to making them available to user communities and preserving them for future use. As libraries’ needs evolve, there is a pressing demand for libraries to migrate from one generation of ILS to the next. This complex migration process often requires significant financial and personnel investment, but its success is by no means guaranteed. We draw on the enterprise resource planning and critical success factors (CSFs) literature to identify the most salient CSFs for ILS migration success through a qualitative study with four cases. We found that a careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions are the most salient CSFs that determine the success of a migration project.

INTRODUCTION
The first generation of integrated library systems (ILSs) was developed specifically for library operations focused on the selection, acquisition, cataloging, and circulation of print collections. As libraries’ nonprint materials steadily grew, print-centric ILSs became less and less efficient in supporting libraries’ daily operations. Recent years have seen the emergence of a new generation of ILSs, commonly called Library Services Platforms (LSPs), that take into account the management of both print and electronic collections. LSPs take advantage of cloud computing and network advancements to provide economies of scale and to allow a library to better share data with other libraries.
Furthermore, LSPs unify the entire suite of library operations to provide efficient workflow at the back end and advanced online discovery tools at the front end for the library.1 Given the claimed benefits of the emerging LSP and the fact that vendors are phasing out support for their legacy ILSs, we project that more libraries will be migrating to LSPs as the systems mature and libraries’ needs evolve.

Shea-Tinn Yeh (sheila.yeh@du.edu) is Assistant Professor and Library Digital Infrastructure and Technology Coordinator, University of Denver Libraries. Zhiping Walter (zhiping.walter@ucdenver.edu) is Associate Professor, Business School, University of Colorado Denver.

Migrating from one generation of ILS to another is a significant initiative that affects the entire library operation.2 Because of its scale and complexity, the migration project is not always smooth and is often fraught with problems, with some projects falling behind their migration completion schedules.3, 4, 5 In addition, committing to a new system often results in significant financial and personnel costs for an academic library.6 Understandably, there is considerable trepidation before, during, and after the migration process.6, 7 What contributes to a smooth migration process and a successful migration project? This is an urgent question at present and an enduring question for the future. This is because, as libraries continue to evolve, their operations and management needs are destined to outgrow the functionalities of the current generation of ILS. Therefore, migration to a new generation of ILS is destined to occur periodically for a library.
In this research, we study critical success factors (CSFs) that contribute to a smooth migration process and a successful migration project, defined as on-time and on-budget project completion and a smooth implementation process. To achieve our research goal, we anchor our theoretical foundation in the enterprise resource planning (ERP) system-implementation literature. ERP is “business process management software that allows an organization to use a system of integrated applications to manage the business and automate many back office functions related to technology, services and human resources.”9 Since a complete ILS is formed from a suite of integrated functions to manage a broad range of library processes, it is in fact an ERP for libraries.10 A literature review of CSFs for ERP system implementation success revealed more than ninety CSFs.11, 12 The contribution of our research is in identifying, through a qualitative research method, the most salient CSFs that contribute to the success of a library system migration project from one generation of ILS to another. Results of this study can help library administrators improve the chances of success and decrease the level of anxiety during a migration project now and in the future.

The remainder of the article is organized as follows: Section 2 reviews ERP, ILS, LSP, CSFs, and information system success measurement as described in the literature. Section 3 describes the guided interviews conducted to identify the CSFs, the results, and the analysis of the results. Finally, we offer conclusions and limitations and recommend future work.

LITERATURE REVIEW
ERP is business-management software comprising a suite of integrated applications that an organization can use to collect, store, manage, and interpret data from many business activities, including product planning, manufacturing, service delivery, marketing and sales, and human resources.
The core idea of an ERP system is to integrate both the data and the process dimensions of a business so that transactions can be monitored and analyzed for planning and strategic purposes.13 Modules of the system cover different functions within a company and are linked so users can see what is happening in all areas of the company. An ERP system can improve a business’s back offices as well as its front-end functions, with both operational and strategic benefits.14 Some of the benefits include reliability in information access, reduced data and operations redundancy, data retrieval and reporting efficiency, easy module extension, and Internet commerce capability.

Just like an ERP system for a business, a complete library management solution comprises a suite of integrated applications that manage a broad range of library processes, including circulation, acquisition, cataloging, electronic resources management, and system administration. LSPs, the current generation of library management systems, are designed to manage both physical and digital collections. LSPs follow a service-oriented architecture (SOA) and can be deployed through a multitenant Software as a Service (SaaS) distribution model.15 In addition to supporting all library functions, LSPs integrate with other university systems, such as student registry and finance, and provide a front end for library patrons in a cloud environment that leverages a global network of systems for discovery of a wide array of resources.16 Since an LSP is essentially an enterprise system for library functions, the CSFs of ERP implementation success could guide LSP implementation.
CSFs are conditions that must be met for an implementation to be successful.17 More than ninety CSFs have been identified for ERP implementation success.18, 19 These CSFs have been classified according to various schemes, but we found the strategic-versus-tactical classification most relevant to the library context.20 Strategic factors address the big picture, involving the breakdown of goals into doable items. Tactical factors, on the other hand, are the methods used to accomplish the doable items that lead to achieving the goals.21 By examining the entire list of CSFs from both the strategic and the tactical perspectives, we identify the top CSFs for library-management-solution implementation and migration success, defined as on-time and on-budget delivery as well as a smooth implementation process,22, 23 through a qualitative study.

METHOD
We conducted semi-structured interviews with open-ended questions to identify the most salient CSFs for implementation success. Since we needed to reduce the more than ninety CSFs in the literature to a list of the most salient CSFs in the library context and to potentially identify new CSFs, a qualitative interview approach was more suitable than a quantitative survey approach. A two-step process was used to arrive at the final list. First, we evaluated all CSFs in the literature and identified a subset that might be most relevant for library-systems implementation.24 Second, this subset was used to develop an interview guide for the semi-structured interviews conducted later to further reduce it. Open-ended questions were also used during the interviews to elicit additional CSFs. An institutional review board (IRB) application was submitted and approved. The result of this two-step process is a list of ten CSFs discussed in the results section, with nine CSFs coming from our initial list and one CSF emerging from the interviews.
The criterion for recruiting study libraries was that the library had implemented a new LSP within the previous three years. This is because the LSP is the current generation of ILS, and it is only within the last few years that various LSP vendors began to promote and implement LSPs. A recruitment email was sent to libraries listed as adopters on various vendors’ press-release sites. Participating recipients referred the interview request to appropriate migration team members, whom we later contacted to schedule interviews. This resulted in up to five people from each participating library being interviewed in person or via Skype; their positions are listed in table 1. Interviews were recorded, transcribed, and cleaned. Emails to the same interviewees were used for follow-up questions as needed. After the interviews with each library, qualitative data analysis was performed to identify CSFs that emerged from the interviews. Interviews continued until no new CSFs emerged in the last interview. In total, staff from four libraries were interviewed between October 2014 and March 2015 about their implementation process and experience from the staff users’ perspective. The design and implementation of the public discovery interface was not part of this inquiry. Table 1 summarizes the characteristics of the four libraries. Case numbers instead of university names are used to protect the identities of participating libraries and interviewees.
Case 1: private university; 11,000+ students; 11 million operating budget; 150 library employees; 6-month project; migrated from Millennium to Sierra. Reasons for migration: discontinued vendor system support, servers out of warranty, vendor gave incentives. Interviewees: head of systems; module experts.

Case 2: public university; 32,000+ students; 13 million operating budget; 400 library employees; 9-month project; migrated from Aleph to Alma. Reasons for migration: outdated servers, servers out of warranty. Interviewees: heads of systems.

Case 3: public university; 2,400+ students; 1.5 million operating budget; 17 library employees; 6-month project; migrated from Evergreen to Sierra. Reasons for migration: needed a robust system that provides a discovery layer. Interviewees: director of library; head of systems.

Case 4: private university; 2,700+ students; 1.3 million operating budget; 13.5 library employees; 9-month project; migrated from Voyager to Sierra. Reasons for migration: needed a modern system demonstrating that the library is moving with the times. Interviewees: director of library.

Table 1. Summary of case study site characteristics.

RESULTS
The following CSFs emerged from the interviews: careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions. We discuss each CSF next.

Careful Selection Process
Most ILSs are commercial, off-the-shelf software systems that can vary dramatically in functionality from system to system.25 For example, some packages are more suitable for large institutions while others are more suitable for smaller ones. To mitigate the risk of productivity or transaction loss and to minimize system and implementation costs, a library needs to determine the best “fitness-of-use” system. Such a determination is the outcome of a careful selection process.
Although there is no commonly accepted technique, method, or tool for this process, all selection processes share common key steps suggested in the literature.26 As applied to library-system selection, they are the following:
• Define stakeholder requirements.
• Search for products.
• Create a short list of the most promising candidates based on a set of “must-have” requirements.
• Evaluate the candidates on the short list.
• Analyze the evaluation data to make a selection.
In addition, if the server option is chosen instead of the cloud option, the selected hardware needs to satisfy the system requirements for the final configuration.

A careful selection process emerged as a CSF that affected the implementation outcome for all four libraries. All cases were migrating to an LSP. Some systems can be offered as locally installed systems, which require appropriate in-house IT and hardware capabilities. Case 1 did not consider its IT capability when deciding on a turnkey system; as a result, the library experienced difficulties in setting up the infrastructure in-house during the implementation. Each of the other three cases considered the candidate system’s compatibility with the legacy system, the match between library needs and system functionalities, system maturity, migration costs, data storage needs, and vendor support before and during the implementation as well as continued vendor support throughout the life of the new system. Even though each of the three libraries arrived at its system choice differently, on reflection, interviewees expressed relief and satisfaction with their decisions to choose their respective systems.

“We were in the position where our servers were out of date and warranty, needed to be replaced. The servers were too small. We had sizing issues and we couldn’t update to the most recent version of Aleph . . . Alma being a cloud based solution will eliminate our need to be ‘in the server business.’” (Case 2)
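The evaluation and analysis steps of the selection process can be sketched as a simple weighted-scoring exercise over the shortlisted candidates. This is a hypothetical illustration, not a tool used by the study libraries; the criteria, weights, candidate names, and ratings below are invented for demonstration.

```python
# Hypothetical weighted-scoring sketch for evaluating shortlisted systems.
# All criteria, weights, and ratings are illustrative, not from the study.

def score_candidates(weights, scores):
    """Return candidates ranked by weighted total score, best first."""
    totals = {}
    for candidate, ratings in scores.items():
        totals[candidate] = sum(weights[c] * ratings[c] for c in weights)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

weights = {  # relative importance of each "must-have" requirement
    "legacy_compatibility": 0.30,
    "functional_match": 0.30,
    "system_maturity": 0.15,
    "migration_cost": 0.15,
    "vendor_support": 0.10,
}

scores = {  # 1-5 ratings gathered during candidate evaluation
    "LSP A": {"legacy_compatibility": 4, "functional_match": 5,
              "system_maturity": 3, "migration_cost": 2, "vendor_support": 4},
    "LSP B": {"legacy_compatibility": 3, "functional_match": 4,
              "system_maturity": 4, "migration_cost": 4, "vendor_support": 3},
}

ranked = score_candidates(weights, scores)
print(ranked)
```

A real selection would weight the same considerations the case libraries reported, such as compatibility with the legacy system and continued vendor support, and would document the rationale behind each weight.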
“We went through a very extensive formal process to select this system.” (Case 3)

Top Management Involvement
Successful implementation requires strong leadership by executives who understand, support, and champion the project.27 When this involvement trickles down through the organizational hierarchy, it leads to an organizational commitment, which is required for the implementation success of complex projects.28, 29 Since library-system implementation is a complex project that (if done correctly) will transform the entire library and reposition it for better efficiency, strong leadership is critical as well.

In all four cases, top management was involved in the final decision on the system choice. In cases 1 and 2, top management also took charge of securing funding for the migration projects. Interviewees stressed that top management support was very important in their respective project implementations.

“The top level management took the recommendations from the systems librarians at the time, with the blessing of the council determined whether they want to proceed with the product Alma, and had funding conversations with the financial people.” (Case 2)

“We have faculty library committee, faculty governance oversight. We showed them webinars of the products we considered before we signed them, so we have faculty representation on board.
We held open forum and were inclusive in our invitations.” (Case 4)

Vendor Support
With a new technology, it is critical to acquire external technical expertise, often from the vendor, to facilitate successful implementation.30 Effective vendor support includes adequate and high-quality technical support during and after implementation, sufficient training for both the project team and staff users, and positive relationships between all parties in the project.31 Additionally, there should be adequate knowledge transfer between the vendor consultants and the clients, which can be achieved by defining roles, achieving shared understanding, and enhancing relationships through competent communication.32, 33 In the case of library-system implementations, vendor support is particularly important because of the complexity of each new generation of the system and the library personnel’s knowledge gap in understanding the nuts and bolts of the new system.

Effective vendor support was identified in each case as a critical success factor determining the implementation outcome, even though the form of vendor support varied from case to case. In case 1, the vendor sent different consultants with various expertise as project managers on the basis of the project phase. In case 2, the vendor sent one consultant who served as the main project manager. In case 3, the vendor provided a project manager and a team of technicians. In case 4, consultants were shared across multiple consortium libraries that were implementing the system at the same time. No matter how vendor support was provided, it was essential for implementation success, as indicated by interviewees.
“The vendor has been very supportive and provides a group of experts throughout the process, some are knowledgeable in server business while others are skilled project managers.” (Case 1)

Project Team Competence
Since library-system migration affects all functional areas of a library, members of the implementation team need to be cross-functional. Furthermore, members with both business knowledge and technology know-how are especially crucial for implementation success.34 The competence of vendor consultants assigned to the project also influences implementation success, as discussed earlier. Additionally, it is important to have an in-house project leader who champions the project and who has the essential skills and authority to set goals that legitimize change.35

Having a competent project team was essential for implementation success in each of our cases. In each case, the vendor provided the project manager and the library provided a co-manager who was a champion figure. Other team members came from various functional areas such as acquisition, circulation, cataloging, electronic resources management, and system administration. For example, in case 1, the technology librarian participated as a co-project manager. The project-management team comprised module experts within the library and from functional areas. In addition, the university’s technology services department lent technical support during the early stages of implementation, when servers needed to be set up. The interviewees all stressed the importance of project-team competence.
“Without the infrastructure knowledge from the university’s technology team and their time and full support to negotiate with the vendor, the migration project would not have been possible.” (Case 1)

“The university’s IT made sure that we are in compliance with campus policies and expectations for securities.” (Case 2)

Staff User Involvement
It is important that the project team involve staff users early on; otherwise, the implementation process may be bumpy. When end users are involved in decisions relating to system selection and implementation, they are more invested in and concerned with the success of the system, which in turn leads to greater system use and user satisfaction.36, 37 As such, staff user involvement is one of the most cited critical success factors in ERP implementation.38 Because personal relevance to the system is just as important for library-system implementation, effective staff user involvement is positively related to implementation success.

Staff user involvement emerged as a main success factor in all our cases and contributed to the implementation project outcome. In case 1, staff users were not consulted as to whether an LSP was necessary for the library, although they were informed of the reasons for implementation. Additionally, staff users were not involved when the project timetable was negotiated. This lack of early staff user involvement led to considerable stress down the road, which made the implementation process bumpy. The other three cases involved staff users early on; as a result, staff users experienced much less stress and frustration down the road. Specifically, in case 2, the staff users were educated about the need for migration through staff meetings, town hall meetings, supervisory meetings, council meetings, and forums.
Many product-demo sessions were conducted for the staff so they would have the knowledge to participate before the final decision was made. There were daily internal newsletters conveying implementation news throughout the implementation months. In case 3, the entire library was involved with the selection of a new system. While the key staff (such as the circulation manager, acquisition manager, and reference manager) had more input than others, everyone offered input about the project. As such, buy-in for the new system was strong from all stakeholders. In case 4, staff users were involved early on through open forums and webinars. The following quotes are examples of interviewee sentiment concerning staff user involvement:

“Everybody is involved in choosing the system; partially because Evergreen had been so problematic. We wanted to make sure that everyone is on board.” (Case 3)

“Migration is the most time consuming aspect of the library staff work during the time of the project, without their buy-ins, it is difficult to have a successful project.” (Case 4)

Interdepartmental Communication
The importance of effective communication across functional and departmental boundaries is well known in the information-systems-implementation literature.39 With consultants coming from the vendor, project team members coming from different functional areas, and staff users with different perceptions and understandings of the implementation project, the importance of effective communication between all involved cannot be overstated.
Communications should start early, be consistent and continuous throughout the various stages of the implementation process, and include a system overview, the rationale for implementation, briefings on process changes, and the establishment of contact points.40 Expectations and goals should be communicated to all stakeholders and at all levels of the organization.41

The effectiveness of interdepartmental communication affected the implementation outcome in all our cases. In case 1, the library’s project manager was designated to communicate with the vendor when issues arose, such as hardware and software configurations, system backup and use, and task assignments. The formal project plan was established using the web-based Basecamp so that team members in different roles with different responsibilities could communicate and work together online. Regular meetings were held and emails were exchanged between project team members. However, there was a lack of effective interdepartmental communication with staff who were not on the project team. This resulted in the absence of necessary system testing that would have detected some data-integrity issues. Such issues later caused the system to be offline for days, which brought much frustration and stress to everyone. In the other three cases, all actors were well informed through news releases, meetings, presentations, and webinars. Concerns were communicated to the project team and addressed in a timely manner. As a result, the level of frustration was very low in those three cases.

Data Analysis and Conversion
A fundamental requirement for the effectiveness of an ERP system is the accuracy of its data,42 and the same is true for a library system. Data types in a legacy ILS are often of an outdated format and can differ from the formats supported by a new library system.
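As a hypothetical illustration of the kind of pre-migration analysis this involves, the sketch below audits an exported batch of legacy records for problems that commonly break conversion, such as missing required fields and duplicate identifiers. The field names, rules, and sample records are invented for demonstration; a real audit would follow the target system’s import specification.

```python
# Hypothetical pre-migration audit of exported legacy records.
# Field names and rules are illustrative, not from the study libraries.

REQUIRED_FIELDS = ("record_id", "title", "barcode")

def audit_records(records):
    """Flag records likely to break conversion: missing required
    fields or duplicate record identifiers."""
    problems = []
    seen_ids = set()
    for rec in records:
        rid = rec.get("record_id")
        if rid in seen_ids:
            problems.append((rid, "duplicate record_id"))
        seen_ids.add(rid)
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                problems.append((rid, f"missing {field}"))
    return problems

legacy_export = [
    {"record_id": "b100", "title": "Sample Title", "barcode": "39001"},
    {"record_id": "b101", "title": "", "barcode": "39002"},     # blank title
    {"record_id": "b100", "title": "Dup", "barcode": "39003"},  # duplicate id
]

for rid, issue in audit_records(legacy_export):
    print(rid, issue)
```

Running such an audit before the data freeze is one way to surface exactly the data-integrity issues that, in case 1, only appeared after the system went live.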
Conversion from one format to another can be an overwhelming process, especially when there is no existing expertise in the library. Since migrating legacy data to the new system is essential, effective data analysis for conversion is a critical success factor for implementation success. The smoothness of each of the four implementation cases was related to the project team’s data analysis and conversion efforts. In case 1, the library did not spend any effort to analyze, convert, or clean the data. As a result, the system experienced data-integrity issues after it went live. The other three libraries either devoted time to clean and convert the data or had a third party do the data cleaning. As a result, no system issues arose from data-integrity problems. Interviewees from case 2 told us, “We elected to freeze the data 30 days sooner in terms of bibliographic data, so that we can do an authority control project with a third party vendor.”

Project Management and Project Tracking

According to ERP implementation literature, effective project-management practices are critical for implementation success. Such practices include defining clear objectives, establishing a formal implementation plan, designing a realistic work plan, and establishing resource requirements.43 The formal implementation plan needs to identify the modules to be implemented, the tasks to be undertaken, and all technical and nontechnical issues to be considered.44 Project progress must be carefully monitored through meetings and reports.45, 46 Effective project management and tracking affected the implementation outcome in all our cases. A popular project-management and tracking software is Basecamp, a web-based project management and collaboration tool initially released in 2004.47 It offers discussion boards, to-do lists, file sharing, milestone management, event tracking, and a messaging system that help project teams stay organized and connected despite their different locations.
All cases used Basecamp for project management and tracking, which contributed to on-time and on-budget project completion for all cases.

Staff User Education and Training

A new system often frustrates users who do not receive adequate training in its functionalities and use.48 When feeling frustrated and stressed, users may avoid using the system. Proper and adequate training will soothe users and eliminate their reluctance to use the new system, which in turn helps realize productivity gains.49, 50 Training processes should consider factors such as training curriculum, user commitment, trainers’ personnel skills and competence, as well as training schedule, budget, evaluation, and methods.51 Effective staff user training emerged as a critical success factor from all our cases. In case 1, staff users had access to a vendor-supplied preview portal, which simulated system functionalities. Staff users were so familiar with the new system by the time the system went live that they were eager to engage with it. In cases 2, 3, and 4, staff users were trained through demo products, online video trainings, Q&A, and on-site training sessions conducted by the vendor. These training materials and sessions served to ease staff users’ feelings of uncertainty and anxiety, as the following quotes show:

“The online training videos were provided to all staff in the library and followed up with Q&A sessions which members of the committee will host in their respective areas. . . . Then Ex Libris did a week long onsite training workshop serve for the final deep configuration issues. . . .
We know that there are staff users who want to be ahead of the game, yet there are always people who don’t want to learn until the day before they go live.” (Case 2)

“We have a training package with several onsite visits, each one is for a few days. The trainer focused on one aspect of the system. It was more than watching the videos online. Because of the small staff here, almost everyone attended at least one training.” (Case 3)

“The trainers varied with their expertise, we developed fondness for some more than others. The training is functional in nature. The vendor’s priority was about trainer availability and to keep the project on time. We became familiar with trainers’ expertise; we were able to request the right trainer with the job.” (Case 4)

Managing Staff User Emotions

Although education and training ease user anxiety, they do not completely eliminate it. Emotions felt by users early in the implementation of a new system have important effects on the use of the system later on.52 How to manage staff user anxiety and negative emotions when they appear emerged as a critical success factor in all our cases, as shown in the following quotes:

“There were so many things going on in the library during the migration go-live week. The unknown of the migration success made staff users uncomfortable. Should the migration date be decided in consideration of other initiatives, the frustration experienced would have been a lot less and might not have been ignored during the going-live week.” (Case 1)

“The frustration was just change; it was the fact that we have to learn something new. . . . Primarily the frustration was handled by the lead.” (Case 2)

“There was a challenge, especially early on, in getting people to engage with the manuals and the literature in documentation. It is as if everyone is being asked to learn a new language. . . . The key relationship between the onsite coordinator and the project manager on the vendor side is important.
When those two exchange information and handle frustration diplomatically, this bridge between the two organizations can smooth over a lot of rough feathers on either or both sides.” (Case 4)

This final CSF did not come directly from the ninety-plus CSFs that we started with, although it aligned closely with the “Change Management” category.53 This CSF emerged mostly from the interview process.

Summary of Results

The results of the case studies for each critical success factor are summarized in table 2. Implementation project outcome is summarized in table 3. An implementation is considered successful if it was completed on time and on budget and if the implementation process was smooth, as reflected in the number and degree of unexpected problems along the way.

Critical Success Factor              Case 1   Case 2   Case 3   Case 4
Careful selection process            No       Yes      Yes      Yes
Top management involvement           Yes      Yes      Yes      Yes
Vendor support                       Yes      Yes      Yes      Yes
Project team competence              Yes      Yes      Yes      Yes
Staff user involvement               No       Yes      Yes      Yes
Interdepartmental communication      No       Yes      Yes      Yes
Data analysis & conversion           No       Yes      Yes      Yes
Project management and tracking      Yes      Yes      Yes      Yes
Staff user education and training    Yes      Yes      Yes      Yes
Managing staff user emotions         No       Yes      Yes      Yes

Table 2. Summary of case study critical success factors findings

                                     Case 1   Case 2   Case 3   Case 4
On time implementation               Yes      Yes      Yes      Yes
On budget implementation             Yes      Yes      Yes      Yes
Smoothness of implementation         No*      Yes      Yes      Yes

*Staff users experienced data-integrity issues, system downtime, as well as anxiety and stress with the system implementation process.

Table 3. Summary of case study implementation success measures

DISCUSSION AND CONCLUSIONS

The implementation of a new ILS is a large-scale undertaking that affects every aspect of a library’s operations as well as every staff user’s workflow process.
As such, it is imperative for library administrators to understand what factors contribute to a successful implementation. Our qualitative study shows that there are two categories of CSFs: strategic and tactical. From the strategic perspective, top management involvement, vendor support, staff user involvement, interdepartmental communication, and staff user emotion management are critical. From the tactical perspective, project team competence, project management and project tracking, data analysis and conversion, and staff user education and training to break down the technical barrier greatly affect the implementation outcome. In addition, selection of the final system from a variety of choices and options requires careful consideration of both strategic and tactical issues. Each factor identified is important in its own right during the implementation process. Combined, they complement each other to guide an implementation to success. Among the list of CSFs identified, the role of staff user emotion management was not identified during the theoretical phase of the study; it only emerged as an important CSF during interviews. Top management involvement, vendor support, project team competence, project management and tracking, and staff user education and training are CSFs that were somewhat intuitive, and they were implemented in all cases. However, a library may select an end system without careful consideration. It may also be unaware of the importance of involving users early on, of opening clear lines of interdepartmental communication, or of performing data analysis and conversion before the implementation. Staff user emotion management, especially, is at risk of being an afterthought of an implementation.
By identifying the most salient CSFs, this study offers practical contributions to academic library leaders and administrators in understanding how critical success factors play a role in ensuring a smooth and successful ILS implementation. Although CSFs have been extensively studied in the discipline of information-systems management, this is the first study to apply CSFs in the library context. Since library management has unique challenges compared to businesses, identifying CSFs for library-system-implementation success is important not only for the current migration to LSPs but also for future migrations to future generations of ILSs as the needs of libraries continue to evolve. As with any empirical research, there are limitations to this study. The number of academic libraries interviewed is small, although no new information was discovered after the fourth interview. The vendors represented in this study are only two of the many in the market providing LSPs to libraries. Given these limitations, the results of this study may not be generalizable to libraries implementing an LSP with vendors other than Innovative Interfaces and Ex Libris. Additionally, the results may not be generalizable to nonacademic libraries. This research can be extended to validate the proposed CSFs quantitatively through survey research in academic libraries. Studying interactions between the identified factors would offer an even greater contribution. The research could also be replicated in other types of libraries to broaden the generalizability of its inferences. In addition, case libraries 3 and 4 both noted that an LSP changes the public interface used by external users, and they wished they had had more opportunities for outreach prior to the implementation.
Although the design and implementation of the public interface was not considered within the scope of this research, this comment is insightful because it may imply that future studies should consider a project champion to be a critical success factor. The project champion must have the people-related skills and the position to introduce changes in order to achieve buy-in from staff users.54, 55

REFERENCES

1. Richard M. Jost, Selecting and Implementing an Integrated Library System: The Most Important Decision You Will Ever Make (Boston: Chandos, 2015).
2. Ibid., 3.
3. Suzanne Julich, Donna Hirst, and Brian Thompson, “A Case Study of ILS Migration: Aleph500 at the University of Iowa,” Library Hi Tech 21, no. 1 (2003): 44–55, http://dx.doi.org/10.1108/07378830310467391.
4. Zahiruddin Khurshid, “Migration from DOBIS LIBIS to Horizon at KFUPM,” Library Hi Tech 24, no. 3 (2006): 440–51, http://dx.doi.org/10.1108/07378830610692190.
5. Vandana Singh, “Experiences of Migrating to an Open-Source Integrated Library System,” Information Technology & Libraries 32, no. 1 (2013): 36–53.
6. Jost, Selecting and Implementing an Integrated Library System.
7. Yongming Wang and Trevor A. Dawes, “The Next Generation Integrated Library System: A Promise Fulfilled,” Information Technology & Libraries 31, no. 3 (2012): 76–84.
8. Keith Kelley, Carrie C. Leatherman, and Geraldine Rinna, “Is It Really Time to Replace Your ILS with a Next-Generation Option?” Computers in Libraries 33, no. 8 (2013): 11–15.
9. Vangie Beal, “ERP—Enterprise Resource Planning,” Webopedia, http://www.webopedia.com/TERM/E/ERP.html.
10. “Library Management System,” Tangient LLC, https://libtechrfp.wikispaces.com/Library+Management+System.
11. Christopher P. Holland and Ben Light, “A Critical Success Factors Model for ERP Implementation,” IEEE Software 16, no. 3 (1999): 30–36, http://dx.doi.org/10.1109/52.765784.
12.
Levi Shaul and Doron Tauber, “Critical Success Factors in Enterprise Resource Planning Systems: Review of the Last Decade,” ACM Computing Surveys 45, no. 4 (2013): 1–39, http://dx.doi.org/10.1145/2501654.2501669.
13. Yahia Zare Mehrjerdi, “Enterprise Resource Planning: Risk and Benefit Analysis,” Business Strategy Series 11, no. 5 (2010): 308–24, http://dx.doi.org/10.1108/17515631011080722.
14. Mohammad A. Rashid, Liaquat Hossain, and Jon David Patrick, “The Evolution of ERP Systems: A Historical Perspective,” in Enterprise Resource Planning: Global Opportunities and Challenges (Hershey, PA: Idea Group, 2002).
15. Marshall Breeding, “Library Systems Report 2014: Competition and Strategic Cooperation,” American Libraries 45, no. 5 (2014): 21–33.
16. Sharon Yang, “From Integrated Library Systems to Library Management Services: Time for Change?” Library Hi Tech News 30, no. 2 (2013): 1–8, http://dx.doi.org/10.1108/LHTN-02-2013-0006.
17. Shahin Dezdar, “Strategic and Tactical Factors for Successful ERP Projects: Insights from an Asian Country,” Management Research Review 35, no. 11 (2012): 1070–87, http://dx.doi.org/10.1108/14637151111182693.
18. Ibid.
19. Shahin Dezdar and Ainin Sulaiman, “Successful Enterprise Resource Planning Implementation: Taxonomy of Critical Factors,” Industrial Management & Data Systems 109, no. 8 (2009): 1037–52, http://dx.doi.org/10.1108/02635570910991283.
20.
Sherry Finney and Martin Corbett, “ERP Implementation: A Compilation and Analysis of Critical Success Factors,” Business Process Management Journal 13, no. 3 (2007): 329–47, http://dx.doi.org/10.1108/14637150710752272.
21. F. Pearce, Business Building and Promotion: Strategic and Tactical Planning (Houston: Pearman Cooperation Alliance, 2004).
22. Jennifer Bresnahan, “Mixed Messages,” CIO (May 16, 1996), 72, http://dx.doi.org/10.1016/j.jchf.2013.07.005.
23. Majed Al-Mashari, Abdullah Al-Mudimigh, and Mohamed Zairi, “Enterprise Resource Planning: A Taxonomy of Critical Factors,” European Journal of Operational Research 146, no. 2 (2003): 352–64, http://dx.doi.org/10.1016/S0377-2217(02)00554-4.
24. Shaul and Tauber, “Critical Success Factors in Enterprise Resource Planning Systems.”
25. H. Akkermans and K. van Helden, “Vicious and Virtuous Cycles in ERP Implementation: A Case Study of Interrelations between Critical Success Factors,” European Journal of Information Systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418.
26. Abdallah Mohamed, Guenther Ruhe, and Armin Eberlein, “COTS Selection: Past, Present, and Future” (paper presented at the 14th Annual IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, 2007), http://dx.doi.org/10.1109/ECBS.2007.28.
27. M. Michael Umble, Elisabeth J. Umble, and Ronald R. Haft, “Enterprise Resource Planning: Implementation Procedures and Critical Success Factors,” European Journal of Operational Research 146, no. 2 (2003): 241–57, http://dx.doi.org/10.1016/S0377-2217(02)00547-7.
28. Jim Johnson, “Chaos: The Dollar Drain of IT Project Failures,” Application Development Trends 2, no. 1 (1995): 41–47.
29. Prasad Bingi, Maneesh K. Sharma, and Jayanth K. Godla, “Critical Issues Affecting an ERP Implementation,” Information Systems Management 16, no. 3 (1999): 7–14, http://dx.doi.org/10.1201/1078/43197.16.3.19990601/313.
30. Mary Sumner, “Critical Success Factors in Enterprise Wide Information Management Systems Projects,” in Proceedings of the 1999 ACM SIGCPR Conference on Computer Personnel Research (New York: ACM, 1999), http://dx.doi.org/10.1145/299513.299722.
31. Eric T. G. Wang et al., “The Consistency among Facilitating Factors and ERP Implementation Success: A Holistic View of Fit,” Journal of Systems & Software 81, no. 9 (2008): 1609–21, http://dx.doi.org/10.1016/j.jss.2007.11.722.
32. Dong-Gil Ko, Laurie J. Kirsch, and William R. King, “Antecedents of Knowledge Transfer from Consultants to Clients in Enterprise System Implementations,” MIS Quarterly 29, no. 1 (2005): 59–85.
33. Al-Mashari, “Enterprise Resource Planning.”
34. Fiona Fui-Hoon Nah and Santiago Delgado, “Critical Success Factors for Enterprise Resource Planning Implementation and Upgrade,” Journal of Computer Information Systems 46, no. 5 (2006): 99–113.
35. Liang Zhang et al., “A Framework of ERP Systems Implementation Success in China: An Empirical Study,” International Journal of Production Economics 98, no. 1 (2005): 56–80, http://dx.doi.org/10.1016/j.ijpe.2004.09.004.
36. Ann-Marie K.
Baronas and Meryl Reis Louis, “Restoring a Sense of Control During Implementation: How User Involvement Leads to System Acceptance,” MIS Quarterly 12, no. 1 (1988): 111–24.
37. Joseph Esteves, Joan Pastor, and Joseph Casanovas, “A Goals/Questions/Metrics Plan for Monitoring User Involvement and Participation in ERP Implementation Projects,” IE working paper, March 11, 2004, http://dx.doi.org/10.2139/ssrn.1019991.
38. Khaled Al-Fawaz, Zahran Al-Salti, and Tillal Eldabi, “Critical Success Factors in ERP Implementation: A Review” (paper presented at the European and Mediterranean Conference on Information Systems, Dubai, May 25–26, 2008).
39. H. Akkermans and K. van Helden, “Vicious and Virtuous Cycles in ERP Implementation: A Case Study of Interrelations between Critical Success Factors,” European Journal of Information Systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418.
40. Nancy Bancroft, Henning Seip, and Andrea Sprengel, Implementing SAP R/3: How to Introduce a Large System Into a Large Organisation (Greenwich, UK: Manning, 1998).
41. Nah, “Critical Success Factors.”
42. Toni M. Somers and Klara Nelson, “The Impact of Critical Success Factors Across the Stages of Enterprise Resource Planning Implementations,” in Proceedings of the 34th Hawaii International Conference on System Sciences (2001), http://dx.doi.org/10.1109/HICSS.2001.927129.
43. Shi-Ming Huang et al., “Assessing Risk in ERP Projects: Identify and Prioritize the Factors,” Industrial Management & Data Systems 104, no. 8 (2004): 681–88, http://dx.doi.org/10.1108/02635570410561672.
44.
Nah, “ERP Implementation.”
45. Umble, “Enterprise Resource Planning.”
46. Nah, “ERP Implementation.”
47. “Basecamp, in a Nutshell,” Basecamp, https://basecamp.com/about/press.
48. Nah, “ERP Implementation.”
49. Umble, “Enterprise Resource Planning.”
50. Mo Adam Mahmood et al., “Variables Affecting Information Technology End-User Satisfaction: A Meta-analysis of the Empirical Literature,” International Journal of Human-Computer Studies 52, no. 4 (2000): 751–71, http://dx.doi.org/10.1006/ijhc.1999.0353.
51. Iuliana Dorobat and Floarea Nastase, “Training Issues in ERP Implementations,” Accounting & Management Information Systems 11, no. 4 (2012): 621–36.
52. Anne Beaudry and Alain Pinsonneault, “The Other Side of Acceptance: Studying the Direct and Indirect Effects of Emotions on Information Technology Use,” MIS Quarterly 34, no. 4 (2010): 689–710.
53. Shaul and Tauber, “Critical Success Factors in Enterprise Resource Planning Systems.”
54. Andrew Lawrence Norton et al., “Ensuring Benefits Realisation from ERP II: The CSF Phasing Model,” Journal of Enterprise Information Management 26, no. 3 (2013): 218–34, http://dx.doi.org/10.1108/17410391311325207.
55. Chong Hwa Chee, “Human Factor for Successful ERP2 Implementation,” New Straits Times, July 28, 2003, https://www.highbeam.com/doc/1P1-76161040.html.
Editorial Board Thoughts: The Importance of Staff Change Management in the Face of the Growing “Cloud”

Mark Dehmlow

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016

The library vendor market likes to throw around the word “cloud” to make their offerings seem innovative and significant. In many ways, much of what the library IT market refers to as “cloud,” especially SAAS (software as a service) offerings, is really just a fancier term for hosted services. The real gravitas behind the “cloud” label emanated from grid computing, or large, interconnected, and quickly deployable infrastructure like Amazon’s AWS or Microsoft’s Azure platforms. Infrastructure at that scale and that level of geographic distribution was revolutionary when it emerged. Still, these offerings at their core are basically IAAS (infrastructure as a service) bundled as a menu of services, so I think the most broadly applicable synonym for the “cloud” could be “IT as a service” in various forms.

Outsourcing in this way isn’t entirely new to libraries. The function and structure of OCLC has arguably been one of the earlier instantiations of “IT as a service” for libraries vis-à-vis its MARC record aggregation and distribution, which OCLC has been doing for decades. The more recent trend toward hosted IT services has been relatively easy for non-IT related units in our library: to most library staff, a service is no different based on where it is hosted. And with many services implementing APIs for libraries, that distinction is becoming less significant for our application developers too. For many of our technology staff, who have built careers around systems administration, application development, systems integration, and application management, hosted services represent a threat to not only their livelihoods but in some ways also their philosophical perspectives that are grounded in open source and do-it-yourself oriented beliefs.
In many ways the “cloud” for the IT segment of our profession is perhaps more synonymous with change, and change requires effective management, especially for the human element of our organizations.

Recently, our Office of Information Technologies started an initiative to move 80% of their technology infrastructure into the cloud. They have proposed an inverted pyramid structure for determining where IT solutions should reside — focusing first on hosted software as a service solutions for the largest segment of applications, followed by hosting those applications we would have typically installed locally onto a platform or infrastructure as a service provider, and then limiting only those applications that have specialized technical or legal needs to reside on premise. This is a big shift for our IT staff, especially, but not limited to, our systems administrators. The IAAS platform our university is migrating to is Amazon Web Services, and its infrastructure is largely accessible via a web dashboard, so the myriad tasks our systems administrators took days and weeks to do can now, in some adjusted way, be accomplished with a few clicks. This example is on one extreme end of the spectrum as far as IT change goes, but simultaneously, we have looked at the vendor market to lease pre-packaged tools that support standard functions in academic libraries and can be locally branded and configured with our data — things like course guides, A-Z journal lists, scheduling events, etc.

Mark Dehmlow (mdehmlow@nd.edu), a member of LITA and the ITAL editorial board, is the Director, Information Technology Program, Hesburgh Libraries, University of Notre Dame, South Bend, Indiana. | doi: 10.6017/ital.v35i1.8965
The overarching goals of these efforts are cost savings and increased velocity and resiliency of infrastructure, but also, and perhaps more important, flexibility in how we invest our staff time. If we are able to move high-level tasks from staff to a platform, then we will be able to reallocate our staff’s time and considerable talent to take on the constant stream of new, high-level technology needs. Partnering with the university, we are aiming towards their defined goal of moving 80% of our technical infrastructure into the “cloud.” We have adopted their overall strategy of approach to systems infrastructure, at least in principle, and are integrating into our own strategy significant consideration for the impact of these changes on our staff. Our organization has recognized that people form not only habits around process, but also personal and emotional attachments to why we do things the way we do them, both from a philosophical as well as a pragmatic perspective. Our approach to staff change is layered as well as long term. We know that getting from shock to acceptance is not an overnight process and that staff who adopt our overarching goals and strategy as their own will be more successful in the long term. To make this transition, we have developed several strategic approaches:

1. Explaining the Case: My experience is that staff can live through most changes as long as they understand why. Helping them gain that understanding can take some time, but ultimately having that comprehension will help them fully understand our strategic goals as well as help them make decisions that are in alignment with the overall approach. I often find it is important to remember that, as managers, we have been a part of all of the change conversations and we have had time to assimilate ideas, discuss points of view, and process the implications of change.
Each of our staff needs to go through the same process, and it is up to leadership to guide them through that process and ensure they get to participate in similar conversations. It is tempting to want to hit an initiative running, but there is significant value in seeding those discussions over a gradual time period to more holistically integrate staff into the broader vision. It is important to explain the case for change multiple times, to actively listen to staff thoughts and concerns, and to remember to lay out the context for change, why it is important, and how we intend to accomplish things. Then reassure, reassure, and reassure. The threats to staff may seem innocuous or unfounded to managers, but staff need to feel secure during a process to ultimately buy in.

2. Consistency and Persistence: Staff acceptance doesn’t always come easy — nor should it necessarily. Listening to staff and integrating their perspectives into the planning and implementation process can help demonstrate that they matter, but equally important is that they feel our approach is built on something solid. Stability is reinforced through consistency in messaging: not only individual consistency, but also team consistency and upper-management consistency — everyone should be able to support and explain the messaging around a particular change. Any time staff approach me and say, “it was much easier to do it this other way,” I talk about the efficiency we will garner through this change and how we will be able to train and repurpose staff in the future. The more they hear the message, the more ingrained it becomes, and the more normative it begins to feel.

3. Training and Investment: IT futures require investment, not just in infrastructure, but also in skill development. We continue to invest significantly in providing some level of training on new technologies that we implement.
That training will not only prove to staff that you are invested in their development as well as their job security, but it will also give them the tools they need to be successful in implementing new technologies. Change is anxiety inducing because it exposes so many unknowns. Providing training helps build confidence and competence for staff, reducing anxieties and providing some added engagement in the process. It also gives them exposure to the real-world implementation of technologies, where they can begin to see for themselves the benefits that you have been communicating.

4. Envisioning the Future — Improvements and Roles: One of the initial benefits we will be getting from recouping staff time is around shoring up our processes. We have generally had a more ad hoc approach to managing the day-to-day. It has been difficult to institute a strong technical change management process, in part, because of time. We will be able to remove that consideration from our excuses as we take advantage of the “cloud.” The net effect will be that we will do our work more thoughtfully and less ad hoc, using better-defined processes that will meet group-developed expectations. In addition to doing things better, we expect to do things differently. With fewer tasks at the operational level, we believe we will be able to transition staff into newly defined roles.
Some of these roles include DevOps Engineers, a hybrid of application engineering (the “dev”) and systems administration (the “ops”), who will help design automation and continuous-integration processes that allow developers to focus on their programming and less on the environment they are deploying their applications in; Financial Engineers, who will take system requirements and calculate costs in somewhat complex technical cloud environments; Systems Architects, who will be focused on understanding the smorgasbord of options that can be tied together to provide a service that meets expected response performance, disaster recovery, uptime, and other requirements; and Business Analysts, who will focus on taking technical requirements and looking at all of the potential approaches to solve that need, whether it be a hosted service, a locally developed solution, an implementation of an open source system, or some integration of all or some of the above. This list is by no means exhaustive, but I think it forms a good foundation on which to help staff develop their skill set along with our changing environment.

I believe it is important to remind those of us who are managing IT departments in libraries that in many ways the easiest parts of change are the logistics. The technology we work with is bounded by sets of guidelines that define how it is used and ensure that if it is implemented properly, it will work effectively. People, on the other hand, are not bounded as neatly by stringent rules. They are guided by diverse backgrounds, personalities, experiences, and feelings. They can be unpredictable, difficult to fully figure out, and behaviorally inconsistent. And yet, they are the great constant in our organizations and therefore require significant attention.
Our field needs “servant leaders” dedicated to supporting and developing staff, not merely to implementing technologies competently. Managers who invest in staff, in their well-being, development, and sense of engagement in their jobs, will find their organizations able to tackle almost anything. Those who ignore their staff’s needs in favor of pragmatic goals will likely find their organizations struggling to move quickly, spending too much energy overcoming resistance rather than energizing change.
Let’s Get Virtual: Examination of Best Practices to Provide Public Access to Digital Versions of Three-Dimensional Objects

Tanya M. Johnson

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016

ABSTRACT

Three-dimensional objects are important sources of information that should not be ignored in the increasing trend towards digitization. Previous research has not addressed the evaluation of digitized versions of three-dimensional objects. This paper first reviews research concerning such digitization, in both two and three dimensions, as well as public access in this context. Next, evaluation criteria for websites incorporating digital versions of three-dimensional objects are extrapolated from previous research. Finally, five websites are evaluated, and suggestions for best practices to provide public access to digital versions of three-dimensional objects are proposed.

INTRODUCTION

Much of the literature surrounding the increased efforts of libraries and museums to digitize content has focused on two-dimensional forms, such as books, photographs, or paintings. However, information does not come only in two dimensions; there are sculptures, artifacts, and other three-dimensional objects that have unfortunately been neglected by this digital revolution. As one author stated, “While researchers do not refer to three-dimensional objects as commonly as books, manuscripts, and journal articles, they are still important sources of information and should not be taken for granted” (Jarrell 1998, 32). The importance of three-dimensional objects as information that can and should be shared is not a new phenomenon; indeed, as early as 1887, museologists and educators forwarded the view that “museums were in effect libraries of objects” that provided information not supplied by books alone (Given and McTavish 2010, 11). However, it is only recently, with the advent of newer technological mechanisms, that such objects could be shared with the public on a larger scale.
No longer do people need to physically visit museums to experience and learn from three-dimensional objects. Rather, various techniques have been utilized to place digital versions of such objects on the websites of museums and archives, and projects have been created by various universities to enhance that digital experience. Nevertheless, as Newell (2012) states:

Collections-holding institutions increasingly regard digital resources as additional objects of significance, not as complete replacements for the original. Digital technologies work best when they enable people who feel connected to museum objects to have the freedom to deepen these relationships and, where appropriate, to extend outsiders’ understandings of the objects’ cultural contexts. The raison d’être of museums and other cultural institutions remains centred on the primacy of the object and in this sense continues to privilege material authenticity. (303)

In this regard, three-dimensional visualization of physical objects can be seen as the next step for museums and cultural heritage institutions that seek to further patrons’ connection to such objects via the internet. Indeed, in this digital age, the goals of museums and archives are changing, converging with those of libraries to focus more effort on providing information to the public, and, along with the growing trend to digitize information contained within libraries, there has been a concomitant trend to digitize the contents of museums in order to provide greater public access to collections (Given and McTavish 2010).

Tanya M. Johnson (tmjohnso@gmail.com), a recent MLIS graduate of the School of Communication & Information, Rutgers, The State University of New Jersey, is winner of the 2016 LITA/Ex Libris Student Writing Award.

LET’S GET VIRTUAL: EXAMINATION OF BEST PRACTICES TO PROVIDE PUBLIC ACCESS TO DIGITAL VERSIONS OF THREE-DIMENSIONAL OBJECTS | JOHNSON | doi:10.6017/ital.v35i2.9343
In light of this progress, this paper will review various methods of presenting three-dimensional objects to the public on the internet and, based on an evaluation of five digital collections, attempt to provide some advice as to best practices for museums or institutions seeking to digitize such objects and present them to the public via a digital collection.

LITERATURE REVIEW

Two-Dimensional Digitization

There are many ways to present digital versions of three-dimensional objects on a webpage, ranging from simple two-dimensional photography to complicated three-dimensional scanning and rendering. Beginning on the simpler end of the scale, Bincsik, Maezaki, and Hattori (2012) describe the process of photographing Japanese decorative art objects in order to create an image database of objects from multiple museums. Specifically, the researchers explain that they needed high-quality photographs showing each object from all directions, as well as close-up images of fine details, in order to recreate the physical research experience as closely as possible. They also note that, for the same reason, the context of each object had to be recorded, including photographs of any wrapping or storage materials and accompanying documentation. For this project, the researchers used Nikon professional or semi-professional cameras with zoom and macro lenses, and often used small apertures to increase depth of field. At times, they also took measurements of the objects to assist museums in maintaining accurate records. The raw image files were then processed with programs such as Adobe Photoshop, saved as original TIFF files, and converted into JPEG format for upload.
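A workflow like the one just described pairs master TIFFs with access JPEGs and records context and measurements alongside the images. As a minimal sketch, the following shows the kind of catalog record such a project might emit; every field name here is hypothetical, chosen for illustration rather than taken from the Bincsik, Maezaki, and Hattori database.

```python
import json

# Illustrative only: bundle catalog data, physical measurements, image
# references, and context notes into one serializable record. Field names
# are hypothetical, not drawn from any real museum schema.
def make_object_record(object_id, title, measurements_cm, image_files, context_notes):
    """Build a dict describing one digitized three-dimensional object."""
    return {
        "object_id": object_id,
        "title": title,
        "measurements_cm": measurements_cm,  # e.g. {"height": 4.2, ...}
        "images": [
            # master TIFF for preservation, derived JPEG for web access
            {"file": f, "master_format": "TIFF", "access_format": "JPEG"}
            for f in image_files
        ],
        "context_notes": context_notes,      # wrappings, documentation, etc.
    }

record = make_object_record(
    "JP-0001",
    "Lacquered incense box",
    {"height": 4.2, "width": 7.8, "depth": 7.8},
    ["JP-0001_front", "JP-0001_back", "JP-0001_detail"],
    ["original silk wrapping photographed", "box inscription transcribed"],
)
serialized = json.dumps(record, indent=2)  # ready for upload alongside images
```

Keeping measurements and context in the same record as the image references is one way to preserve the “physical research experience” the researchers describe, since weight, wrappings, and documentation otherwise vanish in a bare image database.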
Despite the success of the project, the researchers also noted the limitations of digitizing three-dimensional objects:

With decorative art objects some information is inevitably lost, such as the weight of the object, the feeling of its surface texture or the sense of its functionality in terms of proportions and balance. Digital images clearly can fulfill many research objectives, but in some cases they can only be used as references. One objective of the decorative arts database is to advise the researcher in selecting which objects should be examined in person. (Bincsik, Maezaki, and Hattori 2012, 46)

One difficulty with photography, particularly when digitizing artwork, is that color is a function of light. Thus, a single object will often appear to be different colors when photographed in different lighting conditions using conventional digital cameras, which process images using RGB filters. More accurate representations of objects can be acquired using multispectral imaging, which uses a higher number of parameters (the international standard is 31, compared to RGB’s 3) to obtain more information about the reflectance of an object at any particular point in space (Novati, Pellegri, and Schettini 2005). Multispectral imaging, however, is very expensive, and despite some researchers’ attempts to create affordable systems (e.g., Novati, Pellegri, and Schettini 2005), the acquisition of multispectral images is generally limited to large institutions with considerable funding (Chane et al. 2013). The use of two-dimensional photography to digitize objects is not limited to the arts; in the natural sciences, different types of photographic equipment have been developed to document existing collections and enhance scientific observation.
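The multispectral idea above, capturing 31 reflectance samples per point instead of three filtered values, can be sketched in a few lines. This is a toy illustration under loud assumptions: the Gaussian response curves below are invented stand-ins, not the CIE colour-matching functions or calibrated sensor responses a real imaging system would use.

```python
import math

# 31 reflectance bands from 400 to 700 nm in 10 nm steps, per the
# international standard mentioned in the text.
WAVELENGTHS = [400 + 10 * i for i in range(31)]

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical channel sensitivities (NOT real colorimetry): three broad
# curves peaking in the long, middle, and short wavelengths.
SENSITIVITY = {
    "r": [gaussian(w, 600, 40) for w in WAVELENGTHS],
    "g": [gaussian(w, 550, 40) for w in WAVELENGTHS],
    "b": [gaussian(w, 450, 40) for w in WAVELENGTHS],
}

def spectrum_to_rgb(reflectance):
    """Collapse 31 reflectance samples into one value per display channel."""
    assert len(reflectance) == 31
    rgb = {}
    for channel, curve in SENSITIVITY.items():
        weighted = sum(r * s for r, s in zip(reflectance, curve))
        rgb[channel] = weighted / sum(curve)  # normalize into 0..1
    return rgb

# A spectrally flat (neutral gray) surface comes out equal in all channels,
# regardless of which sensitivity curves are assumed.
neutral = spectrum_to_rgb([0.5] * 31)
```

The point of the exercise is the direction of information loss: the 31-band spectrum determines the RGB rendering, but many different spectra collapse to the same three numbers, which is why objects photographed under different lights appear to change color while multispectral data does not.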
Gigapixel imaging, for example, has been utilized to allow museum visitors to virtually explore large petroglyphs located in remote locations, as well as for documentation and viewing of dinosaur bone specimens that are not on public display (Louw and Crowley 2013). This technology consists of taking many very-high-resolution photographs that are then, via computer software, “aligned, blended, and stitched” together to create one extremely detailed composite image (Louw and Crowley 2013, 89–90). Robotic systems, such as GigaPan, have been developed to speed up the process and permit rapid recording and processing of the necessary area. Once the gigapixel image is created, it can be uploaded and displayed on the web in dynamic form, including spatial navigation of the image with embedded text, audio, or video at specific locations and zoom levels to provide further information (Louw and Crowley 2013). Various types of gigapixel imaging, including the GigaPan system, have also been used to digitize important collections of biological specimens, particularly insects, which are often stored in large drawers. One study examined the documentation of entomological specimens by “whole-drawer imaging” using various gigapixel imaging technologies (Holovachov, Zatushevsky, and Shydlovsky 2014). The researchers explained that different gigapixel imaging systems (many of which are commercial and proprietary) utilize different types of cameras and lenses, as well as different software for processing. However, despite the high cost of some commercially available systems, it is possible for museums and other institutions to create their own, economically viable versions. The system created by Holovachov, Zatushevsky, and Shydlovsky used a standard SLR camera, fitted with a macro lens and attached to an immovable stand.
The researchers manually set up lighting, focus, aperture, and other settings, and moved the insect drawer along a predetermined grid pattern to obtain the multiple overlapping photographs necessary to create a large gigapixel image. They used a freely available stitching program and manually corrected the stitching artifacts and color-balance issues that resulted from the use of a non-telecentric lens.1 Despite the lower cost of their individualized system, however, the researchers noted that the process was much more time-consuming and required more labor from the workers digitizing the collection. Moreover, technologically speaking, the researchers emphasized the limits of two-dimensional imaging, given that the “diagnostic characteristics of three-dimensional insects,” as well as the accompanying labels, are often invisible when a drawer is photographed only from the top.

1 The difference between telecentric and non-telecentric lenses is explained by the researchers: “Contrary to ordinary photographic lenses, object-space telecentric lenses provide the same object magnification at all possible focusing distances. An object that is too close or too far from the focus plane and not in focus, will be the same size as if it were in focus. There is no perspective error and the image projection is parallel. Therefore, when such a lens is used to take images of pinned insects in a box, all vertical pins will appear strictly vertical, independent of their position within the camera’s field of view” (Holovachov, Zatushevsky, and Shydlovsky 2014, 7).
Thus, the researchers concluded that, ultimately, “the whole-drawer digitizing of insect collections needs to be transformed from two-dimensions to three-dimensions by employing complex imaging techniques (simultaneous use of multiple cameras positioned at different angles) and a digital workflow” (Holovachov, Zatushevsky, and Shydlovsky 2014, 7).

Three-Dimensional Digitization

Given the goal of obtaining as accurate a representation as possible when digitizing objects, many researchers have turned to various techniques for acquiring three-dimensional data. Acquiring a three-dimensional image of an object takes place in three steps:

1. Preparation, during which certain preliminary activities take place that involve the decision about the technique and methodology to be adopted as well as the place of digitization, security planning issues, etc.
2. Digital recording, which is the main digitization process according to the plan from phase 1.
3. Data processing, which involves the modeling of the digitized object through the unification of partial scans, geometric data processing, texture data processing, texture mapping, etc. (Pavlidis et al. 2007, 94)

Steps 2 and 3 have been described more technically as (2) obtaining data from an object to create point clouds (from thousands to billions of X,Y,Z coordinates representing loci on the object), and (3) processing point clouds into polygon models (creating a surface on top of the points), which can then be mapped with textures and colors (Metallo and Rossi 2011). There are several techniques that can be utilized to acquire three-dimensional data from a physical object. Table 1 summarizes the four general methods most commonly used by museums.

Table 1. Description of four general methods of acquiring three-dimensional data about physical objects (table information compiled by reference to Pavlidis et al. 2007; Metallo and Rossi 2011; Abel et al. 2011; and Berquist et al. 2012).

Laser Scanning
Description: A laser source emits light onto the object’s surface, which is detected by a digital camera; the geometry of the object is extracted by triangulation or time-of-flight calculations.
Positives: High accuracy in capturing geometry; can capture small objects and entire buildings (using different hardware).
Negatives: Limited texture and color captured; shiny surfaces refract the laser.
Approx. price range: $3,000–$200,000

White Light (Structured Light) Scanning
Description: A pattern of light is projected onto the object’s surface, and deformations in that pattern are detected by a digital camera; geometry is extracted by triangulation from the deformations.
Positives: Captures texture details, making it very accurate; can capture color.
Negatives: Dark, shiny, or translucent objects are problematic.
Approx. price range: $15,000–$250,000

Photogrammetry
Description: Three-dimensional data is extracted from multiple two-dimensional pictures.
Positives: Can capture anything from small objects to mountain ranges; good color information.
Negatives: Needs either precise placement of cameras or more precise software to obtain accurate data.
Approx. price range: Cameras: $500–$50,000; Software: free–$40,000

Volumetric Scanning
Description: Magnetic resonance imaging (MRI) uses a strong magnetic field and radio waves to detect geometric, density, volume, and location information; computed tomography (CT) uses rotating x-rays to create two-dimensional slices, which can then be reconstructed into three-dimensional images.
Positives: Both types can view the interior and exterior of an object; CT can be used for reflective or translucent objects; MRI can image soft tissues.
Negatives: No color information; MRI requires the object to have high water content.
Approx. price range: $200,000–$2,000,000

The type of three-dimensional digitization used can ultimately depend upon the types of objects to be imaged or the type of data needed.
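Phase 3 of the pipeline described above turns raw point clouds, potentially billions of X,Y,Z coordinates, into manageable models. As a concrete if deliberately toy illustration of the kind of geometric processing involved, the sketch below thins a point cloud with a voxel-grid filter, replacing every occupied cubic cell with the centroid of its points. Real pipelines perform this with dedicated software at vastly larger scale; this only shows the idea.

```python
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Thin a point cloud: one centroid per occupied voxel of the given size.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the edge
    length of the cubic cells used to bucket nearby points together.
    """
    cells = defaultdict(list)
    for x, y, z in points:
        # Integer cell coordinates identify which voxel each point falls in.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        cells[key].append((x, y, z))
    centroids = []
    for bucket in cells.values():
        n = len(bucket)
        centroids.append(tuple(sum(p[i] for p in bucket) / n for i in range(3)))
    return centroids

# Two points near the origin collapse into one centroid; the distant point
# survives on its own, so a 3-point cloud thins to 2 representative points.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (5.0, 5.0, 5.0)]
thinned = voxel_downsample(cloud, voxel_size=1.0)
```

Downsampling of this kind is one reason the Eton Myers datasets discussed later could be “reduced in size” before public release: a coarser grid trades geometric fidelity (legible hieroglyphics, tooling marks) for a file a browser or desktop viewer can handle.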
For example, in digitizing human skeletal collections, one study explained that three-dimensional laser scanning was an advantageous technique to create models of bones for preservation and analysis, but cautioned that CT scans would be needed to examine the internal structures of such specimens (Kuzminsky and Gardiner 2012). Another study utilized several techniques in an attempt to decipher graffiti inscriptions on ancient Roman pottery shards, ultimately concluding that high-resolution photography (similar to gigapixel imaging) and three-dimensional laser scanning both provided detailed and helpful data (Montani et al. 2012). Additionally, sometimes multiple types of digitization can be used for the same objects with similar results. One study, for example, obtained virtually equivalent three-dimensional models of the same object using laser scanning and two types of photogrammetry (Lerma and Muir 2014). Most recently, researchers have been utilizing combinations of digitization techniques to obtain the most accurate representations possible. Chane et al. (2013), for example, examined methods of combining three-dimensional digitization with multispectral photography in order to obtain enhanced information about the physical object in question. The researchers explained that combining the two processes is difficult because, in order to obtain multispectral textural data mapped to geometric positions, the object must be imaged from identical locations by multiple scanners and cameras, or else the data processing that combines the two types of data becomes extremely complex.
As a compromise, the researchers created a system of optical tracking based on photogrammetry techniques that permits the collection and integration of geometric positioning data and multispectral textures using precise targeting procedures. However, the researchers noted that most systems integrating multispectral photography with three-dimensional digitization tended to be quite bulky, did not adapt easily to different types of objects, and needed better processing algorithms for more complex three-dimensional objects (Chane et al. 2013).

Public Access to Three-Dimensionally Digitized Objects

Despite museums’ growing focus on increasing public access to collections via digitization (Given and McTavish 2010), there is very little literature addressing public access to three-dimensionally digitized objects. Indeed, studies in this realm tend to focus on the technological aspects of either the modeling of specific objects or collections or the website viewing of three-dimensional models. For example, Abate et al. (2011) described the three-dimensional digitization of a particular statue, from the scanning process to its ultimate depiction on a website. The researchers explained in detail the particular software architecture utilized to permit the remote rendering of the three-dimensional model on users’ computers via a Java applet without compromising quality or necessitating download of potentially copyrighted works. By contrast, literature concerning the Digital Michelangelo project, during which researchers three-dimensionally digitized various Michelangelo works, focused on the method used to create an accurate three-dimensional model, complete with color and texture mapping, and a visualization tool (Dellepiane et al. 2008). One study did describe a project designed to place three-dimensional data about various cultural artifacts in an online repository for curators and other professionals (Hess et al. 2011).
This repository was contained within database-management software, a web-based interface was designed for searching, and user access to three-dimensional images and models was provided via an ActiveX plugin. Despite the potential of the prototype, however, it appears that the project has ceased,2 and the institution’s current three-dimensional imaging project is focused on the design of a traveling exhibition incorporating, among other things, three-dimensional models of artifacts and physical replicas created from such models.3

2 See http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/e-curator.

Studies that do address public access directly tend to focus on the improvement of museum websites generally. For example, in terms of user expectations of museum websites, one study found that approximately 63 percent of visitors to a museum’s website came to search the digital collection (Kravchyna and Hastings 2002). Another study found four types of museum website users, each with different needs and expectations of such sites. Relevantly, educators sought collections that were “the more realistic the better,” including suggestions like incorporating three-dimensional simulations of physical objects so that students could “explore the form, construction, texture and use of objects” (Cameron 2003, 335). Further, non-specialist users “value free choice learning” and “access online collections to explore and discover new things and build on their knowledge base as a form of entertainment” (Cameron 2003, 335). Similarly, some studies have addressed the incorporation of Web 2.0 technologies into museum websites. Srinivasan et al.
(2009), for example, argue that Web 2.0 technologies must be integrated into museum catalogs rather than simply layered over existing records, because users’ interest in objects is increased by participation in the descriptive practice. An implementation of this concept is found in Hunter and Gerber’s (2010) system of social tagging attached to three-dimensional models. This paper is an effort to address the gap between the technical process of digitizing and presenting three-dimensional objects on the web and the user experience of those objects. Through the evaluation of five websites, this paper will provide some guidance for the digitization of three-dimensional objects and their presentation in digital collections for public access.

METHODOLOGY AND EVALUATIVE CRITERIA

Evaluations of digital museums are not as prevalent as evaluations of digital libraries. However, given the similar purposes of digital museums and digital libraries, it is appropriate to utilize similar criteria. For digital libraries, Saracevic (2000) synthesized evaluation criteria into performance questions in two broad areas: (a) user-centered questions, including how well the digital library supports the society or community served, how well it supports institutional or organizational goals, how well it supports individual users’ information needs, and how well the digital library’s interface provides access and interaction; and (b) system-centered questions, including hardware and network performance, processing and algorithm performance, and how well the content of the collection is selected, represented, organized, and managed. Xie (2008) focused on user-centered evaluation and found five general criteria that exemplified users’ own evaluations of digital libraries: interface usability, collection quality, service quality, system performance, and user satisfaction.
Parandjuk (2010) used information architecture to construct criteria for the evaluation of a digital library, including the following:

• uniformity of standards, including consistency among webpages and individual records;
• findability, including ease of use and multiple ways to access the same information;
• sub-navigation, including indexes, sitemaps, and guides;
• contextual navigation, including simplified searching and co-location of different types of resources;
• language, including consistency in labeling across pages and records and appropriateness for the audience; and
• integration of searching and browsing.

3 See http://www.3dencounters.com.

This system is particularly appropriate in the context of digital museums, as it emphasizes the curatorial or organizational aspect of the collection in order to support learning objectives. In one comprehensive evaluation of the websites of art museums, Pallas and Economides (2008) created a framework for such evaluation incorporating six dimensions: content, presentation, usability, interactivity and feedback, e-services, and technical. Each dimension then contained several specific criteria. Many of the criteria overlapped, however; three-dimensional imaging, for example, was placed within the e-services dimension, under virtual tours, although it could have been placed within presentation, with other multimedia criteria, or even within interactivity, with interactive multimedia applications. The problem in trying to evaluate a particular part of a museum’s website, namely the way it presents three-dimensional objects in digital form, is that the level of specificity almost renders many of the evaluation criteria from previous studies irrelevant.
As Hariri and Norouzi (2011) suggest, evaluation criteria should be based on the objective of the evaluation. Hence, based on portions of the above-referenced studies, this author has created a more focused evaluation framework, concentrating on criteria that are particularly relevant to museums’ digital presentations of three-dimensional objects. This framework is detailed in table 2, below.

Table 2. Summary of evaluative criteria

Functionality: What technology is used to display the object? How well does it work? Must programs or files be downloaded? Are the loading times of displays acceptable?
Usability: How easy is the site to use? What is the navigation system? Are there searching and browsing functions, and how well does each work? How findable are individual objects?
Presentation: How does the display of the object look? What is the context in which the object is presented? Are there multiple viewing options? Is there any interactivity permitted?
Content: Does the site provide an adequate collection of objects? For individual objects, is there sufficient information provided? Is there additional educational content?

Five digital collections, specified below, will be evaluated based on these criteria. This will be done in a case-study manner, describing each website based on the above criteria and then using those evaluations to make suggestions for best practices.

RESULTS

It is difficult to compare different types of digital collections, particularly when the focus is on the different types of technology utilized to display similar objects. However, because the goal here is to determine best practices for the digital presentation of three-dimensional objects, it is important to evaluate a variety of techniques in a variety of fields. Thus, the following digital collections have been chosen to illustrate different ways in which such objects can be displayed on a website.
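Readers who want to apply the four-dimension framework to other collections could represent it programmatically. The sketch below is one such representation; the numeric low/medium/high scale and the simple averaging are illustrative assumptions added here, not part of the framework itself, which reports qualitative ratings only.

```python
# Illustrative encoding of the evaluation framework: four dimensions, each
# rated low/medium/high. The 1-3 numeric scale is an assumption for the
# sake of aggregation, not something the framework prescribes.
DIMENSIONS = ("functionality", "usability", "presentation", "content")
SCALE = {"low": 1, "medium": 2, "high": 3}

def overall_score(ratings):
    """Average the four dimension ratings onto the assumed 1-3 scale."""
    missing = set(DIMENSIONS) - set(ratings)
    if missing:
        raise ValueError(f"unrated dimensions: {sorted(missing)}")
    return sum(SCALE[ratings[d]] for d in DIMENSIONS) / len(DIMENSIONS)

# Example: a site strong on usability and content but weaker on display,
# the pattern several of the evaluations below exhibit.
example = {
    "functionality": "medium",
    "usability": "high",
    "presentation": "low",
    "content": "high",
}
score = overall_score(example)  # (2 + 3 + 1 + 3) / 4
```

A single averaged number hides which dimension failed, so for comparing sites the per-dimension ratings remain the primary evidence; the aggregate is only a convenience for ranking.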
Museum of Fine Arts, Boston (MFA) (http://www.mfa.org/collections)

The MFA, both in person and online, boasts a comprehensive and extensive collection of art and historical artifacts of varying forms. The website is very easy to navigate, with well-defined browsing options and easy search capabilities, allowing for refinement of results by collection or type of item. There are many collections, which are well organized and curated into separate exhibits and galleries. In addition, when viewing each gallery, suggestions are linked for related online exhibitions as well as tours and exhibits at the physical museum. Each item record contains a detailed description of the item as well as its provenance. Thus, the MFA website attains a very high rating for usability and content. However, individual items are represented by only single pictures of varying quality. Some pictures are in color, some are black and white, and no two pictures appear to have the same lighting. Additionally, despite being slow to load, even the pictures that appear to be of the best quality cannot be of high resolution, as zooming in makes them slightly blurry. Accordingly, the MFA website receives a medium rating for functionality and a low rating for presentation.

Digital Fish Library (DFL) (http://www.digitalfishlibrary.org/index.php)

The DFL project is a comprehensive program that uses MRI scanning to digitize preserved biological fish samples from a collection housed at the Scripps Institution of Oceanography. After MRI scans of a specimen are taken, the data is processed and translated into various views that are placed on the website, accompanied by information about each species (Berquist et al. 2012). Navigating the DFL website is very intuitive, as the individual specimen records are organized by taxonomy. It is easy to search for particular species or browse through the clickable, pictorial interface.
Records for each species include detailed information about the individual specimen, the specifics of the scans used to image it, and broader information about the species. Individual records also provide links to other species within the taxonomic family. Thus, the DFL website attains high ratings in both usability and content. For functionality and presentation, however, the ratings are medium. Although each item has videos and still images obtained from three-dimensional volume renderings and MRI scans, they are small in size and low in resolution. There is no interactive component, with the possible exception of the “digital fish viewer,” which supposedly requires Java but which this author could not get to work despite best efforts. One nice feature, shown in figure 1 below, is that some of the specimen records have three-dimensional renderings showing and explaining the internal structures of the species.

Figure 1. Annotated three-dimensional rendering of internal structures of a hammerhead shark, from the Digital Fish Library (http://www.digitalfishlibrary.org/library/ViewImage.php?id=2851)

The Eton Myers Collection (http://etonmyers.bham.ac.uk/3D-models.html)

The Eton Myers Collection of ancient Egyptian art is housed at Eton College, and a project to three-dimensionally digitize the items for public access was undertaken as a collaboration between that institution and the University of Birmingham. Digitization was accomplished with three-dimensional laser scanners, the data was then processed with Geomagic software to produce point-cloud and mesh forms, and individual datasets were reduced in size and converted into an appropriate file type to allow for public access (Chapman, Gaffney, and Moulden 2010).
Usability of the Eton Myers Collection website is extremely low. The initial interface is simply a list of three-dimensional models by item number, with a description of how to download the appropriate program and files. Another website from the University of Birmingham (http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=The+Eton+Myers+Collection) contains a more museum-like interface, but it contains many more records for objects than are contained in the initial list of three-dimensional models. Moreover, most of the records do not even include pictures of the items, let alone links to the three-dimensional models, and the records that do include pictures do not necessarily include such links. Even when a record has a link to the three-dimensional model, it actually redirects to the full list of models rather than to the individual item. There is no search functionality from the initial list of three-dimensional models, and no way to browse other than to, colloquially speaking, poke and hope. Individual items are identified only by item number, and, aside from the few records that have accompanying pictures on the University of Birmingham site, there is no way to know to what item any given number refers. The website attains only a low rating for content; although it seems that there may be a decent number of items in the collection, it is impossible to know for certain given the problems with the interface and the fact that individual items are virtually unidentified. The Eton Myers Collection website also receives a low rating for functionality.
To access three-dimensional models of items, users must download and install a program called MeshLab, then download individual folders of compressed files, then unzip those files, and finally open the appropriate file in MeshLab. Despite compression, some of the file folders are still quite large and take some time to download. Presentation of the items is also rated low. Even for the high-resolution versions of the three-dimensional renderings, viewed in MeshLab, the geometry of the objects seems underdeveloped (e.g., hieroglyphics are illegible) and surface textures are not well mapped (e.g., colors are completely off). This is evident from a comparison of the three-dimensional rendering with a two-dimensional photograph of the same item, as in figure 2, below.

Figure 2. Comparison of original photograph (left) and three-dimensional rendering (right) of Item Number ECM 361, from the Eton Myers Collection (http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+361&op-earliest_year=%3D&op-latest_year=%3D).

Notably, Chapman, Gaffney, and Moulden (2010) indicate that the detailed three-dimensional imaging enabled them to identify tooling marks and read previously unclear hieroglyphics on certain items. Thus, it is possible that the problems with the renderings may be a result of a loss in quality between the original models and the downloaded versions, particularly given that the files were reduced in size and converted prior to being made available for download.
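The multistep access workflow described above (download MeshLab, download compressed folders, unzip them, open the file in MeshLab) can be partially scripted. The following Python sketch is purely illustrative: the Eton Myers project documents no such automation, the archive layout and mesh file extensions are assumptions, and launching MeshLab presumes a local installation on the system PATH.

```python
import subprocess
import zipfile
from pathlib import Path

def extract_models(archive_path, dest_dir):
    """Unzip a downloaded model archive and return any mesh files found inside."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
    # MeshLab opens common mesh formats; .ply and .obj are typical (assumed here).
    return sorted(p for p in dest.rglob("*") if p.suffix in {".ply", ".obj"})

def open_in_meshlab(mesh_file):
    """Hand an extracted mesh to a locally installed MeshLab (assumed on PATH)."""
    subprocess.run(["meshlab", str(mesh_file)], check=True)
```

Even with such a script, the user still bears the burden of installing and learning MeshLab, which is precisely the access barrier criticized above.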
Epigraphia 3D Project (http://www.epigraphia3d.es)

The Epigraphia 3D project was created to present an online collection of various historical Roman epigraphs (also known as inscriptions) that were discovered and excavated in Spain and Italy; the physical collection is housed at the Museo Arqueológico Nacional (Madrid). Digital imaging was accomplished using photogrammetry, free software was utilized to create three-dimensional object models and renderings, and Photoshop was used to obtain appropriate textures. Finally, the three-dimensional models were published on the web using Sketchfab, a web service similar to Flickr that allows in-browser viewing of three-dimensional renderings in many different formats (Ramírez-Sánchez et al. 2014).

The Epigraphia 3D website is intuitive and informative. Browsing is simple because there are not many records, but, although it is possible to search the website, there is no search function specifically directed to the collection. Thus, usability is rated as medium. Despite the fact that the website provides descriptions of the project and the collection, as well as information about epigraphs generally, the website attains a medium rating for content in light of the small size of the collection and the limited information given for each individual item. However, the Epigraphia 3D website receives very high ratings for functionality and presentation. The individual three-dimensional models are detailed, legible, and interactive. Individual inscriptions are transcribed for each item.
The use of Sketchfab to display the models is effective; no downloading is necessary, and models load in an acceptable amount of time. When viewing an item, users can rotate the object in either “orbit” or “first person” mode, as well as view it full-screen or within the browser window. Users can also display the wireframe model as well as the textured or surfaced rendering, as shown in figure 3 below.

Figure 3. Three-dimensional textured (left) and wireframe (middle) renderings from the Epigraphia 3D project (http://www.epigraphia3d.es/3d-01.html), as compared to an original two-dimensional photograph of the same object (right) (http://eda-bea.es/pub/record_card_1.php?refpage=%2Fpub%2Fsearch_select.php&quicksearch=dapynus&rec=19984).

Smithsonian X 3D (http://3d.si.edu)

The Smithsonian X 3D project, although affiliated with all of the Smithsonian’s varying divisions, was created to test the application of three-dimensional digitization techniques to “iconic collection objects” (http://3d.si.edu/about). The website provides significant detail concerning the project itself, mostly in the form of videos, and individual items, many of which are linked to “tours” that incorporate a story about the object. Content is rated as medium because, despite the depth of information provided about individual items, there are still very few items within the collection.
The website also receives a medium rating for usability, given the simple browsing structure, easy navigation, and lack of a search feature (all likely due at least in part to the limited content). Functionality and presentation, however, are rated high. The X3D Explorer in-browser software (powered by Autodesk) does more than simply display a three-dimensional rendering of an object; it also permits users to edit the model by changing color, lighting, texture, and other variables, and it incorporates detailed information about each item, both as an overall description and as a slide show in which snippets of information are connected to specific views of the item. The individual three-dimensional models are high resolution, detailed, and well-rendered, with very good surface texture mapping. However, it must be noted that the X3D Explorer tool is in beta and, as such, still has some bugs; for example, this author has observed a model disappear while zooming in on the rendering. Table 3, below, summarizes the results of the evaluation.

                   Functionality   Usability   Presentation   Content
MFA                Medium          Very High   Low            Very High
DFL                Medium          High        Medium         High
Eton Myers         Low             Low         Low            Low
Epigraphia 3D      Very High       Medium      Very High      Medium
Smithsonian X 3D   High            Medium      High           Medium

Table 3. Summary of evaluation results for each website by individual criteria

DISCUSSION

Based on the evaluation of the five websites described above, some suggested best practices for the digitization and presentation of three-dimensional objects become apparent. When digitizing, the museum should utilize the method that best suits the object or collection. For example, while MRI scanning is likely the best method for three-dimensionally digitizing biological fish specimens, it is not going to be effective or feasible for digitizing artwork or artifacts (Abel et al. 2011; Berquist et al. 2012).
Regardless of the method of digitization used, however, the people conducting the imaging and processing should fully comprehend the hardware and software necessary to complete the task. Additionally, although financial constraints must be considered, museums should note that some three-dimensional scanning equipment is just as economically feasible as standard digital cameras (Metallo and Rossi 2011). However, if a museum chooses to utilize only two-dimensional imaging, each item should be photographed from multiple angles in high resolution, to avoid creating a website, like the MFA’s, on which everything other than the object itself is presented outstandingly. Further, museums deciding on two-dimensional imaging should explore the possibility of utilizing photogrammetry to create three-dimensional models from their two-dimensional photographs, as the Epigraphia 3D project did. There is free or inexpensive software that permits the creation of three-dimensional object maps from very few photographs (Ramírez-Sánchez et al. 2014). Finally, compatibility is a key issue when conducting three-dimensional scans; the museum should ensure that the software used for rendering models is compatible with the way in which users will be viewing the models.

In the context of public access to the museum’s digital collections, the website should be easy and intuitive to navigate. The MFA website is an excellent example; browsing and search functions should both be present, and reorganization of large numbers of objects into separate collections may be necessary. Where searching is going to be the primary point of entry into the collection, it is important to have sufficient metadata and functional search algorithms to ensure that item records are findable.
Furthermore, remember that the website is simply a way to access the museum itself. Hence, the collections on the website, like the collections in the physical museum, should be curated; there should be a logical flow to accessing object records. The museum may also want to have sections that are similar to virtual exhibitions, like the “tours” provided by the Smithsonian X 3D project. Finally, museums should ensure that no additional technological know-how (beyond being able to access the internet) is required to access the three-dimensional content in object records. Users should not be required to download software or files to view records; Epigraphia 3D’s use of Sketchfab and the Smithsonian’s X 3D Explorer tool are both excellent examples of ways in which three-dimensional content can be viewed on the web without the need for extraneous software.

Museums and cultural heritage institutions are increasingly focused on providing public access to collections via digitization and display on websites (Given and McTavish 2010). To help them do so effectively, this paper has attempted to provide some guidance as to best practices for presenting digital versions of three-dimensional objects. In closing, however, it must be noted that this author is not a technician. Although this paper has tried to contend with the issues from the perspective of a librarian, there are complicated technical concerns behind any digitization project that have not been adequately addressed. In addition, this paper has not examined the role of budgetary constraints on digitization or the concomitant issues of creating and maintaining websites. Moreover, because this paper has been treated as a broad overview of the digitization and presentation for public access of three-dimensional objects, the five websites evaluated were from varying fields of study. Museums should look to more specific comparisons in order to appropriately digitize and present their collections on the web.
CONCLUSION

There may not be a direct substitute for encountering an object in person, but for people who cannot obtain physical access to three-dimensional objects, the digital realm can serve as an adequate proxy. This paper has demonstrated, through an evaluation of five distinct digital collections, that utilizing three-dimensional imaging and presenting three-dimensional models of physical objects on the web can serve the important purpose of increasing public access to otherwise unavailable collections.

REFERENCES

Abate, D., R. Ciavarella, G. Furini, G. Guarnieri, S. Migliori, and S. Pierattini. “3D Modeling and Remote Rendering Technique of a High Definition Cultural Heritage Artefact.” Procedia Computer Science 3 (2011): 848–52. http://dx.doi.org/10.1016/j.procs.2010.12.139.

Abel, R. L., S. Parfitt, N. Ashton, Simon G. Lewis, Beccy Scott, and C. Stringer. “Digital Preservation and Dissemination of Ancient Lithic Technology with Modern Micro-CT.” Computers and Graphics 35, no. 4 (August 2011): 878–84. http://dx.doi.org/10.1016/j.cag.2011.03.001.

Berquist, Rachel M., Kristen M. Gledhill, Matthew W. Peterson, Allyson H. Doan, Gregory T. Baxter, Kara E. Yopak, Ning Kang, H. J. Walker, Philip A. Hastings, and Lawrence R. Frank. “The Digital Fish Library: Using MRI to Digitize, Database, and Document the Morphological Diversity of Fish.” PLoS ONE 7, no. 4 (April 2012). http://dx.doi.org/10.1371/journal.pone.0034499.

Bincsik, Monika, Shinya Maezaki, and Kenji Hattori. “Digital Archive Project to Catalogue Exported Japanese Decorative Arts.” International Journal of Humanities and Arts Computing 6, no. 1–2 (March 2012): 42–56. http://dx.doi.org/10.3366/ijhac.2012.0037.

Cameron, Fiona. “Digital Futures I: Museum Collections, Digital Technologies, and the Cultural Construction of Knowledge.” Curator: The Museum Journal 46, no. 3 (July 2003): 325–40. http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x.
Chane, Camille Simon, Alamin Mansouri, Franck S. Marzani, and Frank Boochs. “Integration of 3D and Multispectral Data for Cultural Heritage Applications: Survey and Perspectives.” Image and Vision Computing 31, no. 1 (January 2013): 91–102. http://dx.doi.org/10.1016/j.imavis.2012.10.006.

Chapman, Henry P., Vincent L. Gaffney, and Helen L. Moulden. “The Eton Myers Collection Virtual Museum.” International Journal of Humanities and Arts Computing 4, no. 1–2 (October 2010): 81–93. http://dx.doi.org/10.3366/ijhac.2011.0009.

Dellepiane, M., M. Callieri, F. Ponchio, and R. Scopigno. “Mapping Highly Detailed Colour Information on Extremely Dense 3D Models: The Case of David’s Restoration.” Computer Graphics Forum 27, no. 8 (December 2008): 2178–87. http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x.

Given, Lisa M., and Lianne McTavish. “What’s Old Is New Again: The Reconvergence of Libraries, Archives, and Museums in the Digital Age.” Library Quarterly 80, no. 1 (January 2010): 7–32. http://dx.doi.org/10.1086/648461.

Hariri, Nadjla, and Yaghoub Norouzi. “Determining Evaluation Criteria for Digital Libraries’ User Interface: A Review.” The Electronic Library 29, no. 5 (2011): 698–722. http://dx.doi.org/10.1108/02640471111177116.

Hess, Mona, Francesca Simon Millar, Stuart Robson, Sally MacDonald, Graeme Were, and Ian Brown. “Well Connected to Your Digital Object? E-curator: A Web-Based E-Science Platform for Museum Artefacts.” Literary and Linguistic Computing 26, no. 2 (2011): 193–215. http://dx.doi.org/10.1093/llc/fqr006.
Holovachov, Oleksandr, Andriy Zatushevsky, and Ihor Shydlovsky. “Whole-Drawer Imaging of Entomological Collections: Benefits, Limitations and Alternative Applications.” Journal of Conservation and Museum Studies 12, no. 1 (2014): 1–13. http://dx.doi.org/10.5334/jcms.1021218.

Hunter, Jane, and Anna Gerber. “Harvesting Community Annotations on 3D Models of Museum Artefacts to Enhance Knowledge, Discovery and Re-Use.” Journal of Cultural Heritage 11, no. 1 (2010): 81–90. http://dx.doi.org/10.1016/j.culher.2009.04.004.

Jarrell, Michael C. “Providing Access to Three-Dimensional Collections.” Reference & User Services Quarterly 38, no. 1 (1998): 29–32.

Kravchyna, Victoria, and Sam K. Hastings. “Informational Value of Museum Web Sites.” First Monday 7, no. 4 (February 2002). http://dx.doi.org/10.5210/fm.v7i2.929.

Kuzminsky, Susan C., and Megan S. Gardiner. “Three-Dimensional Laser Scanning: Potential Uses for Museum Conservation and Scientific Research.” Journal of Archaeological Science 39, no. 8 (August 2012): 2744–51. http://dx.doi.org/10.1016/j.jas.2012.04.020.

Lerma, José Luis, and Colin Muir. “Evaluating the 3D Documentation of an Early Christian Upright Stone with Carvings from Scotland with Multiples Images.” Journal of Archaeological Science 46 (June 2014): 311–18. http://dx.doi.org/10.1016/j.jas.2014.02.026.
Louw, Marti, and Kevin Crowley. “New Ways of Looking and Learning in Natural History Museums: The Use of Gigapixel Imaging to Bring Science and Publics Together.” Curator: The Museum Journal 56, no. 1 (January 2013): 87–104. http://dx.doi.org/10.1111/cura.12009.

Metallo, Adam, and Vince Rossi. “The Future of Three-Dimensional Imaging and Museum Applications.” Curator: The Museum Journal 54, no. 1 (January 2011): 63–69. http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x.

Montani, Isabelle, Eric Sapin, Richard Sylvestre, and Raymond Marquis. “Analysis of Roman Pottery Graffiti by High Resolution Capture and 3D Laser Profilometry.” Journal of Archaeological Science 39, no. 11 (2012): 3349–53. http://dx.doi.org/10.1016/j.jas.2012.06.011.

Newell, Jenny. “Old Objects, New Media: Historical Collections, Digitization and Affect.” Journal of Material Culture 17, no. 3 (September 2012): 287–306. http://dx.doi.org/10.1177/1359183512453534.

Novati, Gianluca, Paolo Pellegri, and Raimondo Schettini. “An Affordable Multispectral Imaging System for the Digital Museum.” International Journal on Digital Libraries 5, no. 3 (May 2005): 167–78. http://dx.doi.org/10.1007/s00799-004-0103-y.

Pallas, John, and Anastasios A. Economides. “Evaluation of Art Museums’ Web Sites Worldwide.” Information Services and Use 28, no. 1 (2008): 45–57. http://dx.doi.org/10.3233/ISU-2008-0554.

Parandjuk, Joanne C. “Using Information Architecture to Evaluate Digital Libraries.” The Reference Librarian 51, no. 2 (2010): 124–34. http://dx.doi.org/10.1080/02763870903579737.
Pavlidis, George, Anestis Koutsoudis, Fotis Arnaoutoglou, Vassilios Tsioukas, and Christodoulos Chamzas. “Methods for 3D Digitization of Cultural Heritage.” Journal of Cultural Heritage 8, no. 1 (2007): 93–98. http://dx.doi.org/10.1016/j.culher.2006.10.007.

Ramírez-Sánchez, Manuel, José-Pablo Suárez-Rivero, and María-Ángeles Castellano-Hernández. “Epigrafía digital: tecnología 3D de bajo coste para la digitalización de inscripciones y su acceso desde ordenadores y dispositivos móviles.” El Profesional de la Información 23, no. 5 (2014): 467–74. http://dx.doi.org/10.3145/epi.2014.sep.03.

Saracevic, Tefko. “Digital Library Evaluation: Toward an Evolution of Concepts.” Library Trends 49, no. 3 (2000): 350–69.

Srinivasan, Ramesh, Robin Boast, Jonathan Furner, and Katherine M. Becvar. “Digital Museums and Diverse Cultural Knowledges: Moving past the Traditional Catalog.” The Information Society 25, no. 4 (2009): 265–78. http://dx.doi.org/10.1080/01972240903028714.

Xie, Hong Iris. “Users’ Evaluation of Digital Libraries (DLs): Their Uses, Their Criteria, and Their Assessment.” Information Processing and Management 44, no. 3 (May 2008): 1346–73. http://dx.doi.org/10.1016/j.ipm.2007.10.003.
Analyzing Digital Collections Entrances: What Gets Used and Why It Matters

Paromita Biswas and Joel Marchesoni

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

ABSTRACT

This paper analyzes usage data from Hunter Library’s digital collections using Google Analytics for a period of twenty-seven months, from October 2013 through December 2015. The authors consider this data analysis important for identifying the collections that receive the largest number of visits, and they argue that such evaluation better informs decisions about building digital collections that will serve user needs. The authors also study the benefits of harvesting to sites such as the Digital Public Library of America, and they believe this paper will contribute to the literature on Google Analytics and its use by libraries.

INTRODUCTION

Hunter Library at Western Carolina University (WCU) has fourteen digital collections hosted in CONTENTdm—a digital collection management system from OCLC. Users can enter the collections in various ways—through the Library’s CONTENTdm landing pages,1 search engines, or sites such as the Digital Public Library of America (DPLA), where all the collections are harvested.2 Since October 2013, the Library has collected usage data from its collections’ websites and from DPLA referrals via Google Analytics. This paper analyzes this usage data covering a period of approximately twenty-seven months, from October 2013 through December 2015. The authors consider this data analysis important for identifying collections receiving the largest number of visits, including visits through harvesting sites such as the DPLA. The authors argue that such data evaluation is important because it can better inform decisions taken to build collections that will attract users and serve their needs.
Additionally, this analysis of usage data generated from harvesting sites such as the DPLA demonstrates the usefulness of harvesting in increasing digital collections’ usage. Lastly, this paper contributes to the broader literature on Google Analytics and its use by libraries in data analysis.

LITERATURE REVIEW

Using Google Analytics to study usage of electronic resources is common; a considerable amount of material exists describing the use of Google Analytics in the marketing and business fields.3

Paromita Biswas (pbiswas@email.wcu.edu) is Metadata Librarian and Joel Marchesoni (jmarch@email.wcu.edu) is Technology Support Analyst, Hunter Library, Western Carolina University, Cullowhee, North Carolina.

ANALYZING DIGITAL COLLECTIONS ENTRANCES: WHAT GETS USED AND WHY IT MATTERS | BISWAS AND MARCHESONI | https://doi.org/10.6017/ital.v35i4.9446

However, the published literature offers little about the use of this software for studying usage of collections consisting of unique materials digitized and placed online by libraries and cultural heritage organizations. For example, Betty has written about using Google Analytics to track statistics for user interaction with librarian-created digital media such as quizzes and video tutorials.4 Fang discusses using Google Analytics to track the behavior of users who visited the Rutgers-Newark Law Library website.5 Fang looked at the number of visitors, what and how many pages they visited, how long they stayed on each page, where they were coming from, and which search engine or website had referred them to the library’s website. Findings were evaluated and used to make improvements to the library’s website. For example, Fang mentions using Google Analytics data to track the percentage of new and returning visitors before and after the website redesign.
Among articles that discuss using web analytics to learn how users access digital collections, most have focused on comparing third-party platforms, online search engines, and the traditional library catalog to find preferred modes of access and to determine whether the results call for a shift in how libraries share their digital collections. For example, in their article on the impact of social media platforms such as HistoryPin and Pinterest on the discovery and access of digital collections, Baggett and Gibbs use Google Analytics for tracking usage of digital objects on the library’s website as well as statistics collected from HistoryPin’s and Pinterest’s first-party analytics tools.6 The authors conclude that while neither HistoryPin nor Pinterest drive users back to the library’s website, they help in the discovery of digital collections and can enhance user access to library collections. Schlosser and Stamper compare the effects on usage of a collection housed in an institutional repository and reposted on Flickr.7 Whether housing a collection on a third-party site had an adverse effect on attracting traffic to the library’s website was not as important as ensuring users accessed the collection somewhere. Likewise, O’English demonstrates how data from web analytics were used to compare access to archival materials via online search engines as opposed to library catalogs using MARC records for descriptions.8 O’English argues library practices should change accordingly to promote patron access and use. Ladd’s article on the access and use of a digital postcard collection from Miami University uses statistics from Google Analytics, CONTENTdm, and Flickr over a period of one year.9 Ladd’s findings reveal that few users came to the main digital collections website to search and browse; instead, most arrived via external sources such as search engines and social media sites.
Given the resulting increase in views, Ladd asserts that regular updates in both CONTENTdm and Flickr are important for promoting access and use of the postcards.

Articles on using Google Analytics to track digital collection usage have also explored the geographic base of users. For example, Herold uses Google Analytics to demonstrate usage of a digital archival collection by users at institutional, national, and international levels.10 Herold looks at server transaction logs maintained in Google Analytics, on- and off-campus searching counts, user locations, and repeat visitors to the archival images representing cultural heritage materials related to Orang Asli peoples and cultures of Malaysia. She uses these data to ascertain the number of users by geographic region and determines that, while most visitors came from the United States, Malaysia ranked second. The data supported, according to Herold, that this particular digital collection was able to reach another target audience: users from Malaysia. Herold’s findings indicate that digitization of unique materials makes them available to a worldwide audience.

Whether harvesting has increased usage of digital collections available via DPLA or its hubs has received limited exploration in the literature. Most writings on harvesting digital collections have focused on the technical aspects of the process, like the DPLA’s ingestion method, the quality and scalability of metadata remediation and enhancement,11 and large metadata encoding.12 For example, Gregory and Williams write about the North Carolina Digital Heritage Center as one of the service hubs of the DPLA. The service hubs are centers that aggregate digital collection metadata provided by institutions for harvesting by the DPLA.
The authors discuss metadata requirements, software review, and establishment of workflow for sending large metadata feeds to the DPLA.13 Boyd, Gilbert, and Vinson, in their article on the South Carolina Digital Library (SCDL), another service hub for the DPLA, describe the planning behind setting up the SCDL, its management, and the technology involved in metadata harvesting.14 Freeland and Moulaison discuss the Missouri hub as a model for “institutions with similar collective goals for exposing and enriching their data through the DPLA.”15 According to them, by harvesting their metadata to the DPLA, institutions are able to share their digital collections with the broader public. Additionally, institutions that harvest metadata to the DPLA get value-added services like geocoding of location-based metadata and expression of contributed metadata as linked data.

Data Collection Parameters

Hunter Library digital collections usage data included information on item views16 and referrals17 for each of the collections, including DPLA referrals. The authors also considered keyword search terms18 across all referrals, and within CONTENTdm specifically, that brought users to the Library’s collections. The authors considered the most frequently occurring keywords to represent the subjects of the collections that were most used. Repeat visitors to the Library’s digital collections’ website were also tracked. Finally, sessions19 were traced by the geographic area20 of the users.

Hunter Library’s collections vary in size. The Library’s largest and one of its oldest collections, Craft Revival, showcases documents, photographs, and craft objects housed in Hunter Library and smaller regional institutions.
The collection’s items represent the late nineteenth and early twentieth century (1890s–1940s) Craft Revival movement in Western North Carolina, which was characterized by a renewed interest in handmade objects, including Cherokee arts and crafts. The Craft Revival collection began in 2005 and includes 1,982 items. The second largest collection, Great Smoky Mountains, which highlights efforts that went into the establishment of the park and includes photographs of the landscape and the flora and fauna in the park, began in 2012 and consists of 1,829 items.

Not all digital collections were harvested to the DPLA at the same time. While some older collections were harvested to the DPLA in 2013, smaller, institution-specific collections that started later were also harvested later. Three such collections were harvested to the DPLA in 2015: WCU—Oral Histories, a collection of interviews collected by students in one of WCU’s history classes documenting the history and culture of Western North Carolina and the lives of WCU athletes and artists like Josephina Niggli, who taught drama at WCU; Highlights from WCU, a collection of unique items from WCU’s Mountain Heritage Center and other departments on campus, including letters from the Library’s Special Collections transcribed by students in WCU’s English department; and WCU—Fine Art Museum, showcasing artwork from the university’s Fine Art Museum. Because these smaller collections started later, their total item views and referral counts would likely be lower than those of some of the Library’s older collections; however, these newer collections were included because they might provide valuable data regarding harvesting referrals and returning visitors. Table 1 shows the years the collections were started, the number of items included in each collection, and the year they were harvested to the DPLA.
Collection Name                        Start Year   Collection Size (Number of Items)   Harvested Since
Cherokee Traditions                    2011         332                                 2013
Civil War                              2011         68                                  2013
Craft Revival                          2005         1,982                               2013
Great Smoky Mountains                  2013         1,829                               2013
Highlights from WCU                    2015         39                                  2015
Horace Kephart                         2005         552                                 2013
Picturing Appalachia                   2012         972                                 2013
Stories of Mountain Folk               2012         374                                 2013
Travel Western North Carolina          2011         160                                 2013
WCU—Fine Art Museum                    2015         87                                  2015
WCU—Herbarium                          2013         91                                  2013
WCU—Making Memories                    2012         408                                 2013
WCU—Oral Histories                     2015         67                                  2015
Western North Carolina Regional Maps   2015         37                                  2015

Table 1. Collections by year

Collecting Data Using Google Analytics

The Library has had Google Analytics set up on online exhibits—websites outside of CONTENTdm that provide additional insight into the collections—since 2008, and it began using Google Analytics to track its CONTENTdm materials with the 6.1.2 release in October 2013. CONTENTdm version 6.4 introduced a configuration field that allowed the authors to enter a Google Analytics ID and automatically generate the tracking code in pages to simplify the setup. Following that software update, OCLC made Google Analytics the default data logging mechanism. The Library set up Google Analytics such that online exhibits are tracked together with their CONTENTdm collections. This is accomplished by using custom tracking on all webpages and a custom script in CONTENTdm, which allows the Library to link its CONTENTdm and wcu.edu domains within Google Analytics so that sessions can be viewed across all online digital collections.

Data were collected from Google Analytics using several tools. Google provides an online tool called Query Explorer (https://ga-dev-tools.appspot.com/query-explorer/) that can create and execute custom searches against Google Analytics. This application was used to craft the queries.
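The queries built in Query Explorer correspond to parameters of the Google Analytics Core Reporting API (v3, since superseded by newer Analytics APIs). As a rough sketch of that shape, the Python below assembles a v3-style parameter set and tallies a response; the view ID, date range, path filter, and the mocked CONTENTdm-like item paths are placeholder assumptions, not values from this article.

```python
def build_query(view_id, start, end, path_filter):
    """Assemble Core Reporting API (v3) parameters, as Query Explorer does."""
    return {
        "ids": "ga:" + view_id,            # placeholder Analytics view ID
        "start-date": start,
        "end-date": end,
        "metrics": "ga:pageviews",
        "dimensions": "ga:pagePath",
        "filters": "ga:pagePath=@" + path_filter,  # substring match on the path
    }

def pageviews_by_path(response):
    """Collapse v3-style rows ([pagePath, pageviews] string pairs) into a dict."""
    return {path: int(views) for path, views in response.get("rows", [])}

# Mocked response in the v3 row format (invented item paths):
mock = {"rows": [["/cdm/ref/collection/craftrevival/id/10", "45"],
                 ["/cdm/ref/collection/craftrevival/id/11", "30"]]}
```

A live call would send these parameters, with an OAuth token, to the reporting endpoint; here the response is mocked so the parsing step can be shown without network access.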
Microsoft Excel was primarily used to download data, using the custom plugin Rest to Excel Library (http://ramblings.mcpher.com/Home/excelquirks/json/rest) to parse information from Google Analytics into worksheets. The Excel add-on works well but requires knowledge of Microsoft Visual Basic for Applications (VBA) programming to use effectively. This limitation prompted the authors to look for a simpler way of retrieving data. The authors adopted OpenRefine (https://github.com/OpenRefine/OpenRefine) to collect, sort, and filter data, with Excel used for results analysis. Once in Excel, formulas were used to mine the data for specific targets.

RESULTS ANALYSIS

The data collected using Google Analytics spanned a period of approximately twenty-seven months, from October 2013 through December 2015. Table 2 and graph 1 show each collection’s item views, item referrals, and size (number of items in the collection). These numbers were calculated for each collection as a percentage of total item views, total item referrals, and total number of items for all collections together. In table 2, the top five collections in terms of item views and referrals are highlighted. Graph 1, a graphical representation of table 2, displays more starkly the differences between collections in terms of views and referrals.
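The percentage figures described above are simple shares of column totals. The sketch below (with hypothetical counts, not the Library's actual Google Analytics figures) shows how such per-collection percentages could be computed once raw view, referral, and item counts are in hand.

```python
# Sketch: compute each collection's share of total item views, item
# referrals, and items, as in Table 2. All counts below are hypothetical
# illustrations, not Hunter Library's actual figures.

collections = {
    # name: (item_views, item_referrals, item_count)
    "Craft Revival": (41350, 5239, 1982),
    "Great Smoky Mountains": (7500, 634, 1829),
    "Horace Kephart": (11670, 762, 552),
}

# Column totals: total views, total referrals, total items.
totals = [sum(vals[i] for vals in collections.values()) for i in range(3)]

def shares(name):
    """Return (views %, referrals %, items %) for one collection."""
    return tuple(round(100 * v / t, 2)
                 for v, t in zip(collections[name], totals))

for name in collections:
    views_pct, refs_pct, items_pct = shares(name)
    print(f"{name}: views {views_pct}%, referrals {refs_pct}%, items {items_pct}%")
```

Each column of percentages sums to 100 (up to rounding), matching the "Total" row of the published table.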
Collection Name | Item Views as Percentage of Total Views | Item Referrals as Percentage of Total Referrals | Number of Items as Percentage of Total Items
Cherokee Traditions | 6.38 | 6.12 | 4.74
Civil War | 1.89 | 0.88 | 0.97
Craft Revival | 41.35 | 52.39 | 28.32
Great Smoky Mountains | 7.50 | 6.34 | 26.14
Highlights from WCU | 0.23 | 0.08 | 0.56
Horace Kephart | 11.67 | 7.62 | 7.89
Picturing Appalachia | 10.03 | 9.99 | 13.89
Stories of Mountain Folk | 3.51 | 2.45 | 5.34
Travel Western North Carolina | 7.87 | 9.57 | 2.29
WCU—Fine Art Museum | 0.19 | 0.08 | 1.24
WCU—Herbarium | 0.71 | 0.45 | 1.30
WCU—Making Memories | 7.13 | 2.64 | 5.83
WCU—Oral Histories | 0.80 | 1.08 | 0.96
Western North Carolina Regional Maps | 0.26 | 0.11 | 0.53
Total | 100.00 | 100.00 | 100.00

Table 2. Collections by percentage

Graph 1. Collections by percentage

As demonstrated in the preceding table and graph, Craft Revival, one of the Library’s oldest and largest collections, contributes more than 28 percent of all digital collections’ items and garners close to 42 percent of all item views and 53 percent of all item referrals. Great Smoky Mountains, the second largest collection, contributes a little more than 26 percent of items but receives only about 8 percent of all item views and 7 percent of all referrals. The Horace Kephart collection, focusing on the life and works of Horace Kephart—author, librarian, and outdoorsman who made the mountains of Western North Carolina his home later in life—is the Library’s fourth largest collection. It receives almost 12 percent of all item views and about 8 percent of all item referrals.
Picturing Appalachia, the third largest collection—consisting of photographs showcasing the history, culture, and natural landscape of Southern Appalachia in the Western North Carolina region—makes up 14 percent of items and receives approximately 10 percent of all referrals and views. Travel Western North Carolina—visual journeys through Western North Carolina communities across three generations—contributes fewer than 3 percent of items but scores high on both item views and referrals. WCU—Making Memories, which highlights the people, buildings, and events of WCU’s history, and Stories of Mountain Folk (SOMF), a collection of radio programs from the Western North Carolina nonprofit Catch the Spirit of Appalachia archived at Hunter Library, are similar in size; each receives fewer than 3 percent of all item referrals. However, WCU—Making Memories receives more than 7 percent of all item views compared to SOMF’s almost 4 percent. These findings are not surprising, as the Making Memories collection documents Western Carolina University’s history and may receive many views from within the institution. Overall, however, the Craft Revival collection can be considered the Library’s most popular collection. The Horace Kephart collection appears to be the second most popular. And, not surprisingly, Cherokee Traditions, a collection of art objects, photographs, and recordings similar in content to Craft Revival in its focus on Cherokee culture and history, is quite popular and receives more item referrals than both WCU—Making Memories and SOMF and more item views than SOMF (table 2). An analysis of keyword searches within CONTENTdm and keyword searches across all referral sources reiterates these findings.
As part of the analysis, data collected for this twenty-seven-month period for the top keyword searches within CONTENTdm and the top keyword searches across all referrals were recorded in an Excel spreadsheet and then uploaded to OpenRefine. OpenRefine allows text and numeric data to be sorted by name (alphabetical) and count (highest to lowest occurring). Once the Excel spreadsheet was uploaded to OpenRefine, keywords were sorted numerically and clustered. OpenRefine has a “cluster” function to bring together text that has the same meaning but differs by spelling or capitalization (for example, “CHEROKEE,” “cherokee,” “cheroke”) or by order (for example, “Jane Smith,” “Smith, Jane”). The clustering function provides a count of the number of times a keyword was used regardless of exact spelling. After identifying keywords belonging to a cluster (for example, a cluster of the word “Cherokee” spelled differently), the differently spelled or organized keywords in each cluster were merged in OpenRefine with their most accurate counterparts. Finally, it should be noted that keywords including “!” and “+” symbols were most likely generated either from using multiple search terms within CONTENTdm’s advanced search or from curated search links maintained on some of our online exhibit websites. These links take users to commonly used result sets within the collection. Tables 3 and 4 provide a listing of the ten most frequently searched keywords within CONTENTdm and across all referrals, along with the names of the collections most relevant to these searches.
Keywords | Occurrence Count | Relevant Collection(s)
Cherokee | 187 | Craft Revival; Cherokee Traditions
Cherokee Language | 107 | Craft Revival; Cherokee Traditions
Southern Highland Craft Guild | 98 | Craft Revival
basket!object | 96 | Craft Revival; Cherokee Traditions
Indian masks—Appalachian Region, Southern | 83 | Craft Revival; Cherokee Traditions
Basket!photograph postcard | 82 | Craft Revival; Cherokee Traditions
W.M. Cline Company | 78 | Picturing Appalachia; Craft Revival
Cherokee +Indian! photograph | 72 | Craft Revival; Cherokee Traditions
Wood-carving—Appalachian Region, Southern | 70 | Craft Revival
Indian wood-carving—Appalachian Region, Southern | 69 | Craft Revival

Table 3. Top keyword searches within CONTENTdm

Keywords | Number of Sessions | Relevant Collection(s)
cherokee traditions | 442 | Craft Revival; Cherokee Traditions
horace kephart | 185 | Horace Kephart; Great Smoky Mountains; Picturing Appalachia
cherokee pottery | 55 | Craft Revival; Cherokee Traditions
kephart knife | 50 | Horace Kephart
amanda swimmer | 37 | Craft Revival; Cherokee Traditions
appalachian people | 36 | Craft Revival; Cherokee Traditions; Great Smoky Mountains; WCU—Oral Histories
cherokee indian pottery | 36 | Craft Revival; Cherokee Traditions
cherokee baskets | 34 | Craft Revival; Cherokee Traditions
weaving patterns | 33 | Craft Revival; Cherokee Traditions
basket weaving | 26 | Craft Revival; Cherokee Traditions

Table 4. Top keyword searches across all referrals

Tables 3 and 4 show that top searches relate to arts and crafts from the Western North Carolina region (“baskets,” “Indian masks,” “Indian wood carving,” “Cherokee pottery”), artists (“amanda swimmer”), or topics relating to Cherokee culture (“cherokee,” “cherokee language”).
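OpenRefine's key-collision clustering, used to merge the keyword variants above, can be approximated in a few lines. The sketch below is a simplified fingerprint method with illustrative counts, not OpenRefine's exact algorithm; note that pure key collision catches case, punctuation, and word-order variants but not misspellings like "cheroke," for which OpenRefine offers nearest-neighbor methods.

```python
from collections import defaultdict

def fingerprint(keyword):
    """Normalize a keyword: lowercase, strip punctuation, de-duplicate
    and sort its words, so variants collide on the same key."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " "
                      for c in keyword.lower())
    return " ".join(sorted(set(cleaned.split())))

def cluster(keyword_counts):
    """Merge counts for keywords sharing a fingerprint; label each
    cluster with its most frequent original spelling."""
    totals = defaultdict(int)
    labels = {}
    for kw, n in keyword_counts.items():
        key = fingerprint(kw)
        totals[key] += n
        if key not in labels or n > keyword_counts[labels[key]]:
            labels[key] = kw
    return {labels[k]: n for k, n in totals.items()}

# Illustrative counts: capitalization variants and a reordered name.
counts = {"Cherokee": 120, "CHEROKEE": 40, "cherokee": 27,
          "Smith, Jane": 10, "Jane Smith": 5}
print(cluster(counts))
```

Here the three "Cherokee" spellings merge into a single cluster whose count is their sum, labeled with the commonest spelling, and "Jane Smith" / "Smith, Jane" likewise collapse into one entry.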
Searches relating to the Horace Kephart collection (“horace kephart,” “kephart knife”) are also popular, which explains why the Kephart collection, which accounts for fewer than 8 percent of the Library’s digital collections’ items, scores highly in terms of item views (second) and referrals (fourth). The popularity of topics related to Western North Carolina is reiterated in the geographic base of the users. Graph 2 shows that North Carolina accounts for most of the searches, with cities in Western North Carolina (Asheville, Franklin, Cherokee, Waynesville) accounting for more than 40 percent of sessions.

Graph 2. Cities by session count

The majority of item referrals come from search engines such as Google, Bing, and Yahoo! Graph 3 shows the percentage of item referrals from these external searches.21 However, the DPLA also generates a fair amount of incoming traffic to the collections. For example, while all collections get referrals from the DPLA, harvesting to the DPLA is particularly useful for smaller collections such as Highlights from WCU, WCU—Fine Art Museum, and the Civil War collection. Each of these collections gets 17 percent of its referrals from the DPLA, making the DPLA the largest referral source after the search engines for the Highlights and Fine Art Museum collections. Graph 4 shows the referrals each collection receives via the DPLA as a percentage of total referrals. This indicates the usefulness of harvesting to the DPLA. The data also suggest that total referrals from the DPLA per month increase the longer items are in the DPLA (graph 5).

Graph 3. Percentage of search engine item referrals (Google, Bing, and Yahoo!)

Graph 4. Percentage of DPLA item referrals

Graph 5. Increase in DPLA referrals over time

Lastly, new and returning visitors to the collections were tracked as a marker of user interest in particular collections. Graph 6 shows data collected for new and returning visitors calculated as a proportion of the total number of visits for each collection. Some smaller collections like Highlights from WCU, WNC Regional Maps, WCU—Fine Art Museum, and WCU—Oral Histories score highly in terms of attracting return visitors (graph 6).

Graph 6. New and returning visitors

DISCUSSION

The aim behind gathering data was to study usage of Hunter Library’s digital collections and examine the usefulness of harvesting in promoting use. Although usage data logs were unable to shed much light on the actual usefulness of the collections to users, the logs provided information on the volume of use, what materials were accessed, and where users were located. Analysis of the transaction logs indicates that while all collections likely benefitted from harvesting, Craft Revival, Cherokee Traditions, and Horace Kephart (collections focusing on the culture and history of Western North Carolina) were the most heavily used, and most visitors came from the state of North Carolina and from the region in particular. Search terms in the transaction logs also indicated a strong interest in items related to Cherokee culture and Horace Kephart.
As Herold, who traced the second largest group of users of the Orang Asli digital image archive to Malaysia, notes, the geographic base of a collection’s users can be indicative of the popularity of a subject area.22 Likewise, Matusiak asserts that users’ comments can be indicative of the relevance of collections to users’ needs and provide direction for the future development of digital collections.23 As neither the Craft Revival, Cherokee Traditions, nor Horace Kephart collection includes items that relate specifically to the university’s history—unlike other institution-specific collections mentioned earlier—it is possible that these collections’ users are more representative of the larger public than of the university. These findings call into question the identification of an academic library’s user base as mainly students and faculty of the institution, and raise the question of whether librarians should give greater consideration to the needs of a wider audience.24 Data supporting the existence of this user base, whose true import or preferences might not be captured in surveys and questionnaires, can serve as a valuable source of information for individuals responsible for building digital collections. In an informal survey of Hunter Library faculty carried out by Hunter Library’s Digital Initiatives Unit in September 2014, respondents considered collections such as Craft Revival to be more useful to users external to the university. While the survey could allude to the nature of the user base of a collection like Craft Revival, it understandably could not capture the scale of the item views and referrals garnered by this collection as well as a usage data analysis could. On the other hand, analysis of usage data, as demonstrated in this paper, indicated that certain collections—Highlights from WCU, WCU—Fine Art Museum, and WCU—Oral Histories—possibly served a niche audience.
These smaller and more recently established collections consisting of university-created materials attracted more returning visitors (see graph 6). These returning visitors were likely internal users whose visits indicated, as Fang points out, a loyalty to these collections.25 In the paper “A Framework of Guidance for Building Good Digital Collections,” authored by the National Information Standards Organization Framework Advisory Group, the authors point out that while there are no absolute rules for creating quality digital collections, a good collection should include data pertaining to usage.26 The authors point to multiple assessment methods, including combinations of observations, surveys, experiments, and transaction log analyses. As the WCU digital collections findings demonstrate, a careful analysis of the popularity of collections can indicate the need to balance quantitative data with more qualitative survey and interview data. These findings also indicate that usage data analysis can be very valuable in identifying the extent of collection usage by visitors who may not have significant survey representation. Results from the small (fewer than ten respondents) WCU survey indicate that some respondents question the institutional usefulness of collections such as Craft Revival. These results show the importance of taking multiple factors into account when assessing user needs and interests in digital collections.

CONCLUSION

The authors feel future projects might stem from this data analysis. For example, local subject fields based on the highest-recurring keywords mined from the transaction logs could be added to all of Hunter Library’s digital collections. Usage statistics at a later period could then be evaluated to study whether the addition of user-generated keywords increased use of any collection.
As Matusiak points out in her article on the usefulness of user-centered indexing in digital image collections, social tagging—despite its lack of synonym control or misuse of the singular and plural—is a powerful form of indexing because of its “close connection with users and their language,” as opposed to traditional indexing.27 The terms users assign to describe images are also the ones they are most likely to type while searching for digital images. Likewise, according to Walsh, a study conducted by the University of Alberta found that more than forty percent of collections reviewed used a locally developed classification for indexing and searching their collections, and many of these schemes could work well for searches within the collection by users who are familiar with the culture of the collection.28 Usage-data analysis can provide useful information that guides decisions for building digital collections that better serve user needs. It can identify a library’s digital collections’ users and what they want. These are important considerations to keep in mind if library services are to be all about engaging and building relationships with users.29 Harvesting to a national portal such as the DPLA is beneficial for Hunter Library’s collections. At the same time, the Library’s institution-specific collections receive more return visits, likely because of sustained interest from the large user base of the university’s students and employees, an assessment supported by survey findings. Conversely, collections not so directly tied to the institution receive the most one-time item views and referrals. Items that get used are a good indication of what users want and, as this paper demonstrates, academic digital library collections should consider the needs of both the university audience and the general public.
REFERENCES

1. A landing page refers to the homepage of a collection.
2. The DPLA provides a single portal for accessing digital collections held by cultural heritage institutions across the United States. “History,” Digital Public Library of America, accessed May 19, 2016, http://dp.la/info/about/history/.
3. Paul Betty, “Assessing Homegrown Library Collections: Using Google Analytics to Track Use of Screencasts and Flash-Based Learning Objects,” Journal of Electronic Resources Librarianship 21, no. 1 (2009): 75–92, https://doi.org/10.1080/19411260902858631.
4. Ibid.
5. Wei Fang, “Using Google Analytics for Improving Library Website Content and Design: A Case Study,” Library Philosophy and Practice (e-journal), June 2007, 1–17, http://digitalcommons.unl.edu/libphilprac/121.
6. Mark Baggett and Rabia Gibbs, “Historypin and Pinterest for Digital Collections: Measuring the Impact of Image-Based Social Tools on Discovery and Access,” Journal of Library Administration 54, no. 1 (2014): 11–22, https://doi.org/10.1080/01930826.2014.893111.
7. Melanie Schlosser and Brian Stamper, “Learning to Share: Measuring Use of a Digitized Collection on Flickr and in the IR,” Information Technology and Libraries 31, no. 3 (September 2012): 85–93, https://doi.org/10.6017/ital.v31i3.1926.
8. Mark R. O’English, “Applying Web Analytics to Online Finding Aids: Page Views, Pathways, and Learning about Users,” Journal of Western Archives 2, no. 1 (2011): 1–12, http://digitalcommons.usu.edu/westernarchives/vol2/iss1/1.
9. Marcus Ladd, “Access and Use in the Digital Age: A Case Study of a Digital Postcard Collection,” New Review of Academic Librarianship 21, no. 2 (2015): 225–31, https://doi.org/10.1080/13614533.2015.1031258.
10. Irene M. H. Herold, “Digital Archival Image Collections: Who Are the Users?” Behavioral & Social Sciences Librarian 29, no. 4 (2010): 267–82, https://doi.org/10.1080/01639269.2010.521024.
11. Mark A. Matienzo and Amy Rudersdorf, “The Digital Public Library of America Ingestion Ecosystem: Lessons Learned After One Year of Large-Scale Collaborative Metadata Aggregation,” in 2014 Proceedings of the International Conference on Dublin Core and Metadata Applications (DCMI, 2014), 1–11, http://arxiv.org/abs/1408.1713.
12. Oksana L. Zavalina et al., “Extended Date/Time Format (EDTF) in the Digital Public Library of America’s Metadata: Exploratory Analysis,” Proceedings of the Association for Information Science and Technology 52, no. 1 (2015): 1–5, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.145052010066/abstract.
13. Lisa Gregory and Stephanie Williams, “On Being a Hub: Some Details behind Providing Metadata for the Digital Public Library of America,” D-Lib Magazine 20, no. 7/8 (July/August 2014): 1–10, https://doi.org/10.1045/july2014-gregory.
14. Kate Boyd, Heather Gilbert, and Chris Vinson, “The South Carolina Digital Library (SCDL): What Is It and Where Is It Going?” South Carolina Libraries 2, no. 1 (2016), http://scholarcommons.sc.edu/scl_journal/vol2/iss1/3.
15. Chris Freeland and Heather Moulaison, “Development of the Missouri Hub: Preparing for Linked Open Data by Contributing to the Digital Public Library of America,” Proceedings of the Association for Information Science and Technology 52, no. 1 (2015): 1–4, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.1450520100105/abstract.
16. A single view of an item in a digital collection.
17. Visits to the site that began from another site with an item page being the first page viewed.
18. Keywords are words visitors used to find the Library’s website when using a search engine. Google Analytics provides a list of these keywords.
19. A session is defined as a “group of interactions that take place on a website within a given time frame” and can include multiple kinds of interactions like page views, social interactions, and economic transactions.
In Google Analytics, a session by default lasts thirty minutes, though one can adjust this length to last a few seconds or several hours. “How a Session Is Defined in Analytics,” Google, Analytics Help, accessed May 20, 2016, https://support.google.com/analytics/answer/2731565?hl=en.
20. Locations were studied mostly in terms of cities and states.
21. The percentage is based on the total referral count a collection gets—for example, a 44 percent referral count for Cherokee Traditions would mean that the search engines account for 44 percent of the total referrals this collection gets.
22. Herold, “Digital Archival Image Collections,” 278.
23. Krystyna K. Matusiak, “Towards User-centered Indexing in Digital Image Collections,” OCLC Systems & Services: International Digital Library Perspectives 22, no. 4 (2006): 283–98, https://doi.org/10.1108/10650750610706998.
24. Ladd, “Access and Use in the Digital Age,” 230.
25. Fang points out that the improvements made to the Rutgers-Newark Law Library website could attract more return visitors and thus achieve loyalty. Fang, “Using Google Analytics for Improving Library Website,” 11.
26. NISO Framework Advisory Group, A Framework of Guidance for Building Good Digital Collections, 2nd ed. (Bethesda, MD: National Information Standards Organization, 2004), https://chnm.gmu.edu/digitalhistory/links/cached/chapter3/link3.2a.NISO.html.
27. Matusiak, “Towards User-centered Indexing,” 289.
28. John Walsh, “The Use of Library of Congress Subject Headings in Digital Collections,” Library Review 60, no. 4 (2011), https://doi.org/10.1108/00242531111127875.
29. Lynn Silipigni Connaway, The Library in the Life of the User: Engaging with People Where They Live and Learn (Dublin: OCLC Research, 2015), http://www.oclc.org/research/publications/2015/oclcresearch-library-in-life-of-user.html.
Editor’s Comments: Odds and Ends
Bob Gerrity

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016

This issue marks the midpoint of Information Technology and Libraries’ fifth year as an open-access e-only journal. The move to online-only in 2012 was inevitable, as ITAL’s print subscription base was no longer covering the costs of producing and distributing the print journal. Moving to an e-only model using an open-source publishing platform (the Public Knowledge Project’s Open Journal Systems) provided a low-cost production and distribution system that has allowed ITAL to continue publishing without requiring a large ongoing investment from LITA. The move to open access, however, was not inevitable, and I commend LITA for supporting that move and for continuing to provide a base subsidy that supports the journal’s ongoing publication. I also thank the Boston College Libraries for their ongoing support in hosting ITAL along with a number of other OA journals. Since ITAL is now open, access to it can no longer be offered as an exclusive benefit that comes with LITA membership. Regardless of the publishing model, though, ITAL has always relied on voluntary contributions of the time and expertise of reviewers and editors. I’d like to acknowledge the contributions of our past and current Editorial Board members, who play a key role in ensuring the ongoing quality and vitality of the journal. We will be adding a few additional Board members shortly, to help ensure that reviews of submissions to the journal are completed as quickly and effectively as possible. Speaking of peer review, one of the recent innovative startups in the scholarly communication space is a company called publons, which tracks and verifies peer-review activity, providing a mechanism for academics to report (and possibly receive institutional credit for) their peer-review work, an undervalued part of the scholarly communication framework.
(Full disclosure: at the University of Queensland we are conducting a pilot project with publons to integrate the peer-review activities of our academics into our institutional repository.) In addition to new approaches to peer review, such as publons and Academic Karma, there are quite a few recent examples of innovations in various aspects of scholarly communication that are worth keeping an eye on. These include new collaborative authoring tools such as Overleaf, impact-measurement tools such as Impactstory, and personal digital library platforms such as Readcube. On a broader scale, initiatives such as PeerJ are building open access publishing platforms intended to dramatically improve the efficiency of and drive down the overall costs of scholarly publishing. February marked the 14th anniversary of a key trigger event in the Open Access movement—the launch of the Budapest Open Access Initiative in 2002.

Bob Gerrity (r.gerrity@uq.edu.au), a member of LITA and the Editor of Information Technology and Libraries, is University Librarian at the University of Queensland, Brisbane, Australia.

EDITOR’S COMMENTS | GERRITY doi: 10.6017/ital.v35i2.9462

Much has happened in the 14 years since the Budapest Initiative, on various fronts:
• policy—introduction and widespread adoption of funder and institutional OA mandates;
• technology—development and widespread adoption of institutional repositories, and recent development of mechanisms to facilitate the discovery of OA publications (e.g., SHARE on the library side and CHORUS on the publisher side);
• publishing—establishment of new OA megajournals (e.g., PLOS, BioMed Central) and embrace of hybrid OA models by mainstream commercial publishers.
Yet despite all the hype, acrimony, and activity triggered by the OA movement, a recent analysis in the Chronicle of Higher Education suggests the growth of OA has been slow and incremental: the percentage of research articles published annually in fully open-access format has increased at an average rate of around one percent a year, from 4.8% in 2008 to 12% in 2015. At this rate, the tipping point for OA still seems very far away. Lots of energy has been and continues to be invested by different stakeholders in different approaches, and the green vs. gold argument still predominates. Recent developments suggest momentum is gaining for a more radical shift. In December 2015, the Max Planck Institute, a key player in the launch of OA with the Berlin Declaration on Open Access in 2003, hosted the 12th version of its annual OA conference to further the discussion around open access. Ironically, unlike previous meetings and seemingly in philosophical conflict with the underpinnings of the OA movement, the meeting was by invitation only. Given the topic, though—a “Proposal to Flip Subscription Journals to Open Access”—the closed nature of the meeting is understandable.
Underpinning the proposal was a 2015 paper from the Max Planck Digital Library suggesting that the amount of money currently being spent (largely by libraries) on journal subscriptions should be sufficient to fund research publication costs if applied to a “flipped” journal publishing business model, from subscription-based to gold open access.1 In the Netherlands, the university sector has adopted a national approach in negotiating deals with several major publishers (Springer, SAGE, Elsevier, and Wiley) that allow Dutch authors to publish their papers as gold OA, without additional charges (but, depending on the publisher, with limits on total numbers and/or which journals are available within the deals).2 The so-called “Dutch Deal” by the VSNU (Association of universities in the Netherlands) and UKB (Dutch Consortium of University Libraries and Royal Library) takes a national approach to flipping the model, attempting to bundle access rights for Dutch readers with APC credits for Dutch authors. The Dutch government, which currently holds the EU presidency, is pushing hard for a Europe-wide adoption of this approach. Last month, the EU’s Competitiveness Council agreed that all scientific papers should be freely available by 2020.3 Meanwhile, in the US, the “Pay it Forward” research project at the University of California is examining what the institutional financial impact would be with a flipped model.
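At its core, such a flipped-model comparison is simple arithmetic: current subscription spend versus annual article output multiplied by an assumed average APC. The sketch below uses hypothetical placeholder figures, not numbers from the "Pay it Forward" study itself.

```python
def flipped_cost(articles_per_year, avg_apc):
    """Estimated annual cost under a fully gold-OA ("flipped") model."""
    return articles_per_year * avg_apc

def flip_delta(subscription_spend, articles_per_year, avg_apc):
    """Change in annual spend after a flip; positive means the flip
    would cost the institution more than its current subscriptions."""
    return flipped_cost(articles_per_year, avg_apc) - subscription_spend

# Hypothetical institution: $5M subscription spend, 2,000 articles/year,
# $1,800 average APC (all figures illustrative only).
delta = flip_delta(5_000_000, 2_000, 1_800)
print(f"A flip would change annual spend by ${delta:,}")
```

Whether an institution comes out ahead depends heavily on its publication intensity: research-intensive universities with high article output may pay more under an APC model, while teaching-focused institutions may pay less.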
The study is looking at existing institutional journal expenditures on subscriptions and modeling what a future, APC-based model would look like based on institutional research publication output and estimated average APC charges. Who knows when or if a global flip might occur, but it does strike me that the scholarly publishing world is overdue for a major shakeup. From the point of view of a university librarian focused on keeping journal subscription costs in line (unsuccessfully, I might add), I think there is real danger in not considering what a flip to a gold model might look like. The commercial publishers we all complain about are successfully exploiting the gold model as an additional revenue stream which, for the most part, academic libraries have been ignoring, since the individual APCs typically are paid from someone else’s budget. This has allowed the overall envelope of spending on research publication (subscriptions and APCs) to grow significantly. Perhaps a more interesting question is what the impact of a flip on libraries would be. If gold OA became the predominant model, we would no longer need all of the complex systems we’ve built to manage subscriptions and user access. To quote Homer Simpson, “Woohoo!” In the “watch this space” arena, EBSCO’s recently launched open-source library services platform (LSP) initiative is beginning to take shape.
It now has a name—FOLIO (for Future of the Libraries Is Open)—and as Marshall Breeding put it, the project “injects a new dynamic into the competitive landscape of academic library technology, pitting an open source framework backed by EBSCO against a proprietary market dominated by Ex Libris, now owned by EBSCO archrival ProQuest.”4 Publicly listed participants in the project include (in addition to EBSCO) OLE, Index Data, ByWater, BiblioLabs, and SIRSI Dynix.5 The platform release timetable calls for an initial, “technical preview” release of the code for the base platform in August 2016, and an anticipated release of the apps needed to operate a library in early 2018.6

1. Ralf Schimmer, Kai Karin Geschuhn, and Andreas Vogler, Disrupting the Subscription Journals’ Business Model for the Necessary Large-Scale Transformation to Open Access (2015), doi:10.17617/1.3.
2. Frank Huysmans, “VSNU-Wiley: Not Such a Big Deal for Open Access,” Warekennis (blog), March 1, 2016, https://warekennis.nl/vsnu-wiley-not-such-a-big-deal-for-open-access/.
3. Martin Enserink, “In Dramatic Statement, European Leaders Call for ‘Immediate’ Open Access to All Scientific Papers by 2020,” Science, May 27, 2016, doi:10.1126/science.aag0577.
4. Marshall Breeding, “EBSCO Supports New Open Source Project,” American Libraries, April 22, 2016, https://americanlibrariesmagazine.org/2016/04/22/ebsco-kuali-open-source-project/.
5. https://www.folio.org/collaboration.php.
6. https://www.folio.org/apps-timelines.php.
Accessibility of Vendor-Created Database Tutorials for People with Disabilities

Joanne Oud

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016 7

ABSTRACT

Many video, screencast, webinar, or interactive tutorials are created and provided by vendors for use by libraries to instruct users in database searching. This study investigates whether these vendor-created database tutorials are accessible for people with disabilities to see whether librarians can use these tutorials instead of creating them in-house. Findings on accessibility were mixed. Positive accessibility features and common accessibility problems are described, with recommendations on how to maximize accessibility.

INTRODUCTION

Online videos, screencasts, and other multimedia tutorials are commonly used for instruction in academic libraries. These online learning objects are time-consuming to create in-house and require a commitment to maintain and revise when database interfaces change. Many database vendors provide screencasts or online videos on how to use their databases. Should libraries use these vendor-provided instructional tools rather than spend the time and effort to create their own? Many already do: a study shows that 17.7 percent of academic libraries link to tutorials created by third parties, mainly by vendors or other libraries.1

When deciding whether to use vendor-created tutorials, one consideration is whether the tutorials meet accessibility requirements for people with disabilities. The importance of accessibility for online tutorials has been increasingly recognized and outlined in recent library literature.2 People with disabilities make up one of the largest minority groups in the United States and Canada, and studies show that about 9 percent of university or college students have a disability.3 Problems with web accessibility have been well documented.
People with disabilities are often unable to access the same online sites and resources as others, creating a digital divide.4 Even if people with disabilities can access a site, it is more difficult for many to use it.5 Assistive technologies, like screen-reading software, enable access but add an extra layer of complexity in interacting with the site, and blind or low-vision users can’t always rely on visual cues to navigate and interpret sites. A recent study of library website accessibility concluded that typical library websites are not designed with people with disabilities in mind.6

Joanne Oud (joud@wlu.ca) is Instructional Technology Librarian and Instruction Coordinator, Wilfrid Laurier University, Ontario, Canada.

ACCESSIBILITY OF VENDOR-CREATED DATABASE TUTORIALS FOR PEOPLE WITH DISABILITIES | OUD https://doi.org/10.6017/ital.v35i4.9469 8

Libraries, which are founded on a philosophy of equal access to information, should be concerned about online accessibility. Legal requirements for providing accessible online web content vary, but exist in every jurisdiction in the United States and Canada. Apart from the legal requirements, recent literature points out that equitable access to information for people with disabilities is a matter of human rights and an issue of diversity and social justice, and calls on libraries and librarians to improve their commitment to online accessibility.7 It is important for libraries to participate in creating a level playing field and to avoid creating conditions that make people feel unequal or prevent them from equitable access.

It is unclear whether librarians can assume vendor-created instructional tutorials are accessible. Studies on vendor database accessibility have been mixed, showing some commitment to and improvements in accessibility on one hand, but sometimes substantial gaps in accessibility on the other.8 The focus until now has been exclusively on the accessibility of database interfaces.
This study investigates the accessibility of online tutorials, including videos, screencasts, interactive multimedia, and archived webinars created by database and journal vendors and offered as instructional materials to librarians and patrons, to determine whether they are a viable alternative to making in-house training materials.

LITERATURE REVIEW

Although a few articles exist on how to make video tutorials accessible,9 no studies have evaluated the accessibility of already-created video or screencast tutorials. There are, however, some studies evaluating the accessibility of vendor databases. Byerley, Chambers, and Thohira surveyed vendors in 2007 and found that most felt they had integrated accessibility standards into their search interfaces, and nearly all tested for accessibility to some degree, though not always with actual users.10 These findings conflict somewhat with the results of other studies. Tatomir and Durrance evaluated the accessibility of thirty-two databases with a checklist and found that although many did contain accessibility features, 72 percent were marginally accessible or inaccessible.11 Similarly, Dermody and Majekodunmi found that students with print-related disabilities who use screen-reading software could only complete 55 percent of tasks successfully because of accessibility barriers and usability challenges.12 DeLancey surveyed vendors and examined VPATs, or product accessibility claims, and found that vendors felt they were compliant with 64 percent of US Section 508 items.13 Especially relevant to this study, only 23 percent of vendors said that the multimedia content within their products was compliant, and 46 percent admitted multimedia content was not compliant at all. Since vendor VPAT forms are completed for databases and other products only, and not the instructional tutorials created by vendors on how to use those products, vendor accessibility claims for instructional tutorials are unknown.
Although no studies have been done on the accessibility of video or screencast tutorials, some have been done on the accessibility of multimedia or other related kinds of online learning. Roberts, Crittenden, and Crittenden surveyed 2,366 students taking online courses at several US universities. A total of 9.3 percent of those students reported that they had a disability, and of those, 46 percent said their disability affected their ability to succeed in their online course, although most reasons cited were not related to technical accessibility barriers.14 Kumar and Owston studied students with disabilities using online learning units that contained videos. All students in the study reported at least one barrier to completing the learning units.15 Although this study involves student use of video tutorials, it doesn’t report on accessibility issues specific to those tutorials.

Previous studies of vendor products focus exclusively on database interfaces, and previous studies of online learning have not focused on screencast accessibility. Therefore, this study’s goal is to investigate how accessible vendor-created video tutorials are. Accessibility is defined as both technical accessibility (can people with disabilities locate, access, and use them) and usability (how easy it is for people with disabilities to use them). This study will look at what major accessibility issues exist (if any) and make recommendations on whether librarians can direct students to vendor tutorials rather than making in-house instructional videos.

METHOD

An evaluation checklist (see appendix 2) was developed for this study using criteria drawn from the Web Content Accessibility Guidelines (WCAG) 2.0. WCAG 2.0 is the most widely recognized web-accessibility standard internationally.
Much recent accessibility legislation adopts it, including the in-process revisions to Section 508 guidelines in the United States.16 WCAG 2.0 is also consistent with tutorial accessibility best-practice advice found in recent articles, which emphasize the need for accurate captions, keyboard accessibility, descriptive narration, and alternate versions for embedded objects, among other criteria.17

The checklist has twenty items and is split into two sections, “Functionality” and “Usability.” Functionality items test whether the tutorial can be used by people using screen-reading software or a keyboard only, and include whether the tutorial is findable on the page and playable, whether player controls and interactive content can be operated by keyboard, whether captions are available, and whether audio narration is descriptive enough so someone who can’t see the video can understand what is happening. Usability items test how easy the tutorial is to use. Examples include clear visuals and audio, use of visual cues to focus the viewer’s attention, and short and logically focused content.

To help prioritize the importance of checklist items, the local Accessible Learning Centre (ALC), which supports students on campus who use assistive technologies, was consulted about the difficulties most encountered by students. The ALC’s highest priority was the provision of an alternate accessible version of a tutorial, since it is difficult to make complex embedded web content accessible for everyone under every circumstance and an alternate version allows people to work with content in a way that suits their needs.

For the evaluation, major database vendors were chosen through a scan of common vendors and platforms at universities, with input from collections colleagues.
Some vendors were eliminated because they don’t provide instructional tutorials on their websites. Twenty-five vendors were included in the study (see appendix 1). A large majority of the tutorials found were screencast or video tutorials; a few vendors provided recorded webinars, and a few provided interactive multimedia tutorials, mainly text captions or visuals with clickable areas or quizzes. In total, 460 tutorials were evaluated for accessibility: 417 video, screencast, or interactive tutorials from twenty-four vendors, and 41 recorded webinars from four vendors. If tutorials were available in more than one place, most commonly on both the vendor’s website and YouTube, both locations were tested. If more than thirty tutorials were provided by a vendor, every other one was tested. If multiple formats of tutorial were available, such as screencasts and recorded webinars, each format was tested.

Testing from the perspective of people with visual impairments was a key focus. Other assistive technologies such as Kurzweil (for people who can see but have print-related disabilities) and ZoomText (for enlargement) are widely used, but if webpages work well using screen-reading software intended for people with visual impairments, they also generally work using other kinds of assistive software. Tutorials were tested with two screen-reading programs used by people with visual impairments: NVDA (with Firefox), a free open source program, and JAWS (with Internet Explorer), a widely used commercial product. Both were used to determine whether any difficulties were due to the quirks of a particular software product or a result of inherent accessibility problems. In addition, captions were evaluated to determine accessibility for people who are deaf or have hearing difficulties. People with visual or some physical impairments use the keyboard only, so all tutorials were tested without a mouse using solely the keyboard.
During testing, each task was tried three different ways within NVDA or JAWS before deciding that it couldn’t be completed. If one of the three methods worked, the task was marked as successfully completed. If a task could be completed successfully in one screen-reading program but not the other, it was marked as unsuccessful. Screen-reader support needs to be consistent across platforms, since people may be using a variety of types of assistive software.

FINDINGS AND DISCUSSION

Tutorials created by the same vendor nearly all used the same approach and had the same checklist results. This is positive, since consistency is important for accessibility and helps in navigation and ease of use.

None of the forty-one recorded webinars tested in this study were accessible. Webinars did not have player controls that were findable on the page by screen-reading software or usable by keyboard. None had captions, transcripts, or alternate accessible versions. Often webinars were quite long, with no clear structure and no cues to focus attention on the screen. Recorded webinars had almost no accessibility features and can’t be recommended for use as accessible instructional materials in their current form.

None of the screencast or video tutorials tested were completely accessible, and all failed in at least one checklist item. Tutorials from some vendors, however, came close to meeting all checklist requirements. Overall, there were many positive accessibility features in the video and screencast tutorials. Most of these tutorials were findable and playable by screen-reading software in some way, had video player controls usable by keyboard, had descriptive narration so people who can’t see the screen can tell what is happening, had clear visuals and audio narration, used simple language, and were relatively short and focused in content.
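The scoring rule described in the testing method above (a task counts as completed if any of up to three methods works within a screen reader, but only when it succeeds in every screen reader tested) can be sketched in a few lines of Python. The function and data names below are hypothetical illustrations, not part of the study.

```python
# Hypothetical sketch of the study's scoring rule: within each screen
# reader, up to three methods are tried and one success is enough (OR);
# across screen readers, the task must succeed in all of them (AND).

def task_completed(attempts_by_reader):
    """attempts_by_reader maps a screen-reader name (e.g., "NVDA",
    "JAWS") to a list of booleans, one per attempted method."""
    return all(any(methods) for methods in attempts_by_reader.values())

# A task that works in NVDA but fails all three methods in JAWS is
# marked unsuccessful, since support must be consistent across readers.
print(task_completed({"NVDA": [False, True, False], "JAWS": [False, False, False]}))  # False
print(task_completed({"NVDA": [True], "JAWS": [False, True]}))  # True
```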
The most accessible screencast or video tutorials were produced by the American Psychological Association (APA), American Theological Library Association (ATLA), Modern Language Association (MLA), and Ebsco. Their tutorials had many accessibility features and rated highly on the checklist. They included accessibility features that were much less commonly found elsewhere, especially the use of visual and/or audio cues to focus the viewer’s attention and the inclusion of accurate and properly synchronized closed captions. Visual cues are important for people with learning or attention-related disabilities, and help all viewers interpret and follow the video more easily. People who are deaf can’t access the content without captions, and captions also help people who have English as a second language or are at public computers without headphones.

Tutorials from these vendors also had an alternate version or transcript available. As mentioned earlier, the highest-priority checklist item is the presence of an alternate accessible version, since it is difficult to design multimedia that works for people with all disabilities in all circumstances. People with disabilities may also have previous negative experiences with online multimedia and prefer to use an alternate format that they have had more success with. In the case of these above-average vendors, the alternate accessible version was a transcript consisting of the video’s closed captions, auto-generated by YouTube. Since the tutorials’ narration was descriptive and the captions were accurate, the auto-generated transcripts are useful. However, the YouTube transcript is hard to find on the YouTube page. Also, most of these vendors had tutorials available both from their own websites and from YouTube, and none had alternate versions available on their own websites. Viewers requiring an alternate format would need to know to go to the YouTube site instead of the vendor site to find it.
Two other vendors also had quite accessible tutorials. IEEE’s tutorials had the same positive accessibility features already mentioned. Tutorials were done in-house and presented through the vendor’s site. While most tutorials presented on vendor sites were lacking in accessibility, IEEE’s were well thought out from an accessibility perspective and usable by screen-reading software. These were the only tutorials tested where all interactivity, including pop-up screens, was easily usable and navigable by keyboard. The one accessibility issue was the lack of an alternate accessible version.

Elsevier’s ScienceDirect tutorials took a different approach to accessibility than other vendors, or even than Elsevier’s tutorials for other Elsevier products. The ScienceDirect tutorials were not accessible, but an alternate text version was available, and people using screen-reader software were informed of this when they got to the tutorial page and were redirected to the text version. The ideal is to have one version that is accessible to everyone, but this approach is a good way to implement an alternate version if one accessible version isn’t possible.

Screencasts or video tutorials from other vendors also had some good accessibility features, but these were balanced with serious accessibility problems. The main accessibility issues discovered include the following:

Alternate accessible versions: Vendors who had captions and hosted their videos on YouTube did have auto-generated YouTube transcripts, but these were hard to find and were only useful if the captions were descriptive and accurate, which many were not. Apart from Elsevier’s ScienceDirect tutorials, no vendors provided another format deliberately as an accessible alternative.
Captions: Captions were missing or problematic in the tutorials of fourteen vendors, or 59 percent of the total. Five vendors (21 percent) provided no captions at all for their tutorials. Nine (38 percent) had unedited, auto-generated YouTube captions, which are highly inaccurate and therefore don’t provide usable access to the content for people who are deaf.

Tutorial not findable or playable on page: Twelve vendors (50 percent) had tutorials that were not findable on the webpage or playable for people using a keyboard or screen-reading software. Most of these issues were with tutorials on vendor sites, which were often Flash-based or offered through non-YouTube third-party sites like Vimeo. Four vendors (17 percent) offered access to their tutorials both through their own (inaccessible) website and YouTube, which is findable and playable by screen-reading software. Eight (33 percent), however, only provided access through their (inaccessible) webpages, which means that people using a keyboard or screen-reading software would not be able to use their tutorials.

No visual cues to focus attention: Eight vendors (33 percent) had no visual cues to focus attention in the video. Visual cues help people with certain disabilities focus on the essential part of the screen that is being discussed, help everyone more easily interpret and follow what is happening, and are known to help facilitate successful multimedia learning.18

Nondescriptive narration: Six vendors (25 percent) had tutorials with audio narration that didn’t sufficiently describe what was happening on the screen. Narration needs to describe what is happening in enough detail so people who can’t see the screen are not missing information available to sighted viewers.

Fuzzy visuals: Five vendors (21 percent) had tutorials with visuals that were fuzzy and hard to see.
This makes viewing difficult for people with low vision, and challenging even for people with normal vision.

Fuzzy audio or background music: Three vendors (13 percent) had poor-quality audio narration or background music playing during narration. Background music is distracting for those with hearing difficulties and makes it more difficult to focus on what is being said. Eliminating extraneous sound also makes it easier for people to learn from multimedia.19

Tutorials consisting only of text captions: Three vendors (13 percent) had tutorials consisting of text captions with no narration. The text captions were not readable by screen-reading software, and no alternate accessible versions were provided. Providing narration in tutorials is recommended for accessibility, since it allows people who can’t see the screen to access the content more easily, and has been shown to improve learning and recall over on-screen text and graphics alone.20

RECOMMENDATIONS AND CONCLUSIONS

This study attempted to determine how accessible vendor-created database tutorials are, and whether academic librarians can use them instead of re-creating them locally. For recorded webinars, the answer is a clear no, since none were technically accessible for people using screen-reading software. For video or screencast tutorials, however, the answer is less clear. Results showed that many vendors created tutorials with positive features such as clear visuals and audio, short and focused content, and descriptive narration. However, technical accessibility was much less successful, with 59 percent of vendors omitting usable captions and 50 percent presenting tutorials that couldn’t be found on the page or played by people using screen-reading software. These technical accessibility issues prevent people with hearing, vision, or some mobility impairments from using the tutorials at all.
Although none of the tutorials studied met all the checklist criteria, some came close and could be used by librarians depending on local requirements, policies, and priorities for accessibility.

In part, this study found that the accessibility of many tutorials depends on how they are presented. Disappointingly, 50 percent of vendors had tutorials on their websites that were not findable or playable by people with disabilities. Many vendors, however, hosted tutorials on YouTube as well as their own site. In these cases, YouTube was always a more accessible option than the vendor site. YouTube itself is relatively accessible, with both pages and players that are navigable by keyboard and by screen-reading software. There are options for accessibility settings in YouTube, such as having captions display automatically, and more accessible third-party overlays are available for the YouTube player. On vendor sites, there were more likely to be issues with Flash and an inability for people using screen-reading software or keyboards to find and play videos. Some vendors embed YouTube videos on their site. Even if the embedded videos are findable and playable, this method omits important accessibility features found on the YouTube page, such as the text transcript. The results of this study show that using YouTube where available is recommended. Further, linking to YouTube rather than embedding the video is preferred, unless a separate link to the transcript is made to provide an alternate accessible version.

Captions are another key accessibility problem identified in this study: nearly two-thirds of vendors had unusable captions. Often, auto-generated YouTube captions were present but were not usable. The presence of captions is not enough for accessibility; those captions need to be accurate and present the same content as the narration.
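One way to gauge whether captions are accurate enough is to compare the caption text against the narration script and compute an approximate word error rate. This is an illustrative sketch only, not a method used in the study; the function name and sample strings are invented for the example.

```python
# Illustrative sketch (not from the study): estimate how far auto-generated
# captions drift from the narration script using an approximate word error
# rate built on difflib from the Python standard library.
import difflib

def approx_word_error_rate(reference, hypothesis):
    """Rough fraction of words that differ between the narration script
    (reference) and the caption text (hypothesis)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(None, ref, hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    errors = max(len(ref), len(hyp)) - matched
    return errors / len(ref)

narration = "click the advanced search link to combine keywords"
captions = "click the advance search link to combine key words"
print(round(approx_word_error_rate(narration, captions), 2))  # 0.38
```

A threshold on such a score could flag tutorials whose auto-captions need manual editing before they can be considered accessible.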
YouTube auto-captioning does not generate captions that are accurate enough to be useful without manual editing. YouTube auto-generates transcripts from the captions, so if the captions are inaccurate the transcript will not be useful either. Editing YouTube auto-generated captions is necessary to ensure accessibility.

A few accessibility issues found in this study would be easy to improve with some thought during tutorial creation. Adding visual cues like arrows or highlighting to the screen to help people focus attention, or remembering that not everyone can see the screen while recording narration, can be easily achieved and would improve accessibility significantly. Other issues would require more planning and effort to improve.

Given the widespread technical accessibility problems identified in this study, it is particularly important for people creating tutorials to provide alternate formats that are accessible if the tutorials themselves are not. Almost no vendors do this currently, but it would have the most significant impact on accessibility for the broadest range of people. Adding usable captions is the second most important area for improvement. To provide access for people who are deaf, captions need to be added or auto-generated YouTube captions need to be edited for accuracy. Both alternate formats and captions require some thought and effort to implement but ensure that tutorials will meet accessibility requirements and be usable by everyone.

NOTES AND BIBLIOGRAPHY

1. Eamon Tewell, “Video Tutorials in Academic Art Libraries: A Content Analysis and Review,” Art Documentation 29, no. 2 (2010): 53–61.

2. Amanda S. Clossen, “Beyond the Letter of the Law: Accessibility, Universal Design, and Human-Centered Design in Video Tutorials,” Pennsylvania Libraries: Research & Practice 2, no.
1 (2014): 27–37, https://doi.org/10.5195/palrap.2014.43; Joanne Oud, “Improving Screencast Accessibility for People with Disabilities: Guidelines and Techniques,” Internet Reference Services Quarterly 16, no. 3 (2011): 129–44, https://doi.org/10.1080/10875301.2011.602304; Kathleen Pickens and Jessica Long, “Click Here! (And Other Ways to Sabotage Accessibility),” Imagine, Innovate, Inspire: The Proceedings of the ACRL 2013 Conference (Chicago: ACRL, 2013), 107–12.

3. Lucy Barnard-Brak, DeAnn Lechtenberger, and William Y. Lan, “Accommodation Strategies of College Students with Disabilities,” Qualitative Report 15, no. 2 (2010): 411–29.

4. Cyndi Rowland et al., “Universal Design for the Digital Environment: Transforming the Institution,” Educause Review 45, no. 6 (2010): 14–28.

5. Peter Brophy and Jenny Craven, “Web Accessibility,” Library Trends 55, no. 4 (2008): 950–72.

6. Kyunghye Yoon, Laura Hulscher, and Rachel Dols, “Accessibility and Diversity in Library and Information Science: Inclusive Information Architecture for Library Websites,” Library Quarterly 86, no. 2 (2016): 213–29.

7. Ruth V. Small, William N. Myhill, and Lydia Herring-Harrington, “Developing Accessible Libraries and Inclusive Librarians in the 21st Century: Examples from Practice,” Advances in Librarianship 40 (2015): 73–88, https://doi.org/10.1108/S0065-2830201540; Paul T. Jaeger, Brian Wentz, and John Carlo Bertot, “Libraries and the Future of Equal Access for People with Disabilities: Legal Frameworks, Human Rights, and Social Justice,” Advances in Librarianship 40 (2015): 237–53; Yoon, Hulscher, and Dols, “Accessibility and Diversity in Library and Information Science.”

8. Suzanne L. Byerley, Mary Beth Chambers, and Mariyam Thohira, “Accessibility of Web-Based Library Databases: The Vendors’ Perspectives in 2007,” Library Hi Tech 25, no.
4 (2007): 509–27, https://doi.org/10.1108/07378830710840473; Kelly Dermody and Norda Majekodunmi, “Online Databases and the Research Experience for University Students with Print Disabilities,” Library Hi Tech 29, no. 1 (2011): 149–60, https://doi.org/10.1108/07378831111116976; Jennifer Tatomir and Joan C. Durrance, “Overcoming the Information Gap: Measuring the Accessibility of Library Databases to Adaptive Technology Users,” Library Hi Tech 28, no. 4 (2010): 577–94, https://doi.org/10.1108/07378831011096240.

9. Pickens and Long, “Click Here!”; Clossen, “Beyond the Letter of the Law”; Oud, “Improving Screencast Accessibility for People with Disabilities”; Nichole A. Martin and Ross Martin, “Would You Watch It? Creating Effective and Engaging Video Tutorials,” Journal of Library & Information Services in Distance Learning 9, no. 1–2 (2015): 40–56, https://doi.org/10.1080/1533290X.2014.946345.

10. Byerley, Chambers, and Thohira, “Accessibility of Web-Based Library Databases.”

11. Tatomir and Durrance, “Overcoming the Information Gap.”

12. Dermody and Majekodunmi, “Online Databases and the Research Experience for University Students with Print Disabilities.”

13. Laura DeLancey, “Assessing the Accuracy of Vendor-Supplied Accessibility Documentation,” Library Hi Tech 33, no. 1 (2015): 103–13, https://doi.org/10.1108/LHT-08-2014-0077.

14. Jodi B. Roberts, Laura A. Crittenden, and Jason C. Crittenden, “Students with Disabilities and Online Learning: A Cross-Institutional Study of Perceived Satisfaction with Accessibility Compliance and Services,” Internet and Higher Education 14, no. 4 (2011): 242–50, https://doi.org/10.1016/j.iheduc.2011.05.004.

15. Kari L. Kumar and Ron Owston, “Evaluating E-Learning Accessibility by Automated and Student-Centered Methods,” Educational Technology Research and Development 64, no.
2 (2015): 263–83, https://doi.org/10.1007/s11423-015-9413-6.

16. US Access Board, “Draft Information and Communication Technology (ICT) Standards and Guidelines,” 36 CFR Parts 1193 and 1194, RIN 3014-AA37 (2015), https://www.access-board.gov/attachments/article/1702/ict-proposed-rule.pdf.

17. Pickens and Long, “Click Here!”; Clossen, “Beyond the Letter of the Law”; Martin and Martin, “Would You Watch It?”; Oud, “Improving Screencast Accessibility for People with Disabilities.”

18. See the Signaling Principle in Richard E. Mayer, Multimedia Learning, 2nd ed. (Cambridge: Cambridge University Press, 2009): 108–17.

19. See the Coherence Principle, ibid., 89–107.

20. See the Modality Principle, ibid., 200–220.

Appendix 1. List of Vendors

1. ACM
2. Adam Matthew
3. Alexander St Press
4. APA
5. ATLA
6. ChemSpider
7. Cochrane Library (webinars only)
8. Ebsco
9. Elsevier
10. Factiva
11. Gale
12. IEEE
13. Lexis Nexis Academic (tutorials and webinars)
14. Marketline
15. MathSciNet
16. OVID/Wolters Kluwer (tutorials and webinars)
17. Oxford
18. Proquest (tutorials and webinars)
19. Pubmed
20. Sage
21. SciFinder
22. Standard & Poor/NetAdvantage
23. Taylor and Francis
24. Web of Knowledge/Thomson Reuters
25. Zotero

Appendix 2.
Tutorial Accessibility Evaluation Checklist

Functionality

☐ Equivalent alternate format(s) are provided
  ☐ Transcript/text version
  ☐ Audio
  ☐ Other ___________________________
☐ Alternate formats provided are accessible
☐ Alternate formats provided are findable on the page by screen reader
☐ Screen-reading software can find the video on the webpage
☐ Screen-reading software can access and play the video
☐ Video-player functions can be operated by keyboard/screen-reading software
☐ Interactive content can be accessed and used by keyboard/screen-reading software
☐ User has some control over timing (pause/rewind capability)
☐ Alternate modes of presentation are available for all content, meaning presented through text, visuals, narration, color, or shape
☐ Synchronized closed captions are available for all audio
☐ Audio/narration is descriptive

Usability

☐ User controls if/when the video starts (no autoplay)
☐ Video is easy to use by screen-reading software
☐ Clear, high-contrast visuals and text
☐ Clear audio (no background noise/music)
☐ Uses visual cues to focus attention (e.g., highlighting, arrows)
☐ Is short and concise
☐ Is clearly and logically organized
☐ Has consistent navigation, look, and feel
☐ Uses simple language, avoids jargon, and defines unfamiliar terms
☐ Explicit structure with sections, headings to give viewers context
☐ Learning outcome/goal clearly outlined and content focused on outcome
Picture Perfect: Using Photographic Previews to Enhance Realia Collections for Library Patrons and Staff

Dejah T. Rubel

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017 59

ABSTRACT

Like many academic libraries, the Ferris Library for Information, Technology, and Education (FLITE) acquires a range of materials, including learning objects, to best suit our students’ needs. Some of these objects, such as the educational manipulatives and anatomical models, are common to academic libraries, but others, such as the tabletop games, are not. After our liaison to the School of Education discovered some accessibility issues with Innovative Interfaces’ Media Management module, we decided to examine all three of our realia collections to determine what our goals in providing catalog records and visual representations would be. Once we concluded that we needed photographic previews to both enhance discovery and speed circulation service, choosing processing methods for each collection became much easier. This article will discuss how we created enhanced records for all three realia collections, including custom metadata, links to additional materials, and photographic previews.

INTRODUCTION

Ferris State University’s full-time enrollment for Fall 2015 was 14,715 students. Of these students, 10,216 are Big Rapids residents and the other 4,499 are either Kendall College of Art and Design students or at other off-campus sites across Michigan.1 During the 2014-2015 school year, FLITE had 14,647 check-outs, including 2,558 check-outs of items in reserves, which is where our realia collections are located.2 However, reserves includes other items in addition to these collections, thus making analysis of circulation statistics problematic.
Another problem with conducting such an analysis is that the educational manipulative collection already had photographic previews and the tabletop game collection is a pilot project, so there is no clear before-and-after comparison. We can, however, demonstrate that enhancing the catalog records for our anatomical model collection had a dramatic impact, jumping from a handful of check-outs in 2014-2015 to almost 450 in 2016.

Dejah T. Rubel (rubeld@ferris.edu) is the Metadata and Electronic Resources Management Librarian, Ferris State University, Big Rapids, MI.
PICTURE PERFECT: USING PHOTOGRAPHIC PREVIEWS TO ENHANCE REALIA COLLECTIONS FOR LIBRARY PATRONS AND STAFF | RUBEL | https://doi.org/10.6017/ital.v36i2.9474

LITERATURE REVIEW
Although there are very few libraries using photographic previews for their realia collections, the ones that do describe similar limitations with bibliographic records and goals that only photographic previews could meet. Most realia collections that warranted this extra effort are either curriculum materials or anatomical models, which is not surprising considering how difficult they are to describe. As Butler and Kvenild noted in their article on cataloging curriculum materials, “Patrons struggled to identify which game or kit they sought based on the…information in the online catalog,” because “Discovering curriculum materials in the catalog and getting a sense of the item are not easy when using traditional catalog descriptions...”3 As they continue, “The inventory and retrieval problems…were compounded by the fact that existing catalog records were not as descriptive as they should be.”4 This was also a problem for our collections because our names and descriptions were often not intuitive or precise.
In addition, as Loesch and Deyrup discovered while cataloging their curriculum materials collection, “…there was great inconsistency among the OCLC records regarding the labeling of the format…,”5 which was another issue we needed to address. Although the General Material Designation (GMD) has since been rendered obsolete, FLITE continues to use it to highlight certain material. This choice is due to some limitations with our library management system as well as our discovery layer, namely the lack of good mapping or use of the 33X fields. Until this is rectified with a more modern system, we have found it easier to retain certain GMDs like “sound recording”, “electronic resource”, and “realia”. Thus, we needed to standardize our terms for each collection. Another problem that our predecessors indicated photographic previews might resolve was missing objects or pieces of objects.6 This becomes especially important for our tabletop games collection because most of those pieces are very small and too numerous for a piece count upon return.
Fortunately, “Previews…can aid users in making better decisions about potential relevance, and extract gist more accurately and rapidly than traditional hit lists provided by search engines.”7 Ideally, a preview will display an appropriate level of information about the object it represents in order “…to support users in making a correct judgement about the relevance of that object to the user’s information need.”8 Greene goes further by listing the main roles for previews of which the first two are the most applicable for photographic previews: aiding retrieval and aiding users in quickly making relevance decisions.9 For these uses, photographic previews of realia are ideal because users can examine the object without needing to see its details and they expect them to be abstract, not exhaustive, unlike digital surrogates that an archive would use.10 As Greene also notes, the high-level goal of any preview is to "...communicate the level and scope of objects to users so that comprehension is maximized and disorientation is minimized."11 A common finding among all the previous projects was that even a single photograph provides more readily comprehensible information than several lines of description. As Moeller states regarding their journal project, "They [previews of each issue's cover] give the researcher or student an immediate idea of the nature of the journal."12 He goes further to give the example of an innocuous journal title for a propagandist serial whose political nature is transparent once you view its imagery. From a staff perspective, photographic previews can also easily illustrate the number of pieces and an object's condition or orientation. This can be very useful in determining whether something is missing or damaged without having to do a time-consuming individual piece count upon check-in.
But as Butler and Kvenild discuss, layout within each photograph is key for illustrating missing pieces.13 Unfortunately, aside from a few small projects mentioned in Butler and Kvenild's article, there are not many examples of photographic previews for realia collections currently being used by academic libraries. One reason might be software limitations. Innovative's Media Management module is still unique among ILS/LMS software in that most vendors either provide a separate digital repository for special collections digital surrogates or they incorporate images into the catalog using third-party software like Syndetic Solutions™. Another reason for the lack of photographic previews within catalogs may simply be the rarity of realia in academic libraries. Every library certainly has a few unique pieces, like a skeleton for the pre-medical students, but often not enough to consider them an entire collection, much less a complex enough collection to warrant the extra effort to create photographic previews of each item. At FLITE, we had already crossed that threshold of complexity. Therefore, this article will start by discussing our educational manipulative collection, which provided the basis for how we would catalog and process the tabletop games and anatomical models.

Educational Manipulative Collection
Our first foray into creating photographic previews was completed by the previous cataloger, with over 300 items cataloged in 2004 and another 30-40 added to the collection over the next decade. Unlike the other realia collections, the educational manipulatives were cataloged using Innovative’s Course Reserves module, so no attempt was made to find or create OCLC records.
Nevertheless, the minimal metadata is very consistent across the collection, which supports Greene’s recommendation “…that it was important to define a set of consistent attributes at the high level of the collection if any effective browsing across the collections was to be provided.”14 In our case, we rely on a combination of the GMD ([realia]), a custom call number prefix (TOYS Box #), and a limited amount of local subject headings, as shown below, with “Manipulatives” as the common subject for the entire collection.

690 = (d) Current local subject headings in use as of 12/3/15:
• Art.
• Block props.
• Boards.
• Cognitive.
• Discovery.
• Discovery Box.
• Dramatics.
• Finger Puppets.
• Flannel Board.
• Gross Motor.
• Infant/Toddler.
• Magnets.
• Manipulatives.
• Music.
• Oversize books.
• Posters.
• Puppets.
• Story apron.
• Story props.
• Woodworking.

Due to the nature of descriptive metadata, photographic previews of the educational manipulatives made logical sense because “The images…are not the content. They are the metadata, the description of the materials.”15 As Moeller describes, Innovative’s Media Management module links images and many other file types directly to bibliographic records without requiring users to click an additional link unless they want to view a larger image of a thumbnail.16 Similar to Butler and Kvenild’s project, all of our photos were 900 pixels wide by 600 pixels tall, which is slightly smaller than their default width of 1000 pixels.17 One advantage of using the Media Management module is its ability to automatically create thumbnails 185 pixels wide by 85 pixels tall.
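The module's thumbnails are a fixed 185 by 85 pixels. As a general illustration of the underlying idea (this is a hypothetical sketch, not code from Innovative's software, and a proportional fit like this one would letterbox rather than force the exact 185 x 85 size), a fit-within-a-bounding-box calculation might look like:

```python
def thumbnail_size(width, height, max_w=185, max_h=85):
    """Scale (width, height) to fit inside (max_w, max_h),
    preserving the aspect ratio. Integer math keeps the
    result deterministic."""
    if width * max_h <= height * max_w:
        # Height is the limiting dimension.
        return (width * max_h // height, max_h)
    # Width is the limiting dimension.
    return (max_w, height * max_w // width)

# A 900 x 600 preview fit into the 185 x 85 thumbnail box:
print(thumbnail_size(900, 600))  # prints (127, 85)
```

The same calculation applies to any source image, so a batch of previews of mixed sizes would all fit the same thumbnail box.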
A bigger advantage is that the images are hosted on the same server that runs our catalog, which allows us to freely distribute the images in an intuitive manner (thumbnails instead of links) without having to worry about authentication to a shared folder from off-campus, unlike our PDF files. Unfortunately, our liaison to the School of Education recently discovered some accessibility issues with Media Management that forced us to consider whether we should change the embedded photographic previews to external links. The most significant of these problems is simply the language of the proprietary viewer software. Because it is written in Java, if you click on a thumbnail for a larger image, many browsers, like Chrome, will not run it, and those that will often require a security exception to do so. We have attempted to ameliorate some of these issues by providing an FAQ entry on which browsers are best for viewing these images and how to add a security exception for our website, but unless or until Innovative rewrites this software in a different language, these accessibility issues will persist because Java is being phased out of many browsers. Butler and Kvenild also noted its slow response time compared to their own server.18 Another issue they mentioned was that the thumbnails would not be visible in their consortial catalog, so they needed to add links in the 856 field for these users.19 This is less of an issue for us because we do not contribute any of our realia records to our consortial catalog, but Moeller’s concern that in general “…enhancements involving scanned images…will not be easily shared with other libraries,”20 is entirely valid. Unlike OCLC records, there is no way to share attached or embedded images as part of the metadata and not the content.
Contrariwise, Butler and Kvenild’s concerns regarding catalog migration are very pertinent because we are considering moving to a new LMS within the next few years.21 Although we acknowledge that “Utilizing 856 tags is an indirect method of accessing the images, as users must take the initiative to follow the links,” we will eventually have to move and link our photographic previews to ensure accessibility after migration.22

Tabletop Game Collection
Unlike the educational manipulatives, the majority of the tabletop game collection was previously cataloged in OCLC, so finding good bibliographic records was easy. Once downloaded, we decided to add a unique GMD ([game]), custom call number prefix (BOARD GAME Box #), and local subject heading “Tabletop games”. However, our Emerging Technologies Librarian who coordinated this pilot project felt that the single subject heading was not descriptive enough. So he gave us a spreadsheet with more specific subject headings such as “Deck Building”, “Historical”, and “Resource Management” that we added as genre/form subject headings in the 655_4 field. He also suggested that we add links to the rule books, which we did using the 856 field and the link text “connect to rule book (PDF)”. Because tabletop games are commercial products, finding images online was also easy. At first, we had some concerns about copyright, but we are not reselling these products or using the image as a replacement for the item. So, we concurred with Butler and Kvenild that “…the images in our project fall under copyright fair use.”23 Another plus to using commercial images is that we could use more than one to show various aspects of setup and play. The downside to this benefit is that image sizes and content photographed varied widely, so we used our best judgement in creating labels and tried to keep them as consistent as possible.
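To make the field choices concrete, the local additions described above could be assembled as follows. This is a hypothetical sketch: the field tags follow the MARC conventions named in the text, but the dictionary layout and the sample values are illustrative stand-ins, not Innovative's actual record structure.

```python
def game_record(title, box_number, genres, rulebook_url):
    """Assemble the local fields used for a tabletop game record."""
    return {
        "245": f"{title} [game]",                    # title with the custom GMD
        "099": f"BOARD GAME Box {box_number}",       # custom call number prefix
        "690": ["Tabletop games."],                  # collection-level local subject
        "655_4": [f"{g}." for g in genres],          # genre/form headings
        "856": {"u": rulebook_url,                   # link to the rule book
                "z": "connect to rule book (PDF)"},  # link text described above
    }

rec = game_record("Example Game", 12,
                  ["Deck Building", "Resource Management"],
                  "https://example.org/rules.pdf")
```

Keeping the prefix, GMD, and headings in one helper like this is one way to guarantee the consistency across records that the collection relies on.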
To ensure consistency across the collection, we decided that the first image should always be the top of the game’s box, labeled “Box Cover” or “Box Cover – Front” if there was a “Box Cover – Back” image. (We only displayed the back of the box cover if there was significant information about the game printed on it.) Then we added up to five additional images showing parts of the game like “Card Examples”, “Game Pieces”, and “Game Set-up”. Overall, this number of images worked very well in both Encore’s Attached Media Viewer and the Classic Catalog/Web OPAC, but there is a slight duplication in images by Syndetic Solutions™ for a few games. This results in a larger version of the box top image displaying to the right of the title and above the smaller thumbnails of images we added using Media Management. With regard to piece counts, we presumed that we would need photographic previews to aid in piece counting upon return of a tabletop game. However, our Emerging Technologies Librarian assured us that because we are an educational institution, we could contact the vendor for free replacement pieces at any time. He also emphasized that unlike the educational manipulatives or the anatomical models, this was a pilot collection, so extensive processing would not be a good investment of our labor. Fortunately, the anatomical model collection would require images for piece counts as well as several other cataloging customizations to increase discoverability and speed circulation.

Anatomical Model Collection
Similar to our educational manipulative collection, but not nearly as extensive, our anatomical model collection has been a part of FLITE since its inception. Unlike the manipulatives, which are used primarily by the early childhood education students, the anatomical models support a range of allied health programs including but not limited to dental hygiene, radiology, and nursing.
The majority of our two dozen models were purchased in the 20th century and, like the manipulatives, the majority were cataloged using Innovative’s Course Reserves module. Unfortunately, none of these records were very descriptive, some being so poor as to be merely a title like “Jawbones” and a barcode. So, the first task was to match objects with OCLC records. Fortunately, this task became much easier once we discovered that matching the object to the vendor’s catalog image and then searching OCLC by vendor model name or number was faster than trying to decipher written descriptions without knowing human anatomy. Once good bibliographic records were downloaded, we decided to add one of three GMDs depending on the type of model ([model], [chart], or [flash card]), a custom call number prefix (MODEL #), and one or more of the local subject headings shown below.

690 = (d)
• Anatomy model.
• Anatomy models.
• Anatomy chart.
• Anatomy charts.
• Dental hygiene model.
• Dental hygiene models.
• Dental model.
• Dental models.

Technically, all dental models could be used as anatomical models, but not vice versa. Therefore, the common subject headings for the collection are “Anatomy model” and “Anatomy models”. To make the models easier to shelve, retrieve, and inventory, we also designed numeric ranges for the call numbers, as shown below, so we would know what type of model to expect when referring to a specific model number.
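The numeric ranges, listed in full below, lend themselves to a simple lookup. This hypothetical Python sketch (not software the library used) shows how a MODEL number could be mapped back to the type of object it should represent:

```python
# Numeric call-number ranges for the anatomical model collection,
# copied from the hierarchy described in the text.
MODEL_RANGES = [
    (1, 99, "Anatomical Charts and Flash Cards"),
    (100, 199, "Articulated Skeletons"),
    (200, 299, "Disarticulated Skeletons and Bone Kits"),
    (300, 399, "Organs"),
    (400, 499, "Skulls (anatomical and dental hygiene)"),
    (500, 599, "Other Dental Models (dental studies, dental decks)"),
]

def model_category(number):
    """Return the type of object expected for a given MODEL number."""
    for low, high, label in MODEL_RANGES:
        if low <= number <= high:
            return label
    raise ValueError(f"MODEL #{number:03d} is outside the defined ranges")

print(model_category(305))  # prints Organs
```

A lookup like this is what the range scheme buys in practice: shelvers and inventory staff can tell from the number alone what kind of object should be in hand.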
099 = (c) MODEL #00X, following this hierarchy:
• 001-099 Anatomical Charts and Flash Cards
• 100-199 Articulated Skeletons
• 200-299 Disarticulated Skeletons and Bone Kits
• 300-399 Organs
• 400-499 Skulls (anatomical and dental hygiene)
• 500-599 Other Dental Models (dental studies, dental decks)

We also scanned and linked PDFs of the heavily worn model keys with the link text “connect to key PDF” before washing and rehousing all the models. Once they were clean, they were ready for their shoot with Ferris State University’s Media Production team. Due to winter break, Media Production was able to shoot the majority of the collection fairly quickly. They returned to us high-resolution TIFFs the same size as those for the manipulatives, 900 pixels by 600 pixels. In case of Java viewer failure, we requested that there be one top-level image that showcases exactly what the model contains, with images of individual pieces or drawers as the succeeding images. For example, our disarticulated skeletons are housed in small plastic carts with three drawers in each cart. Therefore, the first image would be a shot of all the pieces of the disarticulated skeleton, the second image would be the contents of the top drawer, the third image the contents of the middle drawer, and the last image the contents of the bottom drawer. In this specific example, we re-used the images that we posted in the catalog record by pasting them on top of the cart to show circulation staff what to expect in each drawer upon check-in. Overall, photographic previews for this collection appear to be working very well for both catalog users and circulation staff “…to inform users about size, extent, and availability of collections or objects.”24 In fact, they have been working so well for this collection that usage has increased dramatically compared to previous years. Figure 1.
Circulation Statistics, 2014-2016

Collection       2014   2015   2016
Manipulatives     367    317    114
Models             10      1    444
Games             n/a    n/a     24

CONCLUSIONS AND FUTURE DIRECTIONS
Although we implemented photographic previews for three realia collections, we could not define any standard workflow for the process beyond correcting or downloading the metadata first and adding the images second. Part of this is due to our working primarily with legacy collections, because we often discovered issues, like the model keys, while working through another issue. The other part is due to the nuances involved in processing realia in general. Even with good, readily available catalog records like those for the tabletop games, time still had to be spent separating, organizing, and rehousing game pieces as well as hunting down useful images. Unfortunately, any type of realia processing, even if it is just textual description, is much more time-consuming than the majority of academic library cataloging. Adding in the extra steps to create, upload, and link a photographic preview can nearly double that labor investment. Notwithstanding, as Butler and Kvenild advocate, “…not supplying images as metadata for items that most need them (i.e. kits, games, and models) is to make them nearly irretrievable. Providing bare-bones traditional metadata for these items is analogous to delegating them to the backlog shelves of yesteryear.”25 Unfortunately, neither the library management system nor the third-party catalog enhancement market currently provides a good solution to this problem. Considering how great an impact photographic previews have had in the online retail market, this lack of technical support is surprising. Yes, Syndetic Solutions™ is a great product for cover images and tables of contents for books.
However, once you go beyond traditional resources, there is a great need to allow institutions to submit their own images as part of catalog record enhancement and not to serve as separate digital surrogates in a digital repository. This could be done either within the library management system, like the Media Management module, or as an option for catalog enhancement where libraries could add images to either a shared database or their own database using standard identifiers on a third-party platform like Syndetics™. Further research on photographic previews is also sorely needed. As of this writing, we only have a handful of case studies and some guiding philosophy on the use of previews. Consultation with internet retailers and literature on online marketing might be more applicable than library science research to evaluate their impact, but research into their direct impact vs. textual descriptions on catalog use would be ideal.

REFERENCES
1. Fact Book 2015-2016 (Big Rapids, MI: Ferris State University Institutional Research & Testing, 2016), http://www.ferris.edu/HTMLS/admision/testing/factbook/FactBook15-16-2.pdf, 47.
2. Ibid, 12.
3. Marcia Butler and Cassandra Kvenild, “Enhancing Catalog Records with Photographs for a Curriculum Materials Center,” Technical Services Quarterly 31 (2014): 122-138, https://doi.org/10.1080/07317131.2014.875377, 122-124.
4. Ibid, 126.
5. Martha Fallahay Loesch and Marta Mestrovic Deyrup, “Cataloging the Curriculum Library: New Procedures for Non-Traditional Formats,” Cataloging & Classification Quarterly 34, no. 4 (2002): 79-89, https://doi.org/10.1300/J104v34n04_08, 82.
6. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 128.
7. Stephan Greene, Gary Marchionini, Catherine Plaisant, and Ben Shneiderman, “Previews and Overviews in Digital Libraries: Designing Surrogates to Support Visual Information Seeking,” Journal of the American Society for Information Science 51, no.
4 (2000): 380-393, https://doi.org/10.1002/(SICI)1097-4571(2000)51:4<380::AID-ASI7>3.0.CO;2-5, 381.
8. Ibid.
9. Ibid, 384.
10. Ibid, 385.
11. Ibid.
12. Paul Moeller, “Enhancing Access to Rare Journals: Cover Images and Contents in the Online Catalog,” Serials Review 33, no. 4 (2007): 231-237, https://doi.org/10.1016/j.serrev.2007.09.003, 235.
13. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 128.
14. Greene et al., “Previews and Overviews in Digital Libraries,” 388.
15. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 124.
16. Moeller, “Enhancing Access to Rare Journals,” 234.
17. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 129.
18. Ibid, 132.
19. Ibid, 126.
20. Moeller, “Enhancing Access to Rare Journals,” 237.
21. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 131.
22. Ibid, 135.
23. Ibid, 134.
24. Greene et al., “Previews and Overviews in Digital Libraries,” 386.
25. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 136.
President’s Message: Reflections on LITA’s Past and Future
Aimee Fifarek
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 3

When I reached out to ITAL Editor Bob Gerrity about my first President’s Column, he graciously provided copies of past LITA Presidents’ columns to get me started. It reminded me once again of the illustrious company I am in, starting with Stephen R. Salmon, the first president of the Information Science and Automation Division, as we were known until 1977. I am proud to be at the head of LITA as it begins to celebrate its 50th Anniversary year. A half century ago when LITA was founded, the world was experiencing an era of profound technological change. The US and Soviet Union were battling to be first in the Space Race, and an increasing number of world powers were engaging in nuclear testing. While Civil Rights demonstrations and the fighting in Vietnam dominated the news, we were imagining peace via the technologically driven future depicted in a new TV series called Star Trek. With TV focused on the stars, we were able to go to the movies and explore the strange new world of inner space in Fantastic Voyage. Technology was poised to enter our daily lives as well, with Diebold demonstrating the first ATM1 and Ralph H. Baer writing the 4-page paper that would lay the foundation for the video game industry.2 Heady times for technology indeed, and the fact that libraries were sufficiently advanced to require an association dedicated to supporting technologists is hardly surprising. By the time of LITA’s founding at the 1966 Midwinter Meeting in Chicago, library automation had been in development for over a decade.3 MARC was just being invented, with the first tapes from the Library of Congress scheduled to go to the sixteen pilot libraries later that year.
Membership in the only organization that existed, the Committee on Library Automation (COLA), was restricted to the handful of professionals who either developed or managed existing library systems. But technology was beginning to impact many more librarians than just those rarified few. According to President Salmon, “It was clear that large numbers of librarians who didn't meet COLA's standards for membership were in need of information on library automation and wanted leadership.”4 The first meeting of our Division on July 14, 1966 at the ALA Annual Conference in New York was attended by several hundred librarians interested in information sharing, technology standards, and technology training for library staff. This group created the first mission, vision, and bylaws that set us on a 50-year path of success. LITA is well positioned to take the first steps into our next 50 years. Thanks to the efforts of last year’s LITA Board, we are on the verge of adopting a new two-year strategic plan that is designed to guide us through the current transitional period. It will be accompanied by a tactical plan that will allow us to document our accomplishments and set the stage for an ongoing culture of continuous planning. Also, Jenny Levine has proven to be extremely capable as she completes her first year as LITA Executive Director. She has just the right combination of ALA experience, technology know-how, and calm competence to guide us through the retooling and reimagining that is required to take a middle-aged Association into the next phase of its life.

Aimee Fifarek (aimee.fifarek@phoenix.gov) is LITA President 2016-17 and Deputy Director for Customer Support, IT and Digital Initiatives at Phoenix Public Library, Phoenix, AZ.
PRESIDENT’S MESSAGE | FIFAREK doi: 10.6017/ital.v35i3.9526
The four areas of focus in the new strategic plan will help us to balance our efforts between preserving the strengths of our past and adapting our organization for a successful future. The first area of focus, Member Engagement, shows that our primary commitment needs to be to LITA members. Without you, LITA would not exist. One of the key efforts is to increase the value of LITA for members who are unable to travel to conferences. With travel budgets down and staying low, online member engagement is an area all of ALA needs to improve, and who better to lead in this area than LITA? The next area, Organizational Sustainability, is all about keeping the infrastructure of the organization strong, much of which happens in the domain of LITA staff. Budgeting, quality communication, and strategic planning all live here. The section on Education and Professional Development recognizes the important role that webinars, online courses, our online journal, and print publications play in allowing LITA members to share their knowledge on both cutting-edge and practical topics with the rest of the Association and ALA in general. We are already doing great work here, and we need to better support and expand these efforts. The last focus area, Advocacy and Information Policy, represents a future growth area for LITA. Now that everyone in the library world "does" technology to a certain extent, LITA needs to think about how we will differentiate ourselves as outside competencies increase. Our advantage is that we have been doing and thinking about technology for much longer than anyone else. With our vast wealth of experience, it's appropriate that we work to become thought leaders and implementers in the information policy realm. In this, as always, we return to where we started: our members. LITA has thrived over the last 50 years because of this, our most important resource.
LITA was founded on the concept of sharing information about technology through conversation, publications, and knowledge creation. We endure because you, the committed, passionate information professionals, are willing to share what you know with those who come after. And like our founders, there are always individuals who are willing to take on the mantle of leadership, whether through getting elected to the LITA Board, becoming a Committee or Interest Group Chair, serving in key editorial roles for our monographs, journal, and blog, or joining the all-important LITA Staff. Thanks to all of you who make LITA’s future happen every day. I am proud to be in your company.

REFERENCES
1. Alan Taylor, “50 years ago: a look back at 1966,” The Atlantic Photo, March 23, 2016, http://www.theatlantic.com/photo/2016/03/50-years-ago-a-look-back-at-1966/475074/, Photo 46.
2. “Take me back to August 30, 1966,” http://takemeback.to/30-August-1966#.V8SzItLrtaQ.
3. “Library Technology Timeline,” http://web.york.cuny.edu/~valero/timeline_reference_citations.htm.
4. Stephen R. Salmon, “LITA’s First 25 Years, a Brief History,” http://www.ala.org/lita/about/history/1st25years.
Editorial Board Thoughts: Requiring and Demonstrating Technical Skills for Library Employment
Emily Morton-Owens
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 6

Recently I’ve been involved in a number of conversations about technical skills for library jobs, sparked by an ITAL article by Monica Maceli1 and a code4lib presentation by Jennie Rose Halperin.2 Maceli performed a text analysis of job postings on code4lib to reveal what skills are co-occurring and most frequent. Halperin problematized the expense of the MLS credential in comparison to the qualifications actually required by library technology jobs and the salaries offered for technical versus nontechnical work. This work has inspired many conversations about the shift in skills required for library work, the value placed on different kinds of labor, and how MLS programs can teach library technology. During a period of hiring at my institution and through teaching a library school course in which many of the students are on the brink of graduation, my attention has been called particularly to one point in the library employment process: job postings. These advertisements are the first step in matching aspiring library staff with the real-life needs of libraries—where the rubber meets the road between employer expectations and new-grad experience. Most libraries already use the practice of distinguishing between required and preferred qualifications, which is a good start, especially for technology jobs where candidates may offer strong learning proficiency yet lack a few particular tools. Although there have been conflicting interpretations of the Hewlett-Packard research suggesting that men are more likely than women to apply to jobs when they don’t meet all the requirements,3 I observe a general tendency among graduating students to err on the side of caution because they’re not sure which qualifications they can claim.
Among my students, for example, constant confusion attends the years of experience required. Is this library experience? General job experience? Experience at the same type of library? Paid or unpaid? Postings are often ambiguous, and students may choose to apply or not. Similarly, there are questions about what extent of experience qualifies someone to know a technology: mastering it through creating new projects at a paid job, experience maintaining it, or merely basic familiarity? Not knowing who has been hired, and on the basis of what kind of experience, is a gap for researchers trying to close the loop on job advertisements.

Emily Morton-Owens (egmowens@upenn.edu), a member of the ITAL Editorial Board, is Director of Digital Library Development and Systems, University of Pennsylvania Libraries, Philadelphia, Pennsylvania.
EDITORIAL BOARD THOUGHTS | MORTON-OWENS doi: 10.6017/ital.v35i3.9527

Even when a job posting has avoided an overlong list of required technical skills, it might still be expressing a narrow sense of what’s required to qualify. Someone who understands Subversion will be capable of understanding Git, so we see plenty of job advertisements that ask for experience with “a version control system (e.g. Git, Subversion, or Mercurial).” I recently polled staff in our department and found very few of us with bachelor’s degrees in technical subjects. More of us had come to working in library technology through work experience or graduate programs. And yet, our job postings contained long statements that conflated education and experience, such as “Bachelor’s degree in Computer Science, Information Science, or other relevant field and at least 3 years of experience application development in Object Oriented and scripting languages or equivalent combination of education and experience.
Master’s desirable.” I edited our statement to more clearly allow a combination of factors that would show sufficient preparation: “Bachelor’s degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a Master’s degree is preferred,” followed by a separate description of the technical skills needed. This increased the number and quality of our applications, so I’ll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind.

Meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. First, they’re wondering how to show clearly enough that they meet requirements like “three years of work experience” or “experience with user testing” so that their application is seriously considered. Second, they ask about possibilities to formalize skills. Recently, I’ve gotten questions about a certificate program in UX and whether there is any formal certification to be a systems librarian. Surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master’s degrees to learning scripting as a technical services librarian to pre-MLS work experience—doesn’t suggest any standard method for substantiating technical knowledge. Once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don’t necessarily signal that. Some advice from the tech industry about how to be more inviting to candidates applies to libraries too: for example, avoiding “rockstar” or “ninja” descriptions, emphasizing the problem space over years of experience,4 and designing interview processes that encourage discussion rather than “gotcha” technical tasks.
At Penn Libraries, for example, we’ve been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. This gives us concrete code to discuss in a far more realistic and relaxed context.

While it may be helpful to express requirements better so that applicants can see more clearly whether they should respond to a posting, this is a small part of the question of preparing new MLS grads for library technology jobs. The new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. Others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they’ve just completed a whole new degree and may not have the time or resources to pursue additional training. Even if we make efforts to narrow the gap between employers and job-seekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment. Library school provides a relatively brief window to instill in students the fundamentals and values of the profession, and it can’t be repurposed as a coding academy. There persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies.

REFERENCES

1. Monica Maceli, “What Technology Skills Do Developers Need? A Text Analysis of Job Listings in Library and Information Science (LIS) from Jobs.code4lib.org,” Information Technology and Libraries 34, no. 3 (2015): 8-21, doi:10.6017/ital.v34i3.5893.

2. Jennie Rose Halperin, “Our $50,000 Problem: Why Library School?” code{4}lib, http://code4lib.org/conference/2015/halperin.

3.
Tara Sophia Mohr, “Why Women Don’t Apply for Jobs Unless They’re 100% Qualified,” Harvard Business Review, August 25, 2014, https://hbr.org/2014/08/why-women-dont-apply-for-jobs-unless-theyre-100-qualified.

4. Erin Kissane, “Job Listings That Don’t Alienate,” https://storify.com/kissane/job-listings-that-don-t-alienate.
Technology Skills in the Workplace: Information Professionals’ Current Use and Future Aspirations

Monica Maceli and John J. Burke

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

ABSTRACT

Information technology serves as an essential tool for today’s information professional, and ongoing research is needed to assess the technological directions of the field over time. This paper presents the results of a survey of the technologies used by library and information science practitioners, with attention to the combinations of technologies employed and the technology skills that practitioners wish to learn. The most common technologies employed were email, office productivity tools, web browsers, library catalog- and database-searching tools, and printers, with programming topping the list of most-desired technology skills to learn. Similar technology usage patterns were observed for early- and later-career practitioners. Findings also suggested the relative rarity of emerging technologies, such as the makerspace, in current practice.

INTRODUCTION

Over the past several decades, technology has rapidly moved from a specialized set of tools to an indispensable element of the library and information science (LIS) workplace, and today it is woven throughout all aspects of librarianship and the information professions. Information professionals engage with technology in traditional ways, such as working with integrated library systems, and in new innovative activities, such as mobile-app development or the creation of makerspaces.1 The vital role of technology has motivated a growing body of research literature exploring the application of technology tools in the workplace, as well as within LIS education, to effectively prepare tech-savvy practitioners. Such work is instrumental to the progression of the field and, with the rapidly changing technological landscape, requires ongoing attention from the research community.
One of the most valuable perspectives in such research is that of the current practitioner. Understanding current information professionals’ technology use can help in understanding the role and shape of the LIS field, provide a baseline for related research efforts, and suggest future directions. The practitioner perspective is also valuable in separating the hype that often surrounds emerging technologies from the reality of their use and application within the LIS field.

Monica Maceli (mmaceli@pratt.edu) is Assistant Professor, School of Information, Pratt Institute, New York. John J. Burke (burkejj@miamioh.edu) is Library Director and Principal Librarian, Gardner-Harvey Library, Miami University Middletown, Middletown, Ohio.

TECHNOLOGY SKILLS IN THE WORKPLACE: INFORMATION PROFESSIONALS’ CURRENT USE AND FUTURE ASPIRATIONS | MACELI AND BURKE | https://doi.org/10.6017/ital.v35i4.9540

This paper presents the results of a survey of LIS practitioners, oriented toward understanding the participants’ current technology use and future technology aspirations. The guiding research questions for this work are as follows:

1. What combinations of technology skillsets do LIS practitioners commonly use?
2. What combinations of technology skillsets do LIS practitioners desire to learn?
3. What technology skillsets do newer LIS practitioners use and desire to learn as compared to those with ten-plus years of experience in the field?

LITERATURE REVIEW

The growth and increasing diversity of technologies used in library settings has been matched by a desire to explore how these technologies impact expectations for LIS practitioner skill sets.
Triumph and Beile examined the academic library job market in 2011 by describing the required qualifications for 957 positions posted on the ALA JobLIST and ARL Job Announcements websites.2 The authors also compared their results with similar studies conducted in 1996 and 1988 to see if they could track changes in requirements over a twenty-three-year period. They found that the number of distinct job titles increased in each survey because of the addition of new technologies to the library work environment that require positions focused on handling them. The comparison also found that computer skills as a position requirement increased by 100 percent between 1988 and 2011, with 55 percent of 2011 announcements requiring them.

Looking more deeply at the technology requirements specifically, Mathews and Pardue conducted a content analysis of 620 job ads from the ALA JobLIST to identify skills required in those positions.3 The top technology competencies required were web development, project management, systems development, systems applications, networking, and programming languages. They found a significant overlap of librarian skill sets with those of IT professionals, particularly in the areas of web development, project management, and information systems. Riley-Huff and Rholes found that the most commonly sought technology-related job titles were systems/automation librarian, digital librarian, emerging and instructional technology librarian, web services/development librarian, and electronic resources librarian.4 A few years later, Maceli added to this list with newly popular technology-related titles, including emerging technologies librarian, metadata librarian, and user experience/architect librarian.5

Beyond examining which specific technologies librarians should be able to use, researchers have also pondered whether a list of skills is even possible to create.
Crawford synthesized a series of blog posts from various authors to discuss which technology skills are essential and which are too specialized to serve as minimum technology requirements for librarians.6 He questioned whether universal skill sets should be established given the variety of tasks within libraries and the unique backgrounds of each library worker. Crawford also questioned the expectation that every librarian will have a broad array of technology skills, from programming to video editing to game design and device troubleshooting.

Partridge et al. reported on a series of focus groups held with 76 librarians that examined the skills required for members of the profession, especially those addressing technology.7 In the questions they asked the focus groups, the authors focused on the term “library 2.0” and attempted to gather suggestions on skills that current and future librarians need to assist users. They concluded that the groups identified a change in attitudes by librarians as more important to future library service than the acquisition of skills with specific technology tools. Importance was given to librarians’ abilities to stay aware of technological changes, to be resilient and reflective in the face of them, and to communicate regularly and clearly with the members of their communities.

Another area examined in the studies is where the acquisition of technology skills should and does happen for librarians. Riley-Huff and Rholes reported on a dual approach to measuring librarians’ preparation for performing technology-related tasks.8 The authors assessed course offerings for LIS programs to see if they included sufficient technology preparation for new graduates to succeed in the workplace. They then surveyed LIS practitioners and administrators to learn how they acquired their skills and how difficult it is to find candidates with enough technology preparation for library positions.
Their findings suggest that while LIS programs offer many technology courses, they lack standardization, and graduates of any given program cannot be expected to have a broad education in library technologies. Further research confirmed this troubling lack of consistency in technology-related curricula. Singh and Mehra assessed a variety of stakeholders, including students, employers, educators, and professional organizations, finding widespread concern about the coverage of technology topics in LIS curricula.9 Despite inconsistencies between individual programs, several studies provided a holistic view of the popular technology offerings within LIS curricula. Programs commonly offered one or more introductory technology courses, as well as courses in database design and development, web design and development, digital libraries, systems analysis, and metadata.10,11,12

As researchers have emphasized from a variety of perspectives, new graduates could not realistically be expected to know every technology with application to the field of information.13 There was widespread acknowledgement that learning in this area can, and must, continue in a lifelong fashion throughout one’s career. Riley-Huff and Rholes reported that LIS practitioners saw their own experiences as involving continuing skill development on the job, both before and after taking on a technology role.14 However, literature going back many decades suggests that the increasing need for continuing education in information technology has generally not been matched by increasing organizational support for these ventures.
Numerous deterrents to continuing technology education were noted, including lack of time,15 organizational climate, and the perception of one’s age.16 While studies in this area have primarily focused on MLS-level positions, Jones reported on academic library support staff members and their perceptions of technology use over a ten-year period and found that increased technology responsibilities added to workloads and increased workplace stress.17 Respondents noted that the increasing use of technology in their libraries had increased their individual workloads along with the range of responsibilities that they hold.

METHOD

To build an understanding of the research questions stated above, which focus on the technologies currently used by information professionals and those they desire to learn, we designed and administered a thirteen-question anonymous survey (see appendix) to the subscribers of thirty library-focused electronic discussion groups between February 25 and March 13, 2015. The groups were chosen to target respondents employed in multiple types of libraries (academic, public, school, and special) with a wide array of roles in their libraries (public services librarians, systems staff members, catalogers, and so on). We solicited respondents with an email sent to the groups asking for their participation in the survey and with the promise to post initial results to the same groups. The survey included closed and open-ended questions oriented toward understanding current technology use and future aspirations, as well as capturing demographics useful in interpreting and generalizing the results.
The survey questions have been previously used and iteratively expanded over time by the second author, first in the fall of 2008 and then in the spring of 2012, with summative results presented in the last three editions of the Neal-Schuman Library Technology Companion. We obtained a total of 2,216 responses to the question, “Which of the following technologies or technology skills are you expected to use in your job on a regular basis?” Of these responses, 1,488 (67 percent) of the respondents answered the question regarding technologies they would like to learn: “What technology skill would you like to learn to help you do your job better?”

We conducted basic reporting of response frequency for closed questions to assess and report the demographics of the respondents. To analyze the open-ended survey question results in greater depth, we conducted a textual analysis using the R statistical package (https://www.r-project.org/). We used the tm (text mining) package in R (http://CRAN.R-project.org/package=tm) to calculate frequency and correlation of terms, generate plots, and cluster terms.

RESULTS

The following section first presents an overview of survey responses and respondents, and then explores results as related to the three stated research questions. The LIS practitioners who responded to the survey reported that their libraries are located in forty US states, eight Canadian provinces, and forty-three other countries. Academic libraries were the most common type of library represented, followed by public, school, special, and other (see table 1).

Library Type   Number of Respondents   Percentage of All Respondents
Academic       1,206                   54.4
Public         545                     24.6
School         266                     12.0
Special        138                     6.2
Other          61                      2.8

Table 1. The types of libraries in which survey respondents work

Respondents also provided their highest level of education.
A total of 77 percent of responding LIS practitioners have earned a library-related or other master’s degree, dual master’s degrees, or a doctoral degree. From these reported levels of education, it is likely that more respondents are in librarian positions than in library support staff positions. However, individuals with master’s degrees serve in various roles in library organizations, so the percentage of graduate degree holders may not map exactly to the percentage of individuals in positions that require those degrees. Significantly fewer respondents (16 percent) reported holding a high school diploma, some college credit, an associate degree, or a bachelor’s degree as their highest level of education.

Another aspect we measured in the survey was the tasks that respondents performed on a regular basis. The range of tasks provided in the survey allowed for a clearer analysis of job responsibilities than broad categories of library work such as “public services” or “technical services.” Some respondents appeared to be employed in solo librarian environments where they are performing several roles. Even respondents who might have more focused job titles such as “reference librarian” or “cataloger” may be performing tasks that overlap traditional roles and categories of library work. The tasks offered in the survey and the responses to each are shown in table 2.
Task                             Number of Respondents   Percentage of Respondents
Reference                        1,404                   63.4
Instruction                      1,296                   58.5
Collection development           1,260                   56.9
Circulation                      917                     41.4
Cataloging                       905                     40.8
Electronic resource management   835                     37.7
Acquisitions                     789                     35.6
User experience                  775                     35.0
Library administration           769                     34.7
Outreach                         758                     34.2
Marketing/public relations       722                     32.6
Library/IT systems               672                     30.3
Periodicals/serials              659                     29.7
Media/audiovisuals               566                     25.5
Interlibrary loan                518                     23.4
Distance library services        474                     21.4
Archives/special collections     437                     19.0
Other                            209                     9.4

Table 2. Tasks performed on a regular basis by survey respondents

While public services-related activities lead the list, with reference, instruction, collection development, and circulation as the top four task areas, technical services-related activities are well represented; the next three in rank are cataloging, electronic resource management, and acquisitions. The overall list of tasks shows the diversity of work LIS practitioners engage in, as each respondent chose an average of six tasks. The results also suggest that the survey respondents are well acquainted with a wide variety of library work rather than only having experience in a few areas, making their uses of technology more representative of the broader library world.

The survey also questioned the barriers LIS practitioners face as they try to add more technology to their libraries, and 2,161 respondents replied to the question, “Which of the following are barriers to new technology adoption in your library?” Financial considerations proved to be the most common barrier, with “budget” chosen by 80.7 percent of respondents, followed by “lack of staff time” (62.4 percent), “lack of staff with appropriate skill sets” (48.5 percent), and “administrative restrictions” (36.7 percent).
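The frequency reporting used throughout these results amounts to counting, per option, how many respondents selected it and dividing by the number of respondents. The published analysis was done in R; the following is a minimal, analogous Python sketch using an invented three-respondent sample in place of the real survey data:

```python
from collections import Counter

# Hypothetical multi-select responses: each list holds the technologies
# one respondent reported using regularly (stand-ins for the survey data).
responses = [
    ["email", "word processing", "web browser", "printers"],
    ["email", "web browser", "library database searching"],
    ["email", "word processing", "spreadsheets"],
]

# Tally how many respondents selected each option.
counts = Counter(tech for r in responses for tech in r)
n = len(responses)

# Report frequency and percentage, most common first.
for tech, c in counts.most_common():
    print(f"{tech}: {c} ({100 * c / n:.1f}%)")
```

`Counter.most_common` sorts options by descending frequency, which matches the ordering used when reporting top and bottom skills in figures 1 and 2.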
What Combinations of Technology Skillsets Do LIS Practitioners Commonly Use?

Responses from survey question 8, “Which of the following technologies or technology skills are you expected to use in your job on a regular basis?,” were analyzed to build an understanding of this research question. A total of 2,216 responses to this question were received. Survey respondents were asked to select from a detailed list of technologies/skills (visible in question 8 of the appendix) that they regularly used. The top answers respondents chose for this question were email, word processing, web browser, library catalog (public side), and library database searching. The full list of the top twenty-five technology skills and tools used is detailed in figure 1, with the bottom fifteen technology skills used presented in figure 2.

Figure 1. Top twenty-five technology skills/tools used by respondents (N = 2,216): email, word processing, web browser, library catalog (public side), library database searching, spreadsheets, printers, web searching, teaching others to use technology, presentation software, Windows OS, laptops, scanners, library management system (staff side), downloadable ebooks, web-based ebook collections, cloud-based storage, technology troubleshooting, teaching using technology, online instructional materials/products, tablets, web video conferencing, educational copyright knowledge, library website creation or management, and cloud-based productivity apps

Figure 2. Bottom fifteen technology skills/tools used by respondents (N = 2,216)

Text analysis techniques were then used to determine the frequent combinations of technology skills used in practice.
First, a clustering approach was taken to visualize the most popular technologies that were commonly used in combination (figure 3). Clustering helps in organizing and categorizing a large dataset when the categories are not known in advance and, when plotted in a dendrogram chart, assists in visualizing commonly co-occurring terms. The authors numbered the clusters identified in figure 3 for ease of reference. From left to right, the first cluster focuses on communication and educational tools, the second emphasizes devices and software, the third contains web and multimedia creation tools, the fourth contains office productivity and public-facing information retrieval tools, and the fifth cluster has a diverse collection of responsibilities, including systems-oriented responsibilities (from operating systems to specific hardware devices), working with ebooks, teaching with technology, and teaching technology to others.

(The fifteen least-used skills, shown in figure 2, were Mac OS, audio recording and editing, technology equipment installation, computer programming or coding, assistive/adaptive technology, RFID, Chromebooks, network management, server management, statistical analysis software, makerspace technologies, Linux, 3D printers, augmented reality, and virtual reality.)

Figure 3. Cluster analysis of most frequent technology skills used in practice, with red outlines on each numbered cluster

Notably, the list of top skills used (figure 1) falls more on the end-user side of technology; skills more oriented toward systems work (e.g., Linux, server management, computer programming or coding) were less frequently mentioned, and several were among the lowest reported (figure 2). Of the 2,216 respondents, 15 percent used programming or coding skills regularly in their job (which is of interest, as programming or coding was the skill most desired to learn by respondents; this will be discussed further in the context of the next research question).
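A dendrogram like figure 3 is built by hierarchically clustering skills according to how often they co-occur in the same responses. The sketch below is a toy stand-in rather than the authors’ R code: it applies single-linkage agglomerative clustering over Jaccard distances between the sets of (invented) respondents reporting each skill, and stops at two clusters for brevity:

```python
from itertools import combinations

# Hypothetical 0/1 usage matrix: which of six respondents reported each skill.
usage = {
    "email": [1, 1, 1, 1, 1, 1],
    "word processing": [1, 1, 1, 0, 1, 1],
    "server management": [0, 0, 0, 1, 0, 1],
    "linux": [0, 0, 0, 1, 0, 1],
}

def jaccard_distance(a, b):
    # 1 - |A ∩ B| / |A ∪ B| over the sets of respondents using each skill.
    sa = {i for i, v in enumerate(usage[a]) if v}
    sb = {i for i, v in enumerate(usage[b]) if v}
    return 1 - len(sa & sb) / len(sa | sb)

# Single-linkage agglomerative clustering: repeatedly merge the two clusters
# whose closest members are nearest to each other.
clusters = [[skill] for skill in usage]
while len(clusters) > 2:
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: min(
            jaccard_distance(a, b)
            for a in clusters[ij[0]]
            for b in clusters[ij[1]]
        ),
    )
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

print(clusters)
```

Skills used by exactly the same respondents (here, “server management” and “linux”) merge at distance zero, which illustrates why systems-oriented skills group together in the full analysis while the ubiquitous end-user tools form their own clusters.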
Plotting the correlations between the more advanced technology skillsets can provide a picture of the work such systems-oriented positions are commonly responsible for, particularly as they are less well represented in the responses as a whole. Figure 4 plots the correlated terms for those tasked with “server management.” It is fair to assume someone with such responsibilities falls on the highly technical end of the spectrum.

Figure 4. Terms correlated with “server management,” indicating commonly co-occurring workplace technologies for highly technical positions

The more common task of “library website creation or management,” which fell to those with a broad level of technological expertise, had numerous correlated terms. Figure 5 demonstrates a wide array of technology tools and responsibilities.

Figure 5. Terms correlated with “library website creation or management,” indicating commonly co-occurring technologies used on the job

Lastly, teaching using technology and teaching technology to others is a long-standing responsibility of librarians and library staff. Figure 6 presents the skills correlated with “teaching others to use technology.”

Figure 6. Terms correlated with “teaching others to use technology,” indicating commonly co-occurring technologies used on the job

What Combinations of Technology Skillsets Do LIS Practitioners Desire to Learn?

We analyzed responses to survey question 10, “What technology skill would you like to learn to help you do your job better?,” to explore this research question. As summarized in Burke18—and consistent with the prior year’s findings—coding or programming remained the most desired technology skillset, mentioned by 19 percent of respondents.
The raw text analysis yielded a fuller list of the top terms mentioned by participants (table 3 and visualized in figure 7).

Technology Term                                Number of Respondents   Percentage of Respondents
Coding or programming (combined for reporting) 292                     19.59
Web                                            178                     11.96
Software                                       158                     10.62
Video                                          112                     7.53
Apps                                           106                     7.12
Editing                                        105                     7.06
Design                                         85                      5.71
Database                                       76                      5.11

Table 3. Terms mentioned by 5 percent or more of survey respondents

Figure 7. Wordcloud of responses to “What technology skill would you like to learn to help you do your job better?”

We then explored the deeper context of responses and individually analyzed responses specific to the more popular technology desires. First, we assessed the responses mentioning the desire to learn coding or programming. Of these responses, the most common specific technologies mentioned were HTML, Python, CSS, JavaScript, Ruby, and SQL, listed in decreasing order of interest. Although most participants did not describe what they would like to do with their desired coding or programming skills, of those that did, the responses indicated interest in

● becoming more empowered to solve their own technology problems (e.g., “I would like to learn the [programming languages] so I don't have to rely on others to help with our website,” “I’m one of the most tech-skilled people at my library, but I’d like to be able to build more of my own tools and manage systems without needing someone from IT or outside support.”);

● improving communication with IT (e.g., “how to speak code, to aid in communication with IT,” “to better identify problems and work with IT to fix them”);

● creating novel tools and improving system interoperability (e.g.,
“coding for app and API creation”); and

● bringing new technologies to their library and patrons (e.g., “coding so that I can incorporate a hackerspace in my library”).

Next, we took a clustering approach to visualize the terms commonly desired in combination. Figure 8 describes the clustered terms that we found within the programming or coding responses. The terms “programming” and “coding” form a distinct cluster to the right of the diagram, indicating that many responses contained only those two terms.

Figure 8. Clustering of terms present in responses indicating the desire to learn coding or programming

The remaining portion of the diagram illustrates the specific technologies mentioned by those respondents who answered in greater detail or expanded on their general answer of programming or coding. Other related desired technology-skill areas become apparent: database management, HTML and CSS (as well as the more general “web design,” which appeared in the top terms in table 3), PHP and JavaScript, Python and SQL, and XML creation, among others. The bulleted list presented in the previous paragraph illustrates some of the potential applications participants envisioned these skills being useful in, but the majority did not provide this level of detail in their responses.

Editing was another prominent term that appeared across participant responses and was largely meant in the context of video editing. Because of the vagueness of the term “editing,” a closer look was necessary to determine other technology desires. Looking at terms highly correlated with “editing” revealed both video and photo editing to be important to respondents.
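Table 3 counts, for each term, the number of respondents whose free-text answer mentioned it at least once. A rough, stdlib-only Python sketch of that tokenize-and-count step follows; the answers and stoplist are invented for illustration, and the published analysis used R’s tm package rather than this code:

```python
import re
from collections import Counter

# Invented free-text answers standing in for the survey's 1,488 responses to
# "What technology skill would you like to learn to help you do your job better?"
answers = [
    "coding, maybe Python",
    "web design and video editing",
    "coding for our website",
    "video editing",
    "database management",
]

STOPWORDS = {"and", "for", "our", "maybe", "the", "a"}  # toy stoplist

# Count, per term, the number of respondents mentioning it at least once
# (a set() per answer, so a word repeated in one answer counts only once).
mentions = Counter()
for a in answers:
    terms = set(re.findall(r"[a-z]+", a.lower())) - STOPWORDS
    mentions.update(terms)

# Keep terms mentioned by at least 5 percent of respondents, as in table 3.
n = len(answers)
top = {t: c for t, c in mentions.items() if c / n >= 0.05}
for term, count in sorted(top.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term}: {count} ({100 * count / n:.1f}%)")
```

Counting respondents rather than raw occurrences is what lets the percentages in table 3 be read as “share of respondents who mentioned the term.”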
Several of the top-appearing terms were used more generally: “database” and mobile “apps” were mentioned without specifying the technology tool or scenario of use, such that a more contextual analysis could not be conducted. These responses can be particularly difficult to interpret, as the term “databases” can have a technical meaning (e.g., working with SQL) or can refer to the use of library databases from an end-user perspective.

What Technology Skillsets Do Newer LIS Practitioners Use and Desire to Learn as Compared to Those with Ten-Plus Years of Experience in the Field?

Of the 2,216 survey responses, 877 stated they had worked in libraries for ten or fewer years. We analyzed these responses separately from the remaining 1,334 respondents who had worked in libraries for more than ten years. Of this latter group, 644 had worked in libraries for twenty-plus years (figure 9). A handful of participants did not answer the question and were omitted from the analysis.

Figure 9. Number of survey responses falling into the various categories (0-2, 3-5, 6-10, 11-15, 16-20, and 21+ years) for number of years working in libraries

The top technology skills used in the workplace did not differ significantly between the groups. The top skills, as discussed earlier and presented in figure 1, were well represented and similarly ordered. A few small percentage points of difference were noted in a handful of the top skills (figure 10). Those newer to the field were slightly more likely to teach others to use technology, use cloud-based storage, and use cloud-based productivity apps. More experienced practitioners regularly used the library management system (on the staff side) more than those newer to the field.

Figure 10.
Top twenty-five technology skills used by respondents in the zero to ten years’ experience (dark blue) and eleven-plus years’ experience (light blue) groups

For the question regarding technologies they would like to learn, 69 percent of the participants with zero to ten years’ experience answered the question, compared to a slightly smaller 65 percent of the participants with more than ten years’ experience. Top terms for both groups were very similar, including coding or programming, software, web, video, design, and editing. These terms were not dissimilar to the responses taken as a whole (table 3), indicating that respondents were generally interested in learning the same sorts of technology skills regardless of how long they had been in the field.

A few noticeable differences between the two groups emerged. The most popular skills, coding or programming, were mentioned by 28 percent of the respondents with zero to ten years’ experience and by 15 percent of the respondents with eleven-plus years’ experience. There was slightly more interest (by a few percentage points) in databases, design, Python, and Ruby in the zero to ten years’ experience group.
Taking a closer look at the different year ranges within the zero to ten years’ experience group revealed that those with three to five years of experience were most likely to be interested in learning coding or programming skills.

Figure 11. Percentage of respondents interested in learning coding or programming in the groups with ten or fewer years’ experience

Of the participants who answered the question at all, several stated that there were no technology skills they would need or like to learn for their position, either because they were comfortable with their existing skills or were simply open to learning more as needed (but nothing specific came to mind). Combined with those who did not answer the question (and so presumably did not have a particular technology they were interested in learning), 28 percent of the zero to ten years’ experience group and 31 percent of the eleven-plus years’ experience group did not have any technologies that they desired to learn at the moment.

DISCUSSION

As detailed earlier, the most common technologies employed by LIS practitioners were email, office productivity tools, web browsers, library catalog and database searching tools, and printers. Generally similar technology usage patterns were observed for early- and later-career practitioners, and programming topped the list of most-desired technology skills to learn.

The cluster analysis presented in figure 3 suggests that a relatively small percentage of practitioners have technology-intensive roles that would require skills such as programming, working with databases, systems administration, etc. Rather, the cluster analysis showed common technology skillsets focused on the end-user side of technology tools.
In fact, most of the top ten skills used—email, office productivity tools (word processing, spreadsheets, and presentation software), web browsers, library catalog and database searching, printers, and teaching others to use technology—are fairly nontechnical in nature. A potential exception is that of teaching technology. Figure 6 suggests that teaching others to use technology entails several hardware devices (for example, laptops, tablets, smartphones, and scanners) as well as online and digital resources, such as ebooks. However, most of the popular skills used would be considered baseline skills for information workers in any domain. As suggested by Tennant, programming and other advanced technical skills do not necessarily need to be core skills for all information professionals, but knowledge of the potential applications and possibilities of such tools is required.19 This idea was echoed by Partridge et al., whose findings emphasized the need for awareness and resilience in tackling new technological developments.20 These skills alone would obviously be insufficient for LIS practitioners explicitly seeking a high-tech role, as discussed in Maceli.21 However, further research directed toward exploring the mental models and general technological understanding of information professionals would be helpful in understanding the true level of practitioner engagement with technology, to complement the list of relatively low-tech tools employed. Programming has been a skill of great interest within the information professions for many years, and the respondents’ enthusiasm and desire to learn in this area were readily apparent from the survey results, with nearly 20 percent of participants citing either “programming” or “coding” as a skill they desired to learn. In the context of their current responsibilities, 15 percent of respondents overall mentioned “computer programming or coding” as a regular technological skill they employed (figure 2).
There was a slight difference: 19 percent of the librarians with fewer than eleven years of experience coded regularly, compared to 13 percent of those with eleven or more years of experience. Within the years-of-experience divisions, the newer practitioners were more interested in learning programming, with the peak of interest at three to five years in the workplace (figure 11). The relatively low interest or need to learn programming among the newest practitioners potentially indicates a hopeful finding—that their degree program was sufficient preparation for the early years of their career. Prior research, however, contradicts this finding. For example, Choi and Rasmussen’s 2006 survey found that, in the workplace, librarians frequently felt unprepared in their knowledge of programming and scripting languages.22 In the intervening years, curricula have shifted to more heavily emphasize technology skills, including web development and other topics covering programming,23 perhaps better preparing early-career practitioners. Overall, programming remains a popular skill in continuing education opportunities as well as in job listings,24 which aligns well with the respondents’ strong interest in this area. The skills commonly co-occurring with programming in practice included working with Linux, database software, managing servers, and webpage creation (figure 4). Taken as a whole, these skills indicate job responsibilities falling toward the systems side, with webpage creation a skill that bridged intensely technical and more user-focused work (as also evident in figure 4). This indicates that, though programming may be perceived as highly desirable for communicating and extending systems, as a formal job responsibility it may still fall to a relatively small number of information professionals in any significant manner.
Makerspace technologies and their implementation possibilities within libraries have garnered a great deal of excitement and interest in recent years, with much literature highlighting innovative projects in this area (such as American Library Association25 and Bagley26). Fourie and Meyer provided an overview of the existing makerspace literature, finding that most research efforts focus on the needs and construction of the physical space.27 Given the general popularity of the topic (as detailed in Moorefield-Lang),28 it is interesting to note that such technologies were infrequently mentioned by survey participants, both by those desiring to learn these tools and by those who were currently using them. The most infrequently used skills (figure 2) included makerspace technologies, 3D printers, and augmented and virtual reality. Only a small number of respondents currently used this mix of makerspace-oriented and emerging technologies, and only 3 percent of respondents mentioned interest in learning makerspace-related skills. Despite many research efforts exploring the particulars of unique makerspaces in a case-study approach (for example, Moorefield-Lang),29 little data exists on the total number of makerspaces within libraries, and the skillset is largely absent from prior research describing LIS curricula and job listings. This makes it difficult to determine whether the low number of participants who reported working with makerspace technologies reflects the small number of such spaces in existence or simply that few practitioners are assigned to work in this area, regardless of the spaces’ popularity. In either case, these findings provide a useful baseline with which to track the growth of makerspace offerings over time and librarian involvement in such intensely technological work. Despite the interest and clear willingness to learn and use technology, several workplace challenges became apparent from participant responses.
As prior research has explored (notably Riley-Huff and Rholes),30 practitioners assumed they would be continually learning and building skills on the job throughout their careers to stay current technologically. As described in the earlier results section, many participants mentioned that, although they were highly willing and able to learn, the necessary organizational resources were lacking. As one participant noted, “I’d like to learn anything but the biggest problem seems to be budget (time and monetary).” Several participants expressed feeling overwhelmed with their current workload. New learning opportunities, technological or otherwise, were simply not feasible. Although the survey results indicated that practitioners of all ages were roughly equally interested in learning new technologies, a handful of responses mentioned that ageist attitudes were creating barriers. Though few, these respondents described being dismissed as technologists because of their age. These themes have long been noted in the large body of continuing-education-related literature going back several decades. Stone’s study ranked lack of time as the top deterrent to professional development for librarians, and it appears little has changed.31 Chan and Auster noted that organizational climate and the perception of one’s age may impair the pursuit of professional development, among other impediments.32 However, research has noted a generally strong drive in older librarians to continue their education; Long and Applegate found a preference in later-career librarians for learning outlets provided by formal library schools and related professional organizations, but a lower interest in generally popular topics such as programming.33 These findings were consistent with the participant responses gathered in this survey.
Finally, as detailed in the results section, a significant percentage of respondents (33 percent) did not answer the question regarding what technologies they would like to learn. As is a limitation with survey research, it is difficult to know the respondents’ intentions in not answering the question: are they comfortable with their current technology skills? Do they lack the time or interest to pursue further technology education? And of those who did answer, many did not specify their intended use of the technologies they desired to learn, so a deeper exploration of what technologies LIS practitioners desire to learn, and why, would be of value as well. These questions are worth pursuing in more depth through further research efforts.

CONCLUSION

This study provides a broad view into the technologies that LIS practitioners currently use and desire to learn, across a variety of types of libraries, through an analysis of survey responses. Despite a marked enthusiasm toward using and learning technology, respondents described serious organizational limitations impairing their ability to grow in these areas. The LIS practitioners surveyed have interested patrons, see technology as part of their mission, and are not satisfied with the current state of affairs, but they seem to lack money, time, skills, and a willing library administration. Though respondents expressed a great deal of interest in more advanced technology topics, such as programming, the majority typically engaged with technology on an end-user level, with a minority engaged in deeply technical work. This study suggests future work in exploring information professionals’ conceptual understanding of and attitudes toward technology, and a deeper look at the reasoning of those who did not express a desire to learn new technologies.

REFERENCES

1.
Marshall Breeding, “Library Technology: The Next Generation,” Computers in Libraries 33, no. 8 (2013): 16–18, http://librarytechnology.org/repository/item.pl?id=18554.
2. Therese F. Triumph and Penny M. Beile, “The Trending Academic Library Job Market: An Analysis of Library Position Announcements from 2011 with Comparisons to 1996 and 1988,” College & Research Libraries 76, no. 6 (2015): 716–39, https://doi.org/10.5860/crl.76.6.716.
3. Janie M. Mathews and Harold Pardue, “The Presence of IT Skill Sets on Librarian Position Announcements,” College & Research Libraries 70, no. 3 (2009): 250–57, https://doi.org/10.5860/crl.70.3.250.
4. Debra A. Riley-Huff and Julia M. Rholes, “Librarians and Technology Skill Acquisition: Issues and Perspectives,” Information Technology and Libraries 30, no. 3 (2011): 129–40, https://doi.org/10.6017/ital.v30i3.1770.
5. Monica Maceli, “Creating Tomorrow’s Technologists: Contrasting Information Technology Curriculum in North American Library and Information Science Graduate Programs against Code4lib Job Listings,” Journal of Education for Library and Information Science 56, no. 3 (2015): 198–212, https://doi.org/10.12783/issn.2328-2967/56/3/3.
6. Walt Crawford, “Making it Work Perspective: Techno and Techmusts,” Cites and Insights 8, no. 4 (2008): 23–28.
7. Helen Partridge et al., “The Contemporary Librarian: Skills, Knowledge and Attributes Required in a World of Emerging Technologies,” Library & Information Science Research 32, no. 4 (2010): 265–71, https://doi.org/10.1016/j.lisr.2010.07.001.
8. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
9. Vandana Singh and Bharat Mehra, “Strengths and Weaknesses of the Information Technology Curriculum in Library and Information Science Graduate Programs,” Journal of Librarianship and Information Science 45, no. 3 (2013): 219–31, https://doi.org/10.1177/0961000612448206.
10. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
11.
Sharon Hu, “Technology Impacts on Curriculum of Library and Information Science (LIS)—A United States (US) Perspective,” LIBRES: Library & Information Science Research Electronic Journal 23, no. 2 (2013): 1–9, http://www.libres-ejournal.info/1033/.
12. Singh and Mehra, “Strengths and Weaknesses of the Information Technology Curriculum.”
13. See, for example, Crawford, “Making it Work Perspective”; Partridge et al., “The Contemporary Librarian.”
14. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
15. Elizabeth W. Stone, Factors Related to the Professional Development of Librarians (Metuchen, NJ: Scarecrow, 1969).
16. Donna C. Chan and Ethel Auster, “Factors Contributing to the Professional Development of Reference Librarians,” Library & Information Science Research 25, no. 3 (2004): 265–86, https://doi.org/10.1016/S0740-8188(03)00030-6.
17. Dorothy E. Jones, “Ten Years Later: Support Staff Perceptions and Opinions on Technology in the Workplace,” Library Trends 47, no. 4 (1999): 711–45.
18. John J. Burke, The Neal-Schuman Library Technology Companion: A Basic Guide for Library Staff, 5th edition (New York: Neal-Schuman, 2016).
19. Roy Tennant, “The Digital Librarian Shortage,” Library Journal 127, no. 5 (2002): 32.
20. Partridge et al., “The Contemporary Librarian.”
21. Monica Maceli, “What Technology Skills Do Developers Need? A Text Analysis of Job Listings in Library and Information Science (LIS) from Jobs.code4lib.org,” Information Technology and Libraries 34, no. 3 (2015): 8–21, https://doi.org/10.6017/ital.v34i3.5893.
22. Youngok Choi and Edie Rasmussen, “What Is Needed to Educate Future Digital Libraries: A Study of Current Practice and Staffing Patterns in Academic and Research Libraries,” D-Lib Magazine 12, no. 9 (2006), http://www.dlib.org/dlib/september06/choi/09choi.html.
23. See, for example, Maceli, “Creating Tomorrow's Technologists.”
24. Elías Tzoc and John Millard, “Technical Skills for New Digital Librarians,” Library Hi Tech News 28, no. 8 (2011): 11–15, https://doi.org/10.1108/07419051111187851.
25. American Library Association, “Manufacturing Makerspaces,” American Libraries 44, no. 1/2 (2013), https://americanlibrariesmagazine.org/2013/02/06/manufacturing-makerspaces/.
26. Caitlin A. Bagley, Makerspaces: Top Trailblazing Projects, A LITA Guide (Chicago: American Library Association, 2014).
27. Ina Fourie and Anika Meyer, “What to Make of Makerspaces: Tools and DIY Only or is there an Interconnected Information Resources Space?,” Library Hi Tech 33, no. 4 (2015): 519–25, https://doi.org/10.1108/LHT-09-2015-0092.
28. Heather Moorefield-Lang, “Change in the Making: Makerspaces and the Ever-Changing Landscape of Libraries,” TechTrends 59, no. 3 (2015): 107–12, https://doi.org/10.1007/s11528-015-0860-z.
29. Heather Moorefield-Lang, “Makers in the Library: Case Studies of 3D Printers and Maker Spaces in Library Settings,” Library Hi Tech 32, no. 4 (2014): 583–93, https://doi.org/10.1108/LHT-06-2014-0056.
30. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
31. Stone, Factors Related to the Professional Development of Librarians.
32. Chan and Auster, “Factors Contributing to the Professional Development of Reference Librarians.”
33. Chris E. Long and Rachel Applegate, “Bridging the Gap in Digital Library Continuing Education: How Librarians Who Were Not ‘Born Digital’ Are Keeping Up,” Library Leadership & Management 22, no. 4 (2008), https://journals.tdl.org/llm/index.php/llm/article/view/1744.

Appendix. Survey Questions

1. What type of library do you work in?
2.
Where is your library located (state/province/country)?
3. What is your job title?
4. What is your highest level of education?
5. Which of the following methods have you used to learn about technologies and how to use them? Please mark all that apply.
• Articles
• As part of a degree I earned
• Books
• Coworkers
• Face-to-face credit courses
• Face-to-face training sessions
• Library patrons
• Online credit courses
• Online training sessions (webinars, etc.)
• Practice and experiment on my own
• Web resources I regularly check (sites, blogs, Twitter, etc.)
• Web searching
• Other:
6. Which of the following skill areas are part of your responsibilities? Please mark all that apply.
• Acquisitions
• Archives/special collections
• Cataloging
• Circulation
• Collection development
• Distance library services
• Electronic resource management
• Instruction
• Interlibrary loan
• Library administration
• Library IT/systems
• Marketing/public relations
• Media/audiovisuals
• Outreach
• Periodicals/serials
• Reference
• User experience
• Other:
7. How long have you worked in libraries?
• 0–2 years
• 3–5 years
• 6–10 years
• 11–15 years
• 16–20 years
• 21 or more years
8. Which of the following technologies or technology skills are you expected to use in your job on a regular basis? Please mark all that apply.
• Assistive/adaptive technology
• Audio recording and editing
• Augmented reality (Google Glass, etc.)
• Blogging
• Cameras (still, video, etc.)
• Chromebooks
• Cloud-based productivity apps (Google Apps, Office 365, etc.)
• Cloud-based storage (Google Drive, Dropbox, iCloud, OneDrive, etc.)
• Computer programming or coding
• Computer security and privacy knowledge
• Database creation/editing software (MS Access, etc.)
• Dedicated e-readers (Kindle, Nook, etc.)
• Digital projectors
• Discovery layer/service/system
• Downloadable e-books
• Educational copyright knowledge
• E-mail
• Facebook
• Fax machine
• Image editing software (Photoshop, etc.)
• Laptops
• Learning management system (LMS) or virtual learning environment (VLE)
• Library catalog (public side)
• Library database searching
• Library management system (staff side)
• Library website creation or management
• Linux
• Mac operating system
• Makerspace technologies (laser cutters, CNC machines, Arduinos, etc.)
• Mobile apps
• Network management
• Online instructional materials/products (LibGuides, tutorials, screencasts, etc.)
• Presentation software (MS PowerPoint, Prezi, Google Slides, etc.)
• Printers (public or staff)
• RFID (radio frequency identification)
• Scanners and similar devices
• Server management
• Smart boards/interactive whiteboards
• Smartphones (iPhone, Android, etc.)
• Software installation
• Spreadsheets (MS Excel, Google Sheets, etc.)
• Statistical analysis software (SAS, SPSS, etc.)
• Tablets (iPad, Surface, Kindle Fire, etc.)
• Teaching others to use technology
• Teaching using technology (instruction sessions, workshops, etc.)
• Technology equipment installation
• Technology purchase decision-making
• Technology troubleshooting
• Texting, chatting, or instant messaging
• 3D printers
• Twitter
• Using a web browser
• Video recording and editing
• Virtual reality (Oculus Rift, etc.)
• Virtual reference (text, chat, IM, etc.)
• Word processing (MS Word, Google Docs, etc.)
• Web-based e-book collections
• Web conferencing/video conferencing (Webex, Google Hangouts, Goto Meeting, etc.)
• Webpage creation
• Web searching
• Windows operating system
• Other:
9.
Which of the following are barriers to new technology adoption in your library? Please mark all that apply.
• Administrative restrictions
• Budget
• Lack of fit with library mission
• Lack of patron interest
• Lack of staff time
• Lack of staff with appropriate skill sets
• Satisfaction with amount of available technology
• Other:
10. What technology skill would you like to learn to help you do your job better?
11. What technologies do you help patrons with the most?
12. What technology item do you circulate the most?
13. What technology or technology skill would you most like to see added to your library?
Up Against the Clock: Migrating to LibGuides v2 on a Tight Timeline

Brianna Buljung and Catherine Johnson

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017 68

ABSTRACT

During Fall semester 2015, librarians at the United States Naval Academy were faced with the challenge of migrating to LibGuides version 2 and integrating LibAnswers with LibChat into their service offerings. Initially, the entire migration process was anticipated to take almost a full academic year, giving guide owners considerable time to update and prepare their guides. However, with the acquisition of the LibAnswers module, library staff shortened the migration timeline considerably to ensure both products went live on the version 2 platform at the same time. The expedited implementation timeline forced the ad hoc implementation teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. This paper provides an overview of the process the staff at the Nimitz Library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. Consistent communication of expectations with stakeholders and prioritization of tasks were essential to the successful completion of the project.

INTRODUCTION

Academic libraries all over the United States have migrated from LibGuides version 1 to the new, sleeker, responsive design of version 2. Approaches to the migration can differ vastly depending on library size, staff capabilities, and the time frame available for completing the project. In 2015, the Nimitz Library at the United States Naval Academy began planning both to upgrade LibGuides to version 2 and to acquire LibAnswers with LibChat. The Web Team and Reference Department partnered to migrate the LibGuides platform and integrate LibAnswers into the Library’s web presence. The Library first adopted Springshare’s LibGuides in 2009.
By 2015, the subscription had grown to 61 published guides with 10,601 views. The LibGuides collection was modified and expanded during two web site upgrades and several staffing changes. Throughout 2014 and 2015, Library staff periodically discussed the possibility of upgrading to the version 2 interface, but timing, staffing vacancies, and the priority of other projects kept the migration from taking place. In late summer 2015, with the acquisition of Springshare’s LibAnswers with LibChat pending, staff determined that it was finally time to migrate to the new LibGuides interface. Initially, the migration team planned to spend nearly a full academic year completing the migration process. This timeline would provide guide owners with ample time for staff training, revising guides, conducting usability testing, and preparing the migrated guides to go live without distracting from their other duties. However, right before starting the project, the Library finalized the acquisition of Springshare’s LibAnswers with LibChat, which they decided to launch with the version 2 interface. The team pushed up the LibGuides migration by several months to keep from confusing patrons with multiple interfaces and launch dates. The migration of LibGuides and the implementation of LibAnswers would take place during the Fall semester, and both products would go live in the version 2 interface before the start of the Spring semester.

Brianna Buljung (bbuljung@mines.edu) is Instruction & Research Librarian, Colorado School of Mines, Golden, CO. Catherine Johnson (cjohnson@usna.edu) is Head of Reference & Instruction at the United States Naval Academy, Annapolis, MD.

UP AGAINST THE CLOCK: MIGRATING TO LIBGUIDES V2 ON A TIGHT TIMELINE | BULJUNG AND JOHNSON https://doi.org/10.6017/ital.v36i2.9585 69
This paper provides an overview of the process that the staff at Nimitz Library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. The authors also include a post-implementation reflection on the process.

LITERATURE REVIEW

Much of the currently available literature on migration of platforms, especially the LibGuides platform, is published informally. Librarians from universities across the country have created help guides, checklists, and best practices for surviving the migration. Most migration help guides are tailored to each specific institution, but they can still provide helpful suggestions that can be adapted by another.1 Springshare also provides extensive help content and checklists, including a list of the most important steps for administrators to complete.2 However, little of the available literature discusses the minimally acceptable amount of work needed to be completed by guide authors. This type of information was crucial to the Nimitz Library team after drastically shortening the migration timeline. A clearly delineated list of required and optional tasks was needed for guide owners, given time constraints and other job duties. In addition to the informally published help materials, several articles have been published on various aspects of research guide design and evaluation. A few articles examine the migration process. Hernandez and McKeen offer advice for libraries contemplating migration, including setting goals and performing usability testing against the new guides.3 Duncan et al. provide a case study of the implementation process at the University of Saskatchewan.4 Some articles discuss the basics of guide design and usage in the library. These best practices can be adapted to different platforms, web sites, and user populations.
They discuss the importance of various web design elements such as word choice and page layout.5 Another aspect the literature exposes is student use of the guides.6 Finally, usability of research guides is one of the most important and widely discussed topics in the literature. Creating and maintaining guide content depends on the user’s ability to locate and use the guides in their research.7 Most often, research guides are designed with the student in mind: to assist them in beginning a project, researching when a librarian is unavailable, or as a reference for follow-up after an instruction session.8 As Pittsley and Memmott discuss, navigation elements can impact a student’s use of research guides.9

The Process

As preparations for the migration began, it became immediately apparent that the Web Team and Reference Department would have to divide the project into manageable segments to complete the work without overwhelming guide owners. Three ad hoc teams, made up of librarians from several different departments, were created to take the lead on different elements of the project. The migration team was responsible for researching, organizing, and supervising the migration of LibGuides to version 2. The LibAnswers team learned about LibAnswers and how to effectively integrate the product into the Library’s web site. The LibChat team tested the functionality of LibChat and determined how it would fit into the Library’s reference desk staffing model. Dividing the project into manageable segments allowed each team to focus on the execution of its area of responsibility. The team approach allowed the Library to draw on individual strengths and staff willingness to participate without depending on a single staff member to manage the entire migration and implementation process on such a short timeline.
Migration Team

The migration team was responsible for determining the tasks that were mandatory for guide owners to complete, the amount of training they would need to use the new interface, and how each product should be incorporated into the Library’s web site. The LibGuides migration team relied heavily on advice from other libraries and the documentation from Springshare to guide them in determining mandatory tasks. The Engineering and Computer Science librarian reached out to the ASEE Engineering Libraries Division listserv for advice from peer libraries that had already completed migration. The team also made use of the Springshare help guides and best practices guides posted by other universities. Ultimately, the migration team created checklists and spreadsheets to help guide owners prepare their guides for migration. A pre-migration checklist (Appendix A) was shared with guide owners, containing all of the required and optional tasks that needed to be completed before the migration took place in early November. Tasks such as deleting outdated or unused images and evaluating low-use guides for possible deletion were required for guide owners to complete. Other tasks, such as checking each guide for a friendly URL or checking database descriptions for brevity and jargon-free language, were encouraged but considered optional. The team determined that items directly related to the ability of post-migration guides to function properly made the required list, while more cosmetic or stylistic tasks could be completed on a time-allowed basis. A post-migration checklist (Appendix B) was created for guide owners following the migration. This list included portions of the guides that had to be checked to ensure widgets, links, and other assets had migrated properly.
Both checklists were accompanied by tips, screenshots, and deadlines, and indicated which team member to contact with questions. Clear explanation of the expectations for the project and accommodation of the guide owners' busy schedules made the migration more successful.

The migration team gave the new, more robust A-Z list significant attention. LibGuides version 2 allows the A-Z list to be sorted by subject, type, and vendor. It also allows a library to tag "Best Bets" databases in each subject area. The databases categorized as Best Bets display more prominently in the list of databases by subject. Using Google Sheets, the Electronic Resources Librarian quickly and easily solicited feedback from liaison librarians about which databases to tag as Best Bets for each subject area. Google Sheets also made it easy for librarians to edit the list of databases related to their subject expertise. Some databases had been incorrectly categorized, and in some subjects newer subscriptions did not appear on the list. LibGuides version 2 allows users to sort databases by type but does not provide a predetermined list of types. To create the list of material types into which all databases would be sorted, the migration team examined lists found on other library web sites. Several lists were combined, and duplicate or irrelevant types were removed. An additional military-specific type was added to address the most common research conducted by midshipmen. The liaison librarians were then asked for input on the language used to describe each type and on which databases should be tagged with each type. Name choices are a matter of local preference, such as having a single type category for both dictionaries and encyclopedias or two separate categories. To keep the list of material types to a manageable length, the team decided that each type must contain more than one or two databases. It takes time to develop well-defined lists of subjects and types.
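The consolidation step described above (merging candidate type lists gathered from several library sites, removing duplicates, and dropping types tagged on too few databases) can also be scripted. The sketch below is our illustration, not a tool the teams used; the function and variable names are hypothetical.

```python
from collections import defaultdict

def consolidate_types(candidate_lists, db_type_assignments, min_count=2):
    """Merge candidate material-type lists case-insensitively, then keep only
    types assigned to more than `min_count` databases, following the team's
    rule that a type must contain more than one or two databases."""
    canonical = {}
    for type_list in candidate_lists:
        for name in type_list:
            key = name.strip().lower()
            canonical.setdefault(key, name.strip())  # first spelling wins

    counts = defaultdict(int)
    for types in db_type_assignments.values():
        for name in types:
            counts[name.strip().lower()] += 1

    return sorted(name for key, name in canonical.items()
                  if counts[key] > min_count)
```

With this approach, "Dictionaries" and "dictionaries" drawn from two different source lists collapse to a single entry, and a type assigned to only one database is dropped from the final list.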
Staff working with patrons are able to gather informal feedback about the categorizations in their current form and make suggestions, corrections, or additions based on patron feedback.

The migration of LibGuides and acquisition of LibAnswers provided the Reference Department and Web Team with an opportunity to update policies and establish new best practices for guide owners. One important cosmetic update was stronger encouragement for guide owners to use a photo in their profiles. Profile pictures had been used inconsistently in the first LibGuides interface, and several guide owners used the default grey avatar. Guide owners who were reluctant to have a headshot on their profile were encouraged to take advantage of stock photos made available through the Naval Academy's Public Affairs Office. A photo shoot was also organized for guide owners. On a voluntary basis, guide owners spent about an hour helping each other take pictures in and around the Library. The event helped build a collection of more professional photos for guide owners to choose from. Another important update was the re-evaluation of LibGuides policies in light of the new functionality available in version 2. The guide owners gathered for a meeting midway through the pre-migration guide cleanup process to troubleshoot problems and consider best practices for the new interface. Guide owners discussed the standardization of tab names in the guides, the information important to include in author profile boxes, and potential categories for the "types" dropdown in the A-Z database list. The meeting provided a great opportunity to discuss the options available to guide owners and to solicit feedback on interface mock-ups and guide templates created by the Systems Librarian.
Many items from the discussion were incorporated into the updated LibGuides policies for guide owners.

LibAnswers and LibChat Teams

Integrating LibAnswers with LibChat, an additional Springshare product, at the same time as the migration to LibGuides version 2 was not necessary. But because the acquisition of LibAnswers coincided with the need to upgrade to version 2, the Library staff determined that the two should be done at the same time in order to minimize disruption for patrons. The ad hoc teams tasked with implementing LibAnswers and LibChat met regularly to learn about the new products and to consider how they would fit into the Library's existing points of service. While the LibAnswers and LibChat teams began as two distinct groups, it became increasingly clear that the functionality of the two systems is so closely interwoven that they had to be reviewed and discussed together. The teams spent considerable time learning the functionality of the new systems, considering how the new service points would integrate into the existing offerings, and creating draft policies to provide guidance to staff. The teams developed a set of tips and guidelines to address staff concerns and provide guidance on how the new system should be used (see Appendix C). The teams also held training sessions focused on providing opportunities for staff to explore and practice using the new products. Although the implementation of LibAnswers with LibChat was not necessary to upgrade to LibGuides version 2, undertaking all of these upgrades at once allowed the ad hoc groups to collaborate with ease, to define policies and procedures that would help the products integrate seamlessly with existing services, and to prevent change fatigue within the Library.

Updating the Library Website

The final element of migration and implementation the teams had to consider was integration into the Library's existing web site.
Many elements of the Library's site are dictated by the broader university web policy and content management system. However, working within those guidelines, the teams were able to take advantage of the new LibGuides interface, especially the more robust A-Z list of databases, to provide users with multiple ways of accessing the new tools. The Library makes use of a tabbed box to provide entry to Summon, the catalog, the list of databases, and LibGuides. The new functionality of LibGuides version 2 enabled the team to provide easier access directly to the alphabetical listing of databases. The LibGuides tab was also updated to provide a drop-down list of all the guides and a link to browse by guide owner, subject, or type of guide. These enhancements saved time for the user and cut down on the number of clicks needed to access database content licensed by the library. Integrating the LibAnswers product into the site was achieved by providing several different ways for patrons to access it. An FAQ tab was added to the main tabbed box to provide quick access to LibAnswers, complete with a link to submit questions. The "Contact Us" section on the site home page was updated to include a link to LibAnswers as well as newer, more modern icons for the different contact methods. All guide owners were instructed to update the contact information on their guides to include a LibAnswers widget. A great source of inspiration on integrating the tools into the Library site came from looking at other library web sites. The teams worked from the list of LibGuides community members provided on the Springshare help site and viewed the sites of known peer libraries. Working through an unfamiliar web site can be a quick way to find design ideas and workflows that are successful and attractive.
Team members found wording, icons, and placement ideas that could be adapted for use on the Nimitz Library site.

Advice for Managing a Short Migration Timeline

When working on a short implementation timeline, or with a small staff that must accomplish a project like this in addition to its regular duties, a few strategies can make the process simpler and less stressful. First, communicate expectations with everyone involved in the project at every step of the process. Determine which stakeholders need to know about the various checklists and upcoming deadlines. Communicating needs and expectations throughout the entirety of the project reduces confusion and enables teams and individual guide owners to complete the project on time. Although LibGuides had predominantly been the domain of the Nimitz Reference Department, a project of this scale also impacted other parts of the library, from systems to the Electronic Resources librarian. Email and short notices in the Library's weekly staff update were the primary means of communication with stakeholders. Documents were shared via Google Drive to provide guide owners with a centralized file of help materials. In addition, the point of contact for questions about each element of the migration was clearly identified on each checklist and tip sheet. This single addition to the checklists helped guide owners get questions and technical issues addressed quickly and easily. On a short timeline it is also important to distinguish the elements that are crucial for completion from those that can be delayed. Critical needs in a LibGuides migration include deleting guides that are no longer being used, checking for boxes that will not migrate, and deleting bad links. These tasks must be completed by guide owners or administrators to ensure that the migrated data formats properly.
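Broken-link reports were supplied to guide owners during this migration, but the bad-link check can also be run as an independent audit. The following is a minimal sketch using only the Python standard library; it is our illustration, not the process the teams used, and the function name is hypothetical.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def find_broken_links(urls, timeout=10):
    """Issue a HEAD request for each URL and collect the ones that fail.

    Returns a list of (url, reason) pairs; an empty list means every link
    responded with a success status."""
    broken = []
    for url in urls:
        request = Request(url, method="HEAD",
                          headers={"User-Agent": "guide-link-audit/0.1"})
        try:
            with urlopen(request, timeout=timeout) as response:
                if response.status >= 400:  # defensive; errors usually raise
                    broken.append((url, str(response.status)))
        except HTTPError as err:   # 4xx/5xx responses
            broken.append((url, str(err.code)))
        except URLError as err:    # DNS failures, timeouts, refused connections
            broken.append((url, str(err.reason)))
    return broken
```

In practice some database vendors reject HEAD requests, so a production checker would fall back to a GET request before flagging a link as broken.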
Careful attention to these tasks also saves the staff unnecessary work updating and fixing the new guides before going live. Other elements of guide design and migration are merely nice to have. They complement the user's experience with the final product, but neglecting them will not affect basic functionality. These secondary tasks can be completed as time allows. For guide owners, optional tasks included shortening link descriptions, checking for a guide description and friendly URL, and other general updates to the guides. The migration was broken into manageable tasks by giving guide owners a clear list of required and optional items. Team leaders will also need to manage expectations. It can be difficult to remember that web pages, especially LibGuides, are living documents. They can be updated fairly easily after the system has gone live. On a short timeline, in the midst of other duties and responsibilities, it is acceptable for a guide to be just good enough. There is rarely enough time for each guide to reach a state of perfection prior to going live. A guide that is spell-checked and contains accurate information can be edited and made more aesthetically pleasing as time allows after the entire site has gone live. While additional edits are taking place, students still have access to the information they need for their academic work. Lists, such as the subjects and material types in the A-Z list, are always a work in progress based on feedback from service points and usability testing. Updates and edits should be made as patrons interact with the products. Regular use can help library staff identify problems with, or confusion about, the products that might not be anticipated prior to going live. Stress on guide owners can be greatly reduced by communicating expectations throughout the process.
Post-Implementation

Nimitz Library successfully went live with both LibGuides version 2 and LibAnswers with LibChat in early January 2016, right before midshipmen returned to campus for the spring semester. LibAnswers with LibChat was introduced to the campus community with a soft launch at the beginning of the spring semester due to staffing levels and shifts at the reference desk. The librarian on duty at the reference desk was also responsible for answering any chats or LibAnswers questions initiated during their shift. The volume of questions remained fairly low during the semester. On average, the Library received two synchronous and 1.5 asynchronous incoming questions per week via LibAnswers with LibChat. The low volume was beneficial in that it allowed librarians to become familiar with answering questions and editing FAQs. They were able to handle both face-to-face interactions with patrons in the library and the web traffic. However, the volume was so low that it became apparent more marketing of the service was needed. At the start of the fall 2016 semester, the library made an effort to increase awareness of the new LibAnswers products by emailing all students, mentioning the service in every instruction session, and creating and distributing fliers advertising the service around the library. Though the data are preliminary, statistics show that use of these services more than tripled in the first month of the new semester. As discussed above, the expedited implementation timeline forced the ad hoc teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. This meant other necessary, but not urgent, updates to guides were left untouched during the migration.
Given the amount of effort needed to prepare the guides for migration, it is understandable that guide owners had grown tired of making LibGuides updates and found it necessary to move on to other projects. With this fatigue in mind, the team leaders will continue to remind guide authors that LibGuides are living pages in need of constant attention. The team leaders will also take advantage of user feedback to promote continued updates to LibGuides. Throughout the migration process team leaders solicited feedback from staff and users in a variety of ways. First, reference staff were informed of design and implementation changes made throughout the migration. They were given time to view and evaluate the master guide template prior to the migration. The team solicited feedback on the names and organization of categories in the A-Z list. After the products went live, the team gathered informal feedback through reference desk interviews, information literacy instruction sessions, and conversations with faculty and students. Student volunteers participated in usability testing during the spring semester. They were asked to complete a series of tasks related to different aspects of the new interface. Their feedback, especially from thinking aloud while completing the tasks, revealed to librarians how students actually use the guides. Both formal and informal feedback helped librarians adapt and improve the guides. Based on the feedback, the Systems Librarian made global changes to improve system functionality. In one instance, users were having difficulty submitting a new LibAnswers question when they could not find an appropriate FAQ response. The Systems Librarian made the "Submit Your Question" link more prominent for users in that situation. The LibGuides continue to be evaluated by staff for currency and ease of use.
In discussing the first round of usability test results, it was determined that more testing during the fall semester of 2016 would be helpful. During the upgrade to version 2 and the implementation of LibAnswers with LibChat, librarians focused on the functions in the system that were most essential or most desired. All of these products contain additional functionality that was not implemented during the upgrade. After a brief rest, the reference department and library web team explored the products' additional functionality and determined what avenues to explore next.

CONCLUSIONS

Migration of any platform can be an extensive and time-consuming task for library staff. Preparations and post-migration cleanup can interrupt staff workflows and strain limited resources. Using migration teams was a successful strategy on a short timeline because it helped spread the workload by delegating specific learning and tasks to specific people. Those people, in turn, became experts in their area of focus and served as a resource for others in the library. This model cultivated a sense of ownership in the migration across many stakeholders that might not otherwise have existed. That sense of ownership in the project, coupled with checklists and spreadsheets full of discrete tasks in need of completion, made it possible for a small staff to complete the migration quickly and successfully. Migrating on a short timeline can be especially stressful, but careful planning and good communication of expectations help stakeholders focus on the end goal. Upon completion of the project there was a very real sense of fatigue with it. As a result, tasks that were listed as optional because they were not critical for migration went unattended for quite some time after the migration.
Slowly, months later, guide owners are ready to revisit guides and continue making improvements. Given more time, this migration might have been completed more methodically, with the intent of having everything perfect before moving on to the next step. Instead, working on a tight timeline forced us to keep moving forward, making necessary changes and noting changes to be made in the future. Ultimately, it was a constant reminder that our online presence is, and should be, a constant work in progress, not the subject of a big, occasional update.

REFERENCES

1. Luke F. Gadreau, "Migration Checklist for Guide Owners," last modified April 3, 2015, https://wiki.harvard.edu/confluence/display/lg2/Migration+Checklist+for+Guide+Owners; Leeanne Morrow et al., "Best Practice Guide for LibGuides," accessed November 17, 2016, http://libguides.ucalgary.ca/c.php?g=255392&p=1703394; Rebecca Payne, "Updating LibGuides & Preparing for LibGuides v2," last modified November 18, 2014, https://wiki.doit.wisc.edu/confluence/pages/viewpage.action?pageId=85630373; Julia Furay, "Libguides Presentation: Migrating from v1 to v2 (Julia)," last modified September 29, 2015, http://guides.cuny.edu/presentation/migration.

2. Anna Burke, "LibGuides 2: Content Migration is Here!" last modified April 30, 2014, http://blog.springshare.com/2014/04/30/libguides-2-content-migration-is-here/; Springshare, "On Your Checklist: Five Tips & Tricks for Migrating to LibGuides v2," last modified February 18, 2016, http://buzz.springshare.com/springynews/news-27/springytips; Springshare, "Migrating to LibGuides v2 (and Going Live!)," last modified November 7, 2016, http://help.springshare.com/libguides/update/whyupdate.

3. Lauren McKeen and John Hernandez, "Moving Mountains: Surviving the Migration to LibGuides 2.0," Online Searcher 39 (2015): 16-21, http://www.infotoday.com/OnlineSearcher/Articles/Features/Moving-Mountains-Surviving-the-Migration-to-LibGuides--102367.shtml.

4.
Vicky Duncan et al., "Implementing LibGuides 2: An Academic Case Study," Journal of Electronic Resources Librarianship 27 (2015): 248-258, https://dx.doi.org/10.1080/1941126X.2015.1092351.

5. Jimmy Ghaphery and Erin White, "Library Use of Web-Based Research Guides," Information Technology and Libraries 31 (2012): 21-31, http://dx.doi.org/10.6017/ital.v31i1.1830; Danielle A. Becker, "LibGuides Remakes: How to Get the Look You Want Without Rebuilding Your Website," Computers in Libraries 34 (2014): 19-22, http://www.infotoday.com/cilmag/jun14/index.shtml; Michal Strutin, "Making Research Guides More Useful and More Well Used," Issues in Science and Technology Librarianship 55 (2008), https://dx.doi.org/10.5062/F4M61H5K.

6. Ning Han and Susan L. Hall, "Think Globally! Enhancing the International Student Experience with LibGuides," Journal of Electronic Resources Librarianship 24 (2012): 288-297, https://dx.doi.org/10.1080/1941126X.2012.732512; Gabriela Castro Gessner et al., "Are You Reaching Your Audience? The Intersection Between LibGuide Authors and LibGuide Users," Reference Services Review 43 (2015): 491-508, http://dx.doi.org/10.1108/RSR-02-2015-0010.

7. Luigina Vileno, "Testing the Usability of Two Online Research Guides," Partnership: The Canadian Journal of Library and Information Practice and Research 5 (2012), https://dx.doi.org/10.21083/partnership.v5i2.1235; Rachel Hungerford et al., "LibGuides Usability Testing: Customizing a Product to Work for Your Users," http://hdl.handle.net/1773/17101; Alec Sonsteby and Jennifer DeJonghe, "Usability Testing, User-Centered Design, and LibGuides Subject Guides: A Case Study," Journal of Web Librarianship 7 (2013): 83-94, https://dx.doi.org/10.1080/19322909.2013.747366.

8.
Mardi Mahaffy, "Student Use of Library Research Guides Following Library Instruction," Communications in Information Literacy 6 (2012): 202-213, http://www.comminfolit.org/index.php?journal=cil&page=article&op=view&path%5B%5D=v6i2p202.

9. Kate A. Pittsley and Sara Memmott, "Improving Independent Student Navigation of Complex Educational Web Sites: An Analysis of Two Navigation Design Changes in LibGuides," Information Technology and Libraries 31 (2012): 52-64, https://dx.doi.org/10.6017/ital.v31i3.1880.

Appendix A: LibGuides Pre-Migration Checklist

If there are issues, contact the Head of Reference & Instruction.

Required before migration:

● 26 October 2015: Review the attached report of guides that have not been updated in the last year. Delete or consolidate unneeded, practice, or backup guides.*
● 26 October 2015: Review the attached report of guides with fewer than 500 hits. Delete or consolidate unneeded, practice, or backup guides.*
● 26 October 2015: Review all links to all databases included on your guides and make sure the links are mapped to the A-Z list.
● 26 October 2015: Review all guides for links not included in the current A-Z list. List any links that you think should be included in the A-Z list moving forward on the shared spreadsheet (A-Z Additions and Best Bets). Be sure to include all necessary information, including subject and type.
● Mid-October 2015 & 28 October 2015: Review the forthcoming reports about broken links. Anticipate one report on October 13 and one on October 26.
● 26 October 2015: Review the Databases by Subject page of the A-Z list and make sure everything that should be included in your subject is there.
Add anything you'd like removed from your subject to the shared spreadsheet (tab 2). Identify 3 "best bets" databases for each of your subject areas on the shared spreadsheet (tab 3).
● 26 October 2015: Ensure all images have an alt tag.
● 26 October 2015: Delete outdated or unused images in your image collection.
● 26 October 2015: Convert all tables to percentages, not pixels.
● 26 October 2015: Review the attached report of boxes that will not migrate into version 2. (This won't apply to everyone.)
● 26 October 2015: Email the chair of the Web Team if you have guides with boxes containing custom formatting or code (this is only necessary if you manually adjusted the HTML or CSS, or use tabs within a box on your guide). We are keeping a master list to double-check after migration.
● 26 October 2015: Check all links to the catalog in your guides to make sure they are accurate.
● 26 October 2015: Check all widgets (like catalog search boxes) to ensure they function properly, delete any widgets you don't need, and keep a list of widgets to check post-migration to make sure they still function.

Optional before migration:

● Consider turning links in "rich text" boxes into a "links and list" box.
● Review all guides to ensure they have a friendly URL, are assigned to a subject, have assigned tags, and have a brief guide description.
● Shorten database descriptions to one to two sentences. Consider including dates of coverage and why the database is useful for this particular subject.

Helpful hints:

*If you'd like to hold on to content from guides you plan to delete, create an unpublished "master guide" where you can store content you plan to use in the future.
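The alt-tag requirement in the checklist above can be audited mechanically rather than box by box. This sketch, using Python's standard html.parser module, is our illustration rather than a tool from the migration; it flags img tags whose alt attribute is missing or empty.

```python
from html.parser import HTMLParser

class AltTagAuditor(HTMLParser):
    """Record the src of every <img> lacking a non-empty alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            if not attr_map.get("alt"):  # alt absent or empty
                self.missing_alt.append(attr_map.get("src", "(no src)"))

def images_missing_alt(html_text):
    """Return the src values of images in html_text that need alt text."""
    auditor = AltTagAuditor()
    auditor.feed(html_text)
    return auditor.missing_alt
```

Run against the exported HTML of each guide box, a report like this would let guide owners fix accessibility gaps before the migration deadline instead of hunting for them manually.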
Appendix B: LibGuides Post-Migration Checklist and Guide Cleanup

NOTE: Now that migration is complete, if you make an update to your version 1 guides, your change will not transfer to version 2. This means broken links will need to be fixed in both versions. If there are issues or questions, contact the Head of Reference & Instruction (general questions), the Systems Librarian (technical issues), or the Electronic Resources Librarian (database assets and A-Z list).

CLEAN UP AND CHECK CONTENT

1) Check boxes to make sure content is correctly displayed on all your guides. Check all boxes closely, as some had the header end up below the first bullet point. To fix an issue like this, click the Add/Reorder button at the bottom of the box you are working on, then click "Reorder Content." You can move the links down and the text up.

2) Ensure all guides have a friendly URL, are assigned to a subject, and have assigned tags, if you didn't do this pre-migration. See the pre-migration handout for help. In version 2 this information will display at the TOP of our guides in edit mode and at the BOTTOM of our guides on the public interface.

3) Ensure images are resized to fit general web guidelines. See this guide for help: http://guidefaq.com/a.php?qid=12922

4) Check all your widgets to ensure they still function properly.

5) Add a guide type to each of your guides. This is a new feature in LibGuides version 2. It is under the gear on the right side of your guide while in edit mode. This will help us sort and organize the guides in the list of guides.

ADD NEW LIBGUIDES 2 CONTENT

1) Make a box pointing to related guides. Research has shown that a box on the guide home page pointing to related guides can be very helpful to students.
Link to other subject guides that would be of interest and any course guides for that subject. For example, the box on the Mechanical Engineering guide contains links to EM215 and Nuclear Engineering (which is part of the Mechanical Engineering department). To do this, go to the bottom of your welcome box, click the Add/Reorder button, and then click Guide List; the first option is to manually choose guides to add to the list.

2) Add a tab to every guide that is named Citing Your Sources and redirects to the Citing Your Sources LibGuide. To do this:
a. Create a blank page named Citing Your Sources at the bottom of your left-side navigation.
b. On your blank page, click the page options icon to open the options for editing the page.
c. Click on Redirect URL and paste the link to the Citing Sources guide in the box.
d. It is also a good idea to mark the "open in a new window" box.
e. If you've completed it successfully, your Citing Your Sources tab will appear as a redirect in edit mode. Since the Citing Your Sources guide is still a work in progress, it is unpublished and you will get an error when you preview it.
f. Finally, REMOVE the plagiarism and citing sources box from your guides.

3) Now is a good time to take advantage of new functionality and to update the content of your guides. You can now combine multiple types of information in the same box, and you can also take advantage of tabbed boxes. See this LibGuide for further assistance: http://support.springshare.com/libguides/migration/v2cleanup-regular

4) Create your new profile box. At the meeting on October 20, the Reference & Instruction department agreed that the following elements should be consistent in the profile box:
Box Name: Librarian
Image: A stock photo or a personal photo (picture day coming soon)
In the Contact box:
Title
Nimitz Library
XXXX Dept.
Office # XXX
410-293-XXXX
EMAIL ADDRESS
Your subjects will be displayed below.

Appendix C: Tips & Guidelines for LibAnswers with LibChat

WHAT MODES OF INQUIRY WILL BE AVAILABLE TO USERS?

Using the LibAnswers platform, users will be able to submit questions via chat or by using the question form within LibAnswers. Users will also be able to ask questions as they did before: at the reference desk, via askref@usna.edu, and by calling 410-293-2420.

WHAT ARE "BEST PRACTICES" OR GUIDELINES FOR LIBANSWERS W/ LIBCHAT?

See the tips for responding to tickets, the tips for creating/maintaining FAQs, and the tips for responding to chat questions at the bottom of this document.

WHAT PRIORITY SHOULD I GIVE RESPONSES COMING THROUGH VARIOUS MODES OF INQUIRY?

Reference staff will have to use their professional judgement when deciding what priority to give questions coming in through various modes of inquiry. While the addition of chat and tickets may seem overwhelming at first, the same rules you've applied in the past will work. If a chat comes in while you're helping someone face-to-face, use that as an opportunity to advertise the chat service. Explain to the patron that you also help users via chat and that you're going to let the chatter know you'll be with them shortly. The same can apply if you're finishing up a chat when a face-to-face user walks up. Simply explain that the library also offers a chat service and you're just finishing up a question. Remember to get comfortable with, and take advantage of, the canned messages in chat; let the phone go to voicemail if necessary; and explain to face-to-face users what's happening.
During the pilot phase you should also keep track of strategies that worked well for you, or times when the various modes of inquiry became too overwhelming. We'll take all of that into consideration when we reexamine this service. Chat, phone, and face-to-face interactions are synchronous modes of communication, so users expect responses immediately. Tickets are an asynchronous mode of communication and should be dealt with on a first-come, first-served basis. Respond to tickets when you have time. When responding to tickets, respond to the oldest tickets first, as that user has been waiting the longest for an answer. However, feel free to use your judgement and, if you choose, respond to questions with quick answers right away.

HOW SHOULD I PRIORITIZE QUESTIONS FROM USNA VS. NON-USNA USERS?

Priority should be given to midshipmen, faculty, and staff. If an outside user makes use of the chat or ticket service, feel free to explain that this service is primarily for faculty, staff, and students and that they should direct their question to askref@usna.edu. If you are free and have time, feel free to assist outside patrons via the chat or ticket system.

HOW SHOULD I HANDLE REMAINING QUESTIONS DURING A CHANGE IN SHIFTS?

Handle them in the same manner that you would a face-to-face question from a student, faculty, or staff member. Finish up quickly if you can, advise the patron that you need to leave and offer to handle the question when you return, or transfer the chat to another librarian. If there are remaining tickets in the queue, simply notify the next librarian on duty.

WHAT ARE THE EXPECTED TURNAROUND TIMES FOR RESPONDING TO PATRON INQUIRIES?

Chat, face-to-face, and phone inquiries should be responded to as immediately as possible. Tickets should be responded to within a business day.

WHO CAN I CONTACT FOR HELP AND TROUBLESHOOTING?
If you have questions, your first stop should be the LibAnswers FAQ provided by Springshare (available in the “Help” section when logged into LibApps). If you can’t find the answer to your question there, feel free to contact the Head of Reference and Instruction, who will work with you to resolve the problem.

GUIDELINES FOR RESPONDING TO LIBANSWERS TICKETS*:
● Keep in mind that when you are responding to tickets, you are a jack of all trades. Even if the question is outside your subject area, do your best to provide information that will get the user started; in that email you may also suggest that the user contact the subject specialist.
● Respond to LibAnswers tickets in the same way you would respond to an email inquiry from a user.
● If you provide a factual response, be sure to include the source the information came from.

GUIDELINES FOR CREATING/MAINTAINING FAQS*:
● The FAQ database is a public-facing, searchable collection of questions and answers intended to empower our users to find their own answers. Any question that might be considered frequently asked should be included: questions about the library, the collections, how to find specific types of information, how to start research on specific and recurring assignments, etc.
● When creating an FAQ from a ticket, remember that you can edit the question. Do your best to phrase it in a way that is applicable and relevant to the most users.
● When creating an FAQ from a response you’ve already written, be sure to edit out any personally identifiable information (PII) about the person who initially asked the question, and check both the question and the response for any remaining PII.
● If a member of the staff notices incomplete or incorrect information in an FAQ response, he/she should use professional judgement in deciding how to handle the situation. If the error looks like a typo, he/she may choose to edit the response immediately; however, if the edit affects the substantive content of the response, he/she may choose to consult the librarian who initially wrote it.

GUIDELINES FOR LIBCHAT*:
● If you refer a question, alert the librarian to whom the user is being referred.
● Remember that the person you’re chatting with can’t see you, so if you step away (to conduct a search, to check a book, to help someone else, etc.), let them know you’ll be right back.
● Chat questions can seem rushed, so it may be tempting to answer only the initial question. As in face-to-face interactions, though, clarifying questions save time for both the user and the librarian and allow more accurate and efficient answers.
● When providing responses, remember that as an academic library our mission is both to provide the information needed and to instruct our users so they may become self-reliant; chat challenges us to balance answers and instruction. Do your best to find an appropriate balance.
● As the transaction is ending, remain courteous, check that all of the user’s questions have been addressed, and encourage the user to use the service again.

* Note: These guidelines are drafts and will evolve as the staff learns more about this system throughout the pilot phase.
Identifying Emerging Relationships in Healthcare Domain Journals via Citation Network Analysis
Kuo-Chung Chu, Hsin-Ke Lu, and Wen-I Liu
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2018

Kuo-Chung Chu (kcchu@ntunhs.edu.tw) is Professor, Department of Information Management, and Dean, College of Health Technology, National Taipei University of Nursing and Health Sciences; Hsin-Ke Lu (sklu@sce.pccu.edu.tw) is Associate Professor, Department of Information Management, and Dean, School of Continuing Education, Chinese Culture University; Wen-I Liu (wenyi@ntunhs.edu.tw, corresponding author) is Professor, Department of Nursing, and Dean, College of Nursing, National Taipei University of Nursing and Health Sciences.

ABSTRACT
Online e-journal databases enable scholars to search the literature in a research domain or to cross-search an interdisciplinary field, so the key literature can be efficiently mapped. This study builds a web-based citation analysis system consisting of four modules: (1) literature search; (2) statistics; (3) article analysis; and (4) co-citation analysis. The system focuses on the PubMed Central dataset and facilitates specific keyword searches in each research domain for authors, journals, and core issues. In addition, we use data mining techniques for co-citation analysis. The results can help researchers develop an in-depth understanding of a research domain. An automated system for co-citation analysis promises to facilitate understanding of the changing trends that affect the journal structure of research domains. The proposed system has the potential to become a value-added database of the healthcare domain, which will benefit researchers.

INTRODUCTION
Healthcare is a multidisciplinary research domain of medical services provided both inside and outside a hospital or clinical setting.
Article retrieval for systematic reviews in the domain is much more elusive than retrieval for reviews in clinical medicine because of the interdisciplinary nature of the field and the lack of a significant body of evaluative literature. Other connecting research fields consist of the respective research fields of the application domain (i.e., the health sciences, including medicine and nursing).1 In addition, valuable knowledge and methods can be taken from the fields of psychology, the social sciences, economics, ethics, and law. Further, the integration of those disciplines is attracting increasing interest.2 Researchers may use bibliometrics to evaluate the influence of a paper or to describe the relationship between citing and cited papers. Citation analysis, one of several possible bibliometric approaches, is more popular than the others because of the advent of information technologies.3 Citation analysis counts the frequency of cited papers from a set of citing papers to determine the most influential scholars, publications, or universities in a discipline. It can be classified into two basic types: the first counts only the citations in a paper that are authored by an individual, while the second analyzes co-citations to identify intellectual links among authors in different articles. This paper focuses on the second type of citation analysis.

IDENTIFYING EMERGING ISSUES IN THE HEALTHCARE DOMAIN | CHU, LU, AND LIU https://doi.org/10.6017/ital.v37i1.9595

Small defined co-citation analysis as “the frequency with which two items of earlier literature are cited together by the later literature.”4 It is not only the most important type of bibliometric analysis, but also the most sophisticated and popular method.
Many other methods originate from citation analysis, including document co-citation analysis, bibliographic coupling,5 author co-citation analysis,6 and co-word analysis.7 Co-citation analysis can be conducted at three levels: document, author, and journal. Co-citation can be used to establish a cluster or “core” of earlier literature.8 The pattern of links between documents can establish a structure that highlights the relationships among research areas. Citation patterns change when previously less-cited papers are cited more frequently, or when old papers are no longer cited. Changing citation patterns imply the possibility of new developments in research areas; furthermore, we can investigate changing patterns to understand the scientific trends within a research domain.9 Co-citation analysis can help obtain a global overview of research domains.10 The aim of this paper is to detect emerging issues in the healthcare research domain via citation network analysis. Our results can provide a basis of knowledge that researchers can use to construct a search strategy. Structural knowledge is intrinsic to problem solving. Because of the interdisciplinary nature of the healthcare domain and the broadness of the term, research is performed in several research fields, such as nursing, nursing informatics, long-term care, medical informatics, geriatrics, information technology, telecommunications, and so forth. Although electronic journals enable searching by author, article, and journal title using keywords or full text, the results are limited to article content and references and therefore do not provide an in-depth understanding of the knowledge structure in a specific domain. The knowledge structure includes the core journals, the core issues, the analysis of research trends, and the changes in focus of researchers.
For a novice researcher, however, the literature survey remains a troublesome process in terms of precisely identifying the key articles that convey the overview concept of a specific domain. The process is complicated and time-consuming, and it limits the number of articles collected for retrospective research. The objective of this paper is to provide information about the challenges and methodology of relevant literature retrieval by systematically reviewing the effectiveness of healthcare strategies. To this end, we build a platform for automatically gathering the full text of e-journals offered by the PubMed Central (PMC) database.11 We then analyze the co-citation results to understand the research themes of the domain.

METHODS
This paper builds a value-added literature database system for co-citation analysis of healthcare research. The results of the analysis are presented visually to convey the structure of the domain knowledge and to increase the productivity of researchers.

Dataset
For co-citation analysis, a data source of related articles on healthcare is required. For this paper, the articles were retrieved from the PMC database using search terms related to the healthcare domain. To build the article analysis system, we used bibliometrics to locate the relevant references, while the analysis techniques were implemented with the association rule algorithm of data mining. The PMC database, produced by the US National Institutes of Health and implemented and maintained by the US National Center for Biotechnology Information of the US National Library of Medicine, provides electronic articles from more than one thousand full-text journals for free. Publication status can be determined from the Open Access Subset (OAS), which is accessible through the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) and includes the full text in XML and PDF.
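The OAI-PMH access mentioned above boils down to simple parameterized HTTP requests. A minimal Python sketch of building such a request follows; note that the base URL and the `pmc-open` set name are assumptions about PMC's publicly documented OAI service, not details taken from this article, and should be verified against current NCBI documentation:

```python
from urllib.parse import urlencode

# Assumed PMC OAI-PMH endpoint; verify against current NCBI documentation.
PMC_OAI_BASE = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"

def list_records_url(metadata_prefix="pmc", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for harvesting PMC metadata."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # Per OAI-PMH, a resumptionToken must be the only argument besides the verb.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return PMC_OAI_BASE + "?" + urlencode(params)

print(list_records_url(set_spec="pmc-open"))
```

Harvesting then consists of fetching the URL, parsing the returned XML, and following resumption tokens until the list is exhausted.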
Regarding access permission, PMC offers a dataset of many open access journal articles. This paper used a dedicated XML-formatted dataset (https://www.ncbi.nlm.nih.gov/pmc/tools/oai/). The XML-formatted dataset follows the specification of DTD (document type definition) files, which are sorted by journal title. Each article has a PMCID (PMC identifier), which is useful for data analysis. In addition to the dataset, PMC also provides several web services to help widely disseminate articles to researchers.

Figure 1. The system architecture of citation analysis with four subsystems.

System Architecture
Our development environment consisted of the following four subsystems: front-end, middle-end, back-end, and pre-processing. The front-end creates a “web view,” a visualization of the results for our web-based co-citation analysis system. The system architecture is shown in figure 1.

Front-End Development Subsystem
We used Adobe Dreamweaver CS5 as a visual development tool for the design of web templates. The PHP programming language was chosen to build the co-citation system used to access and analyze the full-text articles. For the data mining technique, we implemented the Apriori algorithm in PHP.12 The results were exported as XML to a charting process, using amCharts (https://www.amcharts.com/), to create stock charts, column charts, pie charts, scatter charts, line charts, and so forth.
Middle-End Server Subsystem
The system architecture was a Microsoft Windows-based environment with a XAMPP 2.5 web server platform (https://www.apachefriends.org/download.html). XAMPP is a cross-platform web development kit that consists of Apache, MySQL, PHP, and Perl. It runs on several operating systems, such as Linux, Windows, macOS, and Oracle Solaris, and provides SSL encryption, the phpMyAdmin database management system, the Webalizer traffic management and control suite, a mail server (Mercury Mail Transport System), and a FileZilla FTP server.

Back-End Database Subsystem
To speed up co-citation analysis, the back-end database system used MySQL 5.0.51b with phpMyAdmin 2.11.7 as an interface for easy management of the database. MySQL includes the following features:
• It is coded in C and C++, and users can develop applications against its API from Visual Basic, C, C++, Eiffel, Java, Perl, PHP, Python, Ruby, and Tcl, with multithreading capability that can exploit multi-CPU systems and link easily to other databases.
• Query performance is fast because SQL commands are optimally implemented, and many additional commands and functions make the database user-friendly and flexible to operate. An encryption mechanism is also offered to improve data confidentiality.
• MySQL can handle a large-scale dataset: storage capacity is up to 2TB on Win32 NTFS systems and up to 4TB on Linux ext3 systems.
• It provides MyODBC as an ODBC driver for connecting from many programming languages, and it supports several languages and character sets for localization and internationalization.

Pre-processing Subsystem
PMC provides access to articles via OAS, OAI services, e-utilities, and FTP. On October 28, 2012, we used FTP (ftp://ftp.ncbi.nlm.nih.gov/pub/pmc) to download compressed archives packaged with filenames following the pattern “articles?-?.xml.tar.gz”, where “?-?” is “0-9” or “A-Z”.
After downloading, the compressed files totaled approximately 6.17GB; after extraction, the articles totaled approximately 10GB. The 571,890 articles from 3,046 journals were grouped and sorted by journal title in folders labeled with abbreviated titles. An XML file would, for example, be named “AAPSJ-10-1-2751445.nxml,” where “AAPSJ” is the abbreviated title of the journal (American Association of Pharmaceutical Scientists Journal), “10” is the volume, “1” is the issue number, and “2751445” is the PMCID. We used related technologies, including the PHP language, arrays, and the Apriori algorithm, to analyze the articles and build the co-citation system.13 Finally, several analysis modules were created to build an integrated co-citation system.

RESEARCH PROCEDURE
The following seven-step research procedure realizes the integrated co-citation system:
1. Parse the XML files: select the tags used to construct the database and choose the fields needed for co-citation analysis.
2. Present web-based articles: design the webpage and CSS style, and present the web-based XML files indexed by a key variable.
3. Build an abstract database consisting of the fields selected in step 1.
4. Develop the search module: pass the keyword with the “POST” method into an SQL query and present the search results on the webpage.
5. Develop the statistical module: the statistical results include the number of articles and cited articles, the journals and authors cited in all articles, and the number of cited articles.
6. Develop the citation module: visually present the statistical results in several formats; rank the searched journals; rank the searched and cited journals across all articles.
7. Develop the co-citation module: analyze the associations between articles with the Apriori algorithm.
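The file-naming convention described above can be unpacked with a small regular expression. A Python sketch follows; the pattern is inferred from the single example given (“AAPSJ-10-1-2751445.nxml”) and may not cover every file in the dataset:

```python
import re

# Inferred naming convention: <journal abbreviation>-<volume>-<issue>-<PMCID>.nxml
NXML_NAME = re.compile(r"^(?P<journal>.+)-(?P<volume>\d+)-(?P<issue>\d+)-(?P<pmcid>\d+)\.nxml$")

def parse_nxml_name(filename):
    """Split a PMC article filename into journal abbreviation, volume, issue, and PMCID."""
    m = NXML_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized filename: {filename}")
    parts = m.groupdict()
    parts["volume"], parts["issue"] = int(parts["volume"]), int(parts["issue"])
    return parts

print(parse_nxml_name("AAPSJ-10-1-2751445.nxml"))
```

Parsing the filename up front lets each record be grouped by journal before the XML itself is opened.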
Association Rule Algorithms
An association rule (AR), usually written A→B, states that a transaction containing item A also contains item B. Most datasets yield many such rules, some of them useless. To validate the rules, two indicators, support and confidence, can be applied. Support, which measures usefulness, is the number of times the rule features in the transactions, whereas confidence measures certainty: the probability that B occurs whenever A occurs. We chose the rules for which the values of both support and confidence were greater than a predefined threshold. For example, a rule “toast→jam” with support of 1.2 percent and confidence of 65 percent implies that 1.2 percent of the transactions contain both “toast” and “jam,” and that 65 percent of the transactions containing “toast” also contain “jam.” The principle for generating ARs rests on two steps: (1) find the high-frequency item sets, whose supports are greater than the threshold; (2) for each item set X and each of its subsets Y, check the rule X→Y (meaning that an occurrence of X is accompanied by an occurrence of Y) and keep it if its support is greater than the threshold. Most studies focus on searching for high-frequency item sets.14 The most popular approach for identifying these item sets is the Apriori algorithm, shown in figure 2.15 The rationale of the algorithm is that if the support of an item set I is less than or equal to the threshold, I is not a high-frequency item set, and no item set formed by inserting an additional item A into I can be a high-frequency item set either. Accordingly, the Apriori algorithm is an iteration-based approach. First, it generates the candidate item set C1 by counting the occurrences of each attribute and finds the high-frequency item set L1, whose members have support greater than the threshold.
Second, it generates candidate set C2 from L1, iteratively finds L2, generates C3, and so on.

1: L1 = {large 1-item sets};
2: for (k = 2; Lk-1 ≠ ∅; k++) do begin
3:   Ck = Candidate_gen(Lk-1);
4:   for all transactions t ∈ D do begin /* generate candidate k-item sets */
5:     Ct = subset(Ck, t);
6:     for all candidates c ∈ Ct do
7:       c_count = c_count + 1;
8:   end
9:   Lk = {c ∈ Ck | c_count ≥ minsupport}
10: end
11: return L = ∪k Lk;

Figure 2. The Apriori algorithm.

The Apriori algorithm is one of the most commonly used methods for AR induction. The Candidate_gen algorithm, shown in figure 3, uses join and prune operations to generate candidate sets.16 Steps 1 to 4 generate all possible candidate item sets c from Lk-1; steps 5 to 8 delete any item set that cannot be a frequent item set; step 9 returns the candidate set Ck to the main algorithm.

1: for each item set X1 ∈ Lk-1
2:   for each item set X2 ∈ Lk-1
3:     c = join(X1[1], X1[2], …, X1[k-2], X1[k-1], X2[k-1])
4:       where X1[1] = X2[1], …, X1[k-2] = X2[k-2], X1[k-1] < X2[k-1];
5: for all item sets c ∈ Ck do
6:   for all (k-1)-subsets s of c do
7:     if (s ∈ Lk-1) then add c to Ck;
8:     else delete c from Ck;
9: return Ck;

Figure 3. The Candidate_gen algorithm.

RESULTS
We searched the PMC database with the keywords “healthcare,” “telecare,” “ecare,” “ehealthcare,” and “telemedicine” and located 681 articles with a combined 14,368 references. Values were missing from the year field for 4 of the references; this was also the case for 635 of a total of 52,902 authors. In the keyword search for the healthcare domain, a pie chart of the journal citation analysis (figure 4) shows that the top-ranked journal in terms of citations was the British Medical Journal (BMJ), cited approximately 439 times, or 18.89 percent of the total, followed by the Journal of the American Medical Association (JAMA), cited approximately 344 times, or 14.80 percent of the total.
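At its core, the journal-level co-citation counting described in this study reduces to a frequency count of journal pairs over reference lists (the "support" of each pair). A minimal Python sketch of that count, using invented reference lists rather than the paper's data:

```python
from itertools import combinations
from collections import Counter

def cocitation_support(reference_lists):
    """Count, for every pair of journals, the number of citing articles
    whose reference lists contain both journals (the pair's support)."""
    pair_counts = Counter()
    for refs in reference_lists:
        # Count a pair once per citing article, regardless of how many
        # times each journal is cited within that article.
        for pair in combinations(sorted(set(refs)), 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical citing articles, each reduced to the journals it cites.
articles = [
    ["BMJ", "Lancet", "JAMA"],
    ["BMJ", "Lancet"],
    ["BMJ", "JAMA", "N Engl J Med"],
]
support = cocitation_support(articles)
print(support[("BMJ", "Lancet")])  # → 2
```

The Apriori machinery above speeds up the same computation for larger item sets by pruning pairs (and triples, and so on) that cannot reach the support threshold.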
The trend of healthcare citations from 1852 to 2009 peaked in 2006 at approximately 1,419 citations, with more than half of the total occurring in this year.

Figure 4. Top-cited journals in the healthcare domain by percentage of total citations (N = 2324).

Figure 5 shows a pie chart of the author citations for the keyword search in the healthcare domain. The most-cited author was J. W. Varni, professor of pediatric cardiology at the University of Michigan Mott Children’s Hospital in Ann Arbor, cited approximately 149 times, equivalent to 23.24 percent of the total, followed by D. N. Herndon, professor at the Department of Plastic and Hand Surgery, Friedrich-Alexander University of Erlangen in Germany, cited approximately 73 times, or 11.39 percent of the total. By identifying the affiliations of the top-ranked authors, researchers can access related information in their field of interest. The co-citation analysis was conducted using the Apriori algorithm. The relationships of co-citation journals with a supporting degree greater than 38 from 1852 to 2009 are shown in figure 6. Each journal is denoted by a node; a node with a double circle indicates that the journal is co-cited with another in a citing article. BMJ, which covers the fields of evidence-based nursing care, obstetrics, healthcare, nursing knowledge and practices, and others, is the core journal of the healthcare domain.

Figure 5. Top-cited authors in journals of the healthcare domain by percentage of total citations (N = 641).

Figure 6. The relationship of co-citation journals with BMJ.

To identify the focus of the journals, we analyzed the co-citations in three periods.
In 1852–1907, journals were not in co-citation relationships; in 1908–61, five candidates had a supporting degree greater than 1 (see table 1); and in 1962–2009, twenty-eight candidates had a supporting degree greater than 14 (see table 2; for example, BMJ and Lancet had sixty-eight co-citations).

Table 1. Candidates in co-citation analysis with a supporting degree greater than 1 (1908–61).
No. | Journals | No. of journals co-cited | Support
1 | Publ Math Inst Hung Acad Sci, Publ Math | 2 | 3
2 | JAOA, J Osteopath | 2 | 1
3 | Antioch Rev, J Abnorm Soc Psychol | 2 | 1
4 | N Engl J Med, Am Surg | 2 | 1
5 | Arch Neurol Psychiatry, J Neurol Psychopathol, Z Ges Neurol Psychiat | 3 | 1

Table 2. Candidates in co-citation analysis with a supporting degree greater than 14 (1962–2009).
No. | Journals | No. of journals co-cited | Support
1 | BMJ, Lancet | 2 | 68
2 | BMJ, JAMA | 2 | 65
3 | JAMA, Med Care | 2 | 64
4 | BMJ, Arch Intern Med | 2 | 61
5 | Lancet, JAMA | 2 | 52
6 | Soc Sci Med, BMJ | 2 | 52
7 | JAMA, Arch Intern Med | 2 | 51
8 | Lancet, Med Care | 2 | 50
9 | Crit Care Med, Prehospital Disaster Med | 2 | 49
10 | N Engl J Med, BMJ | 2 | 49
11 | N Engl J Med, Lancet | 2 | 49
12 | N Engl J Med, JAMA | 2 | 47
13 | N Engl J Med, Med Care | 2 | 47
14 | Qual Saf Health Care, BMJ | 2 | 47
15 | BMJ, Crit Care Med | 2 | 42
16 | Med Care, BMJ | 2 | 38
17 | N Engl J Med, J Bone Miner Res | 2 | 33
18 | N Engl J Med, J Pediatr Surg | 2 | 26
19 | Lancet, J Pediatr Surg | 2 | 25
20 | JAMA, Nature | 2 | 25
21 | Lancet, JAMA, BMJ | 3 | 24
22 | N Engl J Med, Lancet, BMJ | 3 | 21
23 | Intensive Care Med, BMJ | 2 | 21
24 | BMJ, N Engl J Med, JAMA | 3 | 20
25 | N Engl J Med, JAMA, Lancet | 3 | 20
26 | JAMA, Med Care, Lancet | 3 | 14
27 | JAMA, Med Care, N Engl J Med | 3 | 14
28 | BMJ, JAMA, Lancet, N Engl J Med | 4 | 14

The links of co-citation journals across the three periods from 1852 to 2009 can be summarized as follows: (1) three journals were highly cited but were not in a co-citation relationship in 1852–1907 (see figure 7); (2) five clusters of the healthcare journals in co-citation
relationships were found for the years 1908–61 (see figure 8); and (3) 1962–2009 had a distinct cluster of four journals within the healthcare domain (see figure 9).

Figure 7. The relationship of co-citation journals for the healthcare domain in 1852–1907.

Figure 8. The relationship of co-citation journals for the healthcare domain in 1908–61. Journals with double circles are co-cited with one other journal in a citing article; journals with triple circles are co-cited with two others in a citing article.

Figure 9. The relationship of co-citation journals for the healthcare domain in 1962–2009. The thick lines and circles indicate journals that are co-cited in a citing article.

CONCLUSIONS
This paper presented an automated literature system for co-citation analysis to facilitate understanding of the citation structure of journal articles in the healthcare domain. The system visually presents the results of its analysis to help researchers quickly identify the key articles that provide an overview of the healthcare domain. Using keywords related to healthcare, the analysis found that BMJ is a core journal of the domain. The co-citation analysis found a single cluster within the healthcare domain comprising four journals: BMJ, JAMA, Lancet, and the New England Journal of Medicine. This paper focused on a co-citation analysis of journals; the authors, articles, and issues featured in the co-citation analysis can be further studied in an automated way. A period analysis of publication years is also important. Further analyses can facilitate understanding of the changes in a research domain and the trends in research issues. In addition, the automatic generation of a map would be a worthwhile topic for future study.
ACKNOWLEDGEMENTS
This article was funded by the Ministry of Science and Technology of Taiwan (MOST), formerly the National Science Council (NSC), under grant NSC 100-2410-H-227-003. All of the authors made significant contributions to the article and agree with its content, and there is no known conflict of interest in this study.

REFERENCES
1 A. Kitson et al., “What Are the Core Elements of Patient-Centered Care? A Narrative Review and Synthesis of the Literature from Health Policy, Medicine and Nursing,” Journal of Advanced Nursing 69 (2013): 4–8, https://doi.org/10.1111/j.1365-2648.2012.06064.x.
2 S. J. Brownsell et al., “Future Systems for Remote Health Care,” Journal of Telemedicine and Telecare 5 (1999): 145–48, https://doi.org/10.1258/1357633991933503; B. G. Celler, N. H. Lovell, and D. K. Chan, “The Potential Impact of Home Telecare on Clinical Practice,” Medical Journal of Australia 171 (1999): 518–20; R. Walker et al., “What It Will Take to Create New Internet Initiatives in Health Care,” Journal of Medical Systems 27 (2003): 95–98, https://doi.org/10.1023/A:1021065330652.
3 I. Marshakova-Shaikevich, “The Standard Impact Factor as an Evaluation Tool of Science Fields and Scientific Journals,” Scientometrics 35 (1996): 283–85, https://doi.org/10.1007/BF02018487; I. Marshakova-Shaikevich, “Bibliometric Maps of Field of Science,” Information Processing & Management 41 (2005): 1536–45, https://doi.org/10.1016/j.ipm.2005.03.027; A. R. Ramos-Rodríguez and J. Ruíz-Navarro, “Changes in the Intellectual Structure of Strategic Management Research: A Bibliometric Study of the Strategic Management Journal, 1980–2000,” Strategic Management Journal 25, no. 10 (2004): 982–1000, https://doi.org/10.1002/smj.397.
4 H. Small, “Co-citation in the Scientific Literature: A New Measure of the Relationship between Two Documents,” Journal of the American Society for Information Science 24 (1973): 266–68.
5 M. M. Kessler, “Bibliographic Coupling between Scientific Papers,” American Documentation 14 (1963): 10–25, https://doi.org/10.1002/asi.5090140103; B. H. Weinberg, “Bibliographic Coupling: A Review,” Information Storage and Retrieval 10 (1974): 190–95.
6 H. D. White and B. C. Griffith, “Author Cocitation: A Literature Measure of Intellectual Structure,” Journal of the American Society for Information Science 32 (1981): 164–70, https://doi.org/10.1002/asi.4630320302.
7 Y. Ding, G. G. Chowdhury, and S. Foo, “Bibliometric Cartography of Information Retrieval Research by Using Co-word Analysis,” Information Processing & Management 37, no. 6 (November 2001): 818–20, https://doi.org/10.1016/S0306-4573(00)00051-0.
8 Small, “Co-citation,” 266.
9 D. Sullivan et al., “Understanding Rapid Theoretical Change in Particle Physics: A Month-by-Month Co-citation Analysis,” Scientometrics 2 (1980): 312–16, https://doi.org/10.1007/BF02016351.
10 N. Shibata et al., “Detecting Emerging Research Fronts Based on Topological Measures in Citation Networks of Scientific Publications,” Technovation 28 (2008): 762–70, https://doi.org/10.1016/j.technovation.2008.03.009.
11 Weinberg, “Bibliographic Coupling.”
12 White and Griffith, “Author Cocitation.”
13 R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rules in Large Databases” (paper, International Conference on Very Large Databases [VLDB], Santiago de Chile, September 12–15, 1994).
14 R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases” (paper, ACM SIGMOD International Conference on Management of Data, Washington, DC, May 25–28, 1993).
15 Agrawal and Srikant, “Fast Algorithm,” 3.
16 Ibid., 4.
Reference Rot in the Repository: A Case Study of Electronic Theses and Dissertations (ETDs) in an Academic Library
Mia Massicotte and Kathleen Botter
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017

ABSTRACT
This study examines ETDs deposited during the period 2011–2015 in an institutional repository to determine the degree to which the documents suffer from reference rot, that is, linkrot plus content drift. The authors converted and examined 664 doctoral dissertations in total, extracting 11,437 links, and found overall that 77% of links were active and 23% exhibited linkrot. A stratified random sample of 49 ETDs produced 990 active links, which were then checked for content drift based on mementos found in the Wayback Machine. Mementos were found for 77% of links, and approximately half of these, 492 of 990, exhibited content drift. The results serve not only to emphasize the necessity of broader awareness of this problem, but also to stimulate action on the preservation front.

INTRODUCTION
A significant proportion of the material in institutional repositories is comprised of electronic theses and dissertations (ETDs), providing academic librarians with a rich testbed for deepening our understanding of new paradigms in scholarly publishing and their implications for long-term digital preservation. While academic libraries have long collected and preserved hard-copy theses and dissertations of the parent institution, the shift to mandatory electronic deposit of this material has conferred new obligations and curatorial functions not previously incorporated into library workflows.
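The link extraction described in the abstract can be approximated with a URL regular expression over the converted dissertation text. A rough Python sketch follows; the pattern is a simplification of what a production pipeline would use and will miss or over-match some URLs:

```python
import re

# A deliberately simple URL pattern; real extraction pipelines need more
# care with line breaks, encodings, and unusual trailing punctuation.
URL_RE = re.compile(r'https?://[^\s<>"\')\]]+')

def extract_links(text):
    """Return HTTP(S) URLs found in a block of extracted ETD text,
    with common trailing punctuation stripped."""
    return [u.rstrip(".,;") for u in URL_RE.findall(text)]

sample = ("See the project site (http://example.org/project) and the archived copy "
          "at https://web.archive.org/web/2015/http://example.org/project.")
print(extract_links(sample))
```

Each extracted URL can then be requested over HTTP and classified as active or dead, which is the linkrot half of the reference-rot measurement.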
By highlighting ETDs as a susceptible collection deserving of specific preservation actions, we draw attention to some unique responsibilities for libraries housing university-produced content, particularly as scholarly information continues its shift away from commercial production and distribution channels. As Teper and Kraemer point out in their discussion of ETD program goals, “without preservation, long-term access is impossible; without long-term access, preservation is meaningless.”1

Mia Massicotte (Mia.Massicotte@concordia.ca) and Kathleen Botter (Kathleen.Botter@concordia.ca) are Systems Librarians, Concordia University Library, Montreal, Quebec, Canada. https://doi.org/10.6017/ital.v36i1.9598

What Is Reference Rot, and Why Study It?

In addition to linkrot (where a link sends the user to a webpage which is no longer available), there are webpages that remain available but whose contents have undergone change over time, known as content drift. This dual phenomenon of linkrot plus content drift has been characterized as reference rot by the Hiberlink project team,2 and has important implications for digital preservation. Since theses and dissertations are original works born digital by virtue of mandatory deposit programs, a university’s ETD program is effectively a digital publishing initiative, accompanied by a new universe of responsibility for its digital preservation. Due to the specialized nature of graduate-level research, ETDs frequently include links to resources on the open web, for example, personal blogs, project websites, and commercial entities. Digital Object Identifiers (DOIs), useful in the context of published literature, do not apply to URLs on the free web, which are DOI-indifferent.
Open web links also fall outside the scope of preservation initiatives such as LOCKSS (Lots of Copies Keep Stuff Safe),3 which aim to safeguard the published literature. With increasing frequency, researchers are citing newer forms of scholarship which do not readily fall under the rubric of published literature. Moreover, since thesis preparation is conducted over a period of time typically measured in years, links cited therein are likely to be more vulnerable to linkrot and content drift by the time of manuscript submission. Yet despite the surfeit of daily anecdotal evidence that URLs vanish and result in dead links, Phillips, Alemneh, and Ayala point out that “by and large academic libraries are not capturing and maintaining collections of web resources that provide context and historical reference points to the modern theses and dissertations held in their collections.”4 Since an ETD comprises a unique form of scholarly output produced by universities, one that simultaneously satisfies the parent institution's degree-granting apparatus and reflects its academic stature on the international stage, the presence of reference rot in this body of literature is of particular concern and worthy of immediate attention.

Smoking Guns

There has been no shortage of evidence reporting on the linkrot phenomenon over the last two decades. Koehler, whose initial study on linkrot appeared in JASIS in 1999, periodically revisited, analyzed, and reported on the same set of 360 URLs collected in his original study.5,6,7 In 2015, upon the twenty-year benchmark of the original data collection, Oguz and Koehler reported in JASIS that only two of the original links remained active.8 A number of foundational studies, including Casserly and Bird,9 Spinellis,10 Sellitto,11 Falagas, Karveli, and Tritsaroli,12 and Wagner et al.,13 have reported on linkrot occurring in professional literature.
Sanderson, Phillips, and Van de Sompel provide a table of 17 well-known linkrot studies, comparing overall benchmarks and supplying a succinct summary of the scope of each study.14 Linkrot also gained further important exposure with the Harvard Law School study by Zittrain, Albert, and Lessig, which found that 70% of references in three Harvard law journals, and 49.9% of URLs in Supreme Court opinions examined, no longer pointed to their originally cited sources.15 Members of the Hiberlink project, which set out to examine “a vast corpus of online scholarly publication in order to assess what links still work as intended and what web content has been successfully archived using text mining and information extracting tools,” have been pivotal in making the case for reference rot.16 Hiberlink demonstrated that failure to link to cited sources was due not only to linkrot, but also to web page content which changed over time.17 A new dimension of the digital preservation universe was thrown into sharp relief with a follow-up study by Klein et al. (2014), which examined one million web references extracted from 3.5 million Science, Technology, and Medicine (STM) articles published in Elsevier, PubMed Central, and arXiv between the years 1997 and 2012. The study concluded that one in five articles suffers from reference rot.18 Though the study focused on STM articles, its authors drew attention to theses and dissertations as a susceptible class of material. Analyzing the same set of links extracted from this large STM corpus, Jones et al. (2016) recently reported that 75% of referenced open web pages demonstrated changes in content.19

ETDs — A Susceptible Collection

The digital preservation part of institutionally mandated ETD deposit has yet to have its dots fully connected to the rest of the diagram.
After four years of research into academic institutions’ ETD programs, Halbert, Skinner, and Schultz reported that close to 75% of respondents surveyed had no preservation plan for their ETD collections.20 Despite the prevalence of linkrot studies, linkrot in ETDs has not been subjected to similar scrutiny, and the implications of disappearing content are underappreciated. While mandatory deposit programs have become relatively commonplace, focus has largely remained on policy and implementation aspects, metadata quality, interoperability, and conformance to standards.21,22 There are few studies which focus on institutional repository link content. The study conducted by Sanderson, Phillips, and Van de Sompel (2011) was a large-scale examination of two repositories: 400,144 papers deposited in arXiv and 3,595 papers in the University of North Texas (UNT) digital library repository were studied, and more than 160,000 URLs examined.23 Links were analyzed for persistence and the availability of mementos, that is, whether prior versions of the page existed in a public web archive, such as the Internet Archive's Wayback Machine. For 72% of UNT URLs, either mementos were available, or the resource still existed at its original location, or both. Although 54% (9,880) were available in one or more international web archives, 28% (5,073) of UNT's ETD links were found to no longer exist, nor had they been archived by the international archival community. Phillips, Alemneh, and Ayala looked at overall general patterns and trends of URL references in repository ETDs, examining 4,335 ETDs between the years 1999-2012 in the UNT repository.24 The team analyzed 26,683 unique URLs in 2,713 ETDs containing one or more links, finding an overall average of 10.58 unique URLs per ETD with one or more links.
The UNT team provided a breakdown of domain and subdomain occurrence frequency, and indicated areas of future investigation into content-based URL linking patterns of ETDs. ETD link decay was studied by Sife and Bernard, who performed a citation analysis on URLs in 83 theses published between 2007 and 2011 at Tanzania's Sokoine National Agricultural Library.25 Of the 15,468 citations examined, 9.6% (1,487) were open web citations. URLs were considered active if found at the original location, or available after a URL redirect. The authors manually tested URLs over a period of seven days to record their accessibility, noting the error messages and domains of inaccessible URLs and analyzing the types of errors encountered. The authors calculated that it took only 2.5 years for half of the web citations to disappear. At the ETD2014 conference,26 an important study of 7,500 ETDs in five U.S. universities was presented. Of 6,400 ETDs defended between 2003 and 2010, approximately 18% of open web link content was confirmed as lost, and a further 34% was at risk of loss, that is, live links which lacked an archived copy.27 Though the results of that particular study have not been formally published, it was briefly summarized in a session held at the 38th UKSG Annual Conference in Glasgow, Scotland in March 2015, an account of which was subsequently published by Burnhill, Mewissen, and Wincewicz in Insights.28 Given the scarcity of published literature on link content as found in ETDs, this present study, which examines reference rot in ETDs in an academic institutional repository, is unique: it draws attention to an important digital collection which is vulnerable to loss and highlights the need for action.
BACKGROUND AND CONTEXT

Concordia University is a comprehensive university located in Montreal, with a student population of 43,903 full-time equivalents in 2015, of which 7,835 were graduate students. In 2015 the University offered 27 PhD programs29 and 43 programs at the Master's level. The Faculties of Arts and Science, Engineering and Computer Science, Fine Arts, and Business have a thesis requirement, and produce upwards of 350 Master's and 150 PhD dissertations annually. The broad disciplines and the departmental clusters used in this study are shown in Table 1. Prior to the thesis deposit mandate, Concordia University Library housed hard-copy versions of theses and dissertations in the collection. In 2009, the Library launched Spectrum, Concordia’s EPrints institutional repository, playing a leadership role in Spectrum's implementation and policy development, and providing training and support to the School of Graduate Studies regarding submission and management of theses for deposit. Following a successful pilot project, the Graduate Studies Office ceased accepting paper manuscripts, and mandated electronic deposit of all theses and dissertations into Spectrum as of spring 2011.

Arts: Applied Linguistics, Communication, Economics, Educational Technology, History, Hist and Phil of Religion, Humanities, Philosophy, Sociology, Political Science, Psychology, Religion
Business*: Decision Sciences and MIS, Finance, Management, Marketing
Engineering**: Building Engineering, Civil Engineering, Computer Science, Comp Sci & Software Eng, Electrical and Comp Eng, Industrial Engineering, Info Systems Security, Mechanical Engineering
Fine Arts: Art Education, Art History, Film and Moving Image Studies, Industrial Design, Fine Arts, Performing Arts
Science: Biology, Chemistry, Mathematics, Physics, Exercise Science

Table 1.
Summary of departmental clusters used in this study
* John Molson School of Business
** Engineering & Computer Science

METHODOLOGY

We concentrated on PhD dissertations (henceforth ETDs) in Spectrum in order to limit the scope of the project; Master's theses were excluded. A 5-year period was chosen, beginning with the first semester of mandatory deposit, spring 2011, through fall 2015, a total of 720 ETDs. Since Concordia ETDs are released for publication immediately following convocation, the University's official convocation dates were used to identify the set of documents to be downloaded and examined. We proceeded in phases: first downloading ETDs from Spectrum and converting them to a text format that could be examined for patterns; then extracting links from each and testing programmatically for linkrot; then drawing a stratified random sample of active URLs and visiting them to determine if content drift had taken place. Our methodology for link extraction was similar to those described by Klein et al.30 and Zhou, Tobin, and Grover.31 During the dissertation download stage, 36 ETDs with embargoed content were encountered and eliminated. ETDs were then converted from the existing PDF/A format to XML. A further 20 documents failed to convert due to nonstandard or complex formatting which resulted in unreadable, garbled characters. These documents resisted multiple conversion attempts, and since they could not be mined, had to be eliminated. A final total of 664 ETDs were successfully converted using three different tools: 97% (644) were converted using PDFtoHTML,32 the remaining 3% by either givemetext (14)33 or Adobe Acrobat (3). A spot check of documents provided sufficient evidence that many links occurred throughout the text body.
Since we intended to extract URLs to the open web, we wanted to err on the side of detecting more links, rather than capturing only easily identifiable, well-formed URLs. Links were mined from the body of the text in a manner similar to the study carried out at UNT.34 We wanted a regular expression which would catch as many URLs as possible, expecting to manually clean the link output before further processing. We tested multiple regular expressions35 against a small sample of our converted ETDs and compared the results. We selected one which seemed well-suited for our purpose, as it was liberal in detecting links throughout the text, and was able to extract links which contained obvious omissions and problems, for example, those that lacked http:// prefixes, but also caught non-obvious errors, such as ellipses in long URLs. We considered how de-duplication of extracted links might affect the outcome, and opted to count each link as an individual instance. Manual cleanup included catching URLs that broke across new lines, identifying false hits such as titles containing colons and DOIs, and adding escape encoding characters for "&" and "%" in order to generate a clean URL for use in the next step of the process.

METHODOLOGY — Linkrot Collection

A script programmatically used the cURL command line tool to visit each link and fetch the HTTP response code in return.36 An output listing was produced for each doctoral dissertation, comprising the original URLs, the final URLs, and the HTTP response codes. Link output for each of the 664 converted ETDs was collected from December 2015 to January 2016, with the fall 2015 semester checked in March 2016. 76% (504 of 664) of ETDs contained one or more links, the highest number of links (5,946) falling into the Arts group; 24% (160 of 664) of ETDs contained no links. For the 5-year period, the broad discipline breakdown of documents examined, the number of ETDs with links, and the number of links extracted are shown in Table 2.
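The kind of deliberately liberal matching described above can be sketched in Python. The pattern below is an illustration only (the study does not reproduce its actual regular expression): it accepts links that lack an http:// prefix, at the cost of false hits that would then be cleaned up manually, and it counts each occurrence as an individual instance, as the study did.

```python
import re

# A deliberately liberal pattern (an illustration, not the study's actual
# regular expression): matches http(s) URLs and bare www.-prefixed links,
# stopping at whitespace and common trailing punctuation.
URL_PATTERN = re.compile(r'(?:https?://|www\.)[^\s<>"\'),;\]]+', re.IGNORECASE)

def extract_links(text):
    """Return every candidate URL in the text, one instance per occurrence."""
    return URL_PATTERN.findall(text)

sample = (
    "See http://example.org/a and www.example.com/b; "
    "also http://example.org/a again."
)
print(extract_links(sample))
# ['http://example.org/a', 'www.example.com/b', 'http://example.org/a']
```

Note that the repeated URL is returned twice, mirroring the study's decision not to de-duplicate extracted links.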
Converted ETDs by publication year, broken out by broad disciplines, are shown in Figure 1.

Discipline | PhD ETDs in Spectrum | ETDs converted* | Contain no links | Contain links | Links extracted
Arts | 210 | 195 | 31 | 164 | 5,946
Business | 45 | 43 | 12 | 31 | 210
Engineering | 351 | 326 | 82 | 244 | 3,259
Fine Arts | 28 | 25 | 2 | 23 | 1,728
Science | 86 | 75 | 33 | 42 | 294
Total | 720 | 664 | 160 | 504 | 11,437

Table 2. 5-year period, 2011-2015, summary of documents examined and links extracted
* 56 documents in total eliminated (36 embargoed, plus 20 which failed to convert)

Figure 1. Converted ETDs by publication year and broad discipline

The 11,437 links extracted were checked for linkrot, each link accessed and its HTTP response code recorded. 77% (8,834 of 11,437) of links returned an active 2xx HTTP response code; 23% (2,603) of links could not be reached, returning a response code outside the 2xx range. This includes 102 links in the 3xx range which failed to reach a destination after 50 redirects and were considered linkrot. Numbers of links, total link response, and link response by year, broken down by broad discipline, are shown in Figure 2, with accompanying data provided in Table 3 and discussed in the findings section.

Figure 2.
Link HTTP response codes, by broad discipline and year

Discipline | Response | 2011 | 2012 | 2013 | 2014 | 2015 | Total | % Active & Rotten**
Arts | 2xx | 691 | 864 | 800 | 1,108 | 1,093 | 4,556 | 77%
Arts | all other* | 320 | 428 | 131 | 293 | 218 | 1,390 | 23%
Business | 2xx | 14 | 52 | 17 | 22 | 50 | 155 | 74%
Business | all other | 9 | 19 | 5 | 9 | 13 | 55 | 26%
Engineering | 2xx | 302 | 702 | 638 | 482 | 404 | 2,528 | 78%
Engineering | all other | 134 | 172 | 180 | 196 | 49 | 731 | 22%
Fine Arts | 2xx | 165 | 143 | 504 | 467 | 94 | 1,373 | 79%
Fine Arts | all other | 74 | 56 | 118 | 98 | 9 | 355 | 21%
Science | 2xx | 77 | 34 | 58 | 39 | 14 | 222 | 76%
Science | all other | 25 | 23 | 10 | 11 | 3 | 72 | 24%
Subtotal | 2xx | 1,249 | 1,795 | 2,017 | 2,118 | 1,655 | 8,834 | 77% active
Subtotal | all other | 562 | 698 | 444 | 607 | 292 | 2,603 | 23% rotten
% Rotten | | 31% | 28% | 18% | 22% | 15% | 23% |
Total | | 1,811 | 2,493 | 2,461 | 2,725 | 1,947 | 11,437 | 100%

Table 3. Breakdown by year and discipline showing active (2xx) and rotten (all others) response codes
* All other = 0, 1xx, 3xx (unresolved after 50 redirects), 4xx, and 5xx response codes combined
** Active and rotten rates based on total links per discipline

METHODOLOGY — Content Drift

For the content drift phase, we wanted to sample documents from each of the five disciplines. ETDs which did not contain any links were excluded from the sample. Using only documents with one or more active links, a stratified random sample of 10% was drawn for a final sample of 49 ETDs containing a total of 990 links. A snippet of text surrounding each link was then also extracted from each ETD, along with any "date accessed" or "date viewed" information if present. Each link was manually visited, assessed for content drift, and observations recorded. The breakdown of the content drift sample is shown in Table 4.
Discipline | ETDs with links | ETDs with active links (2xx) | ETDs sampled for content drift* | Links extracted for sample
Arts | 164 | 156 | 16 | 668
Business | 31 | 28 | 3 | 12
Engineering | 244 | 235 | 24 | 154
Fine Arts | 23 | 23 | 2 | 136
Science | 42 | 40 | 4 | 20
Total | 504 | 482 | 49 | 990

Table 4. Breakdown of sample pool of ETDs for content drift analysis
* 10% sample drawn from each discipline’s pool of ETDs; only ETDs with URLs relevant for content drift assessment

Visited links were benchmarked against the existence of a memento, an archived snapshot of that page located in the Wayback Machine.37 Since the University sets a strict thesis submission deadline of three months prior to convocation, mementos prior to the submission deadline were sought. Based on the occurrences of "date accessed" and discursive information found in the snippets, we arrived at the supposition that links were likely to have been checked as the student approached the final stages of manuscript preparation, although this is not verifiable. We set ourselves a soft window for locating an archived snapshot, using a date six months prior to the convocation date as the benchmark; that is, for each semester's deadline date, an additional three months was added, arriving at a six-months-prior-to-publication marker. Since programmatic analysis of 990 links required time, expertise, and resources not available to us, we approached the problem heuristically. Assuming that online consultations are not linear, active links occurring multiple times in a document were given equal weight. Each link was manually checked in the Wayback Machine using "date viewed" if provided; if no date was provided (the majority of cases), Wayback was checked to see if an archived version existed as close to our six-month soft marker as possible.
If a memento was not found within a month earlier or later than the soft marker, then the nearest neighboring older memento was selected, if one existed. The original URL, the date the URL was visited, and whether a snapshot was located in Wayback were recorded. All links were checked during July-August 2016. If the initial web browser failed to access a link, a second and sometimes third browser was tried, using Safari, Chrome, and Internet Explorer (IE) in that order. Unsuccessful attempts to reach Wayback were rechecked in September. The question as to whether, and to what degree, content drift had occurred was assessed, and is discussed in the next section.

FINDINGS AND DISCUSSION

Linkrot Findings

Of the 664 ETDs examined for linkrot, 77% of links tested returned an active HTTP response code in the 2xx range, roughly three-quarters overall. Numbers of links by broad discipline varied greatly, as shown in Figure 2 (healthy links in green, linkrot shown in red). Linkrot rates ranged from 21% in Fine Arts to 26% in Business, as seen in the last column of Table 3. It should be noted that 2xx response codes are also returned for pages that disguise themselves as active links. For example, a URL returns an active status code when a domain has been parked (e.g., purchased to reserve the space), or when a customized 404-page-not-found is encountered. Since we had no mechanism in place to treat false positives, these were flagged during the linkrot phase as candidates for subsequent content drift analysis. 23% (2,603 of 11,437) of all links returned a response code outside the 2xx range and were considered linkrot, roughly one-quarter. Response codes in the 4xx range alone, including 404-page-not-found errors, comprised 17% (1,916 of 11,437) of all links. Table 5 shows the breakdown of the total number of links that were visited in the spring of 2016 for linkrot determination.
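The activity rule used throughout the study (a link counts as active only if its final HTTP response is in the 2xx range; empty responses, 1xx, 3xx chains unresolved after 50 redirects, 4xx, and 5xx all count as linkrot) can be sketched in Python. The dictionary below is a hypothetical layout using one representative code per category; the counts themselves are the study's figures:

```python
def is_active(status_code: int) -> bool:
    # Active means a final 2xx response; everything else (0 = timed out,
    # 1xx, 3xx unresolved after 50 redirects, 4xx, 5xx) counts as linkrot.
    return 200 <= status_code < 300

# One representative code per response-code category (hypothetical layout;
# the counts are the study's own figures).
links_by_code = {0: 507, 100: 2, 200: 8834, 300: 102, 404: 1916, 500: 76}

active = sum(n for code, n in links_by_code.items() if is_active(code))
rotten = sum(n for code, n in links_by_code.items() if not is_active(code))
print(active, rotten)  # 8834 2603 -- the study's 77%/23% split
```

Summing the non-2xx categories this way reproduces the 8,834 active / 2,603 rotten totals reported above.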
HTTP response code category | Meaning of HTTP response code* | Number of links | Percent of total links
0 | Empty response** | 507 | 4%
1xx | Informational | 2 | 0%
2xx | Successful | 8,834 | 77%
3xx | Redirection† | 102 | 1%
4xx | Client error | 1,916 | 17%
5xx | Server error | 76 | 1%
Total | | 11,437 | 100%

Table 5. Breakdown of HTTP response codes received
* We used the HTTP protocol definitions at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
** Unofficial HTTP response code due to the request timing out
† Failure to resolve after 50 redirects

HTTP responses ranged from a high of 85% active in 2015 to a low of 69% active in 2011, the oldest publication year. To put it differently, the most recent year exhibited a linkrot rate of 15%. Consistent with other studies, linkrot manifests itself quickly after publication and increases over time, as indicated by the percentages shown in Figure 2.

Content Drift Findings

Of the 990 links visited to check for the presence of content drift, 764 (400 + 364), or 77%, had a Wayback memento, compared with 226 (92 + 134), or 23%, which did not. Slightly more than half of links with mementos, 52% (400 of 764), demonstrated some level of content drift when the memento was compared to the current active link, while 48% (364 of 764) with mementos did not exhibit content drift. The presence of content drift by discipline, with/without mementos, showing numbers of links tested, appears in Table 6.

Discipline | Links tested | Drift: memento found | Drift: memento not found | Drift total | No drift: memento found | No drift: memento not found | No drift total
Arts | 668 | 261 | 60 | 321 | 254 | 93 | 347
Business | 12 | 5 | 0 | 5 | 4 | 3 | 7
Engineering | 154 | 74 | 10 | 84 | 55 | 15 | 70
Fine Arts | 136 | 55 | 22 | 77 | 38 | 21 | 59
Science | 20 | 5 | 0 | 5 | 13 | 2 | 15
Total | 990 | 400 | 92 | 492 | 364 | 134 | 498

Table 6.
Presence of Content Drift by Discipline, with/without mementos

For links that had no memento in Wayback, content drift assessment was based on the presence of an observable date in the current active link, including copyright, and/or other details which positively correlated with our extracted snippet information. For example, some links retrieved a .pdf or other static file which correlated with the snippet, there being no reason to conclude its content had undergone change since publication, despite the lack of a memento. Snippets were also used in cases where a robots.txt file at the target URL had prevented Wayback from creating a memento. Occasional examination of the dissertation text was conducted to validate information extracted in the snippet. The 23% (226) which lacked mementos remain at significant risk and will fall prey to further drift as time passes. As seen in Table 7, of the 492 URLs manifesting content drift, 11% (54 of 492) were completely lost, linking to web domains that had been sold or were currently up for sale, or to webpages that had been replaced or removed. 9% (42 of 492) of web pages exhibited major change such that there was little correlation with snippets, or where website overhauls made assessment difficult, but not impossible. 36% (179 of 492) of web links exhibited minor drift, primarily pages that differed somewhat from a memento in visual appearance, such as header and footer differences, changes in background theme or style, or changes in navigation or search functionality which did not represent a high degree of impairment. 7% (34 of 492) linked to continually updating websites, such as Wikipedia and news organizations, and 7% (35 of 492) were customized 404-page-not-found pages, each group distinctive enough to warrant its own category.
A full 30% (148 of 492) exhibited a multiplicity of changes of uncertain nature which we grouped together, such as pages where graphic or audio components had been removed or could not be retrieved, broken JavaScript that impeded access, browser failure, and mementos not accessible after repeated attempts, indicative of a range of issues affecting the quality of web archives and hence preservation.38 The types of content drift encountered, broken down by broad discipline, numbers of links, and percentage, are shown in Table 7.

Type of content drift | Arts | Business | Engineering | Fine Arts | Science | Total | % of type
Lost | 45 | 0 | 3 | 6 | 0 | 54 | 11%
Major but findable | 22 | 0 | 9 | 9 | 2 | 42 | 9%
Minor – redesigned but recognizable | 128 | 2 | 30 | 17 | 2 | 179 | 36%
Ongoing updating website | 25 | 3 | 5 | 0 | 1 | 34 | 7%
Custom 404 | 23 | 0 | 4 | 8 | 0 | 35 | 7%
Other | 78 | 0 | 33 | 37 | 0 | 148 | 30%
Total | 321 | 5 | 84 | 77 | 5 | 492 | 100%

Table 7. Types of content drift encountered, number of links by broad discipline

Though difficulties encountered during content drift assessment made further extrapolation problematic, the presence of reference rot was confirmed. Our 10% stratified random sample examined 990 active links, finding that roughly half (492 of 990) manifested some degree of content drift. For 364 links, or 36% overall, a benchmark memento was found and no content drift detected. Although many content drift changes can arguably be characterized as minor, it is not possible to ascertain where the content drift scale tips irremediably for any particular reader. What can be said with certainty is that 11% of active links, links which did not exhibit linkrot and were quite live and accessible, fell into a small but unsettling group where the context of the cited web source is irrevocably lost. Of the 498 links which did not exhibit any evidence of content drift, 134, approximately one-third, have no memento archived and continue to remain at high risk.
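The memento-selection rule described in the methodology section (seek a snapshot near a soft marker six months before convocation; accept one within roughly a month either side, otherwise fall back to the nearest older snapshot, if any) can be sketched as follows. The dates are hypothetical examples, not figures from the study:

```python
from datetime import date, timedelta

def select_memento(memento_dates, soft_marker):
    """Pick a Wayback snapshot per the study's heuristic: prefer a memento
    within about one month of the soft marker, otherwise fall back to the
    nearest older memento, if one exists; return None when neither exists."""
    window = timedelta(days=31)
    near = [d for d in memento_dates if abs(d - soft_marker) <= window]
    if near:
        # closest snapshot inside the one-month window
        return min(near, key=lambda d: abs(d - soft_marker))
    older = [d for d in memento_dates if d < soft_marker]
    return max(older) if older else None

# Hypothetical example: convocation in June 2013 gives a soft marker
# six months prior, i.e. December 2012.
marker = date(2012, 12, 1)
snaps = [date(2011, 5, 2), date(2012, 11, 20), date(2014, 1, 8)]
print(select_memento(snaps, marker))  # 2012-11-20 (within the window)
```

A link whose only snapshots postdate the marker and fall outside the window would yield no usable memento, mirroring the at-risk group discussed above.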
A focused and deeper analysis of active links which might lead to a typology of content drift types would be a possible area of future study, though even the well-resourced study by Jones et al., which utilized a strict "ground truth" for comparing textual mementos over time, points out that classifying links would certainly be challenging.39 A larger sample size might also allow closer analysis of disciplinary differences, which may lead to a better understanding of these types of content drift variations.

CONCLUSION

Reference rot in the form of linkrot and content drift was observed in ETDs in Spectrum, our institutional repository, and this confirmation should give pause to those charged with stewardship of ETD collections. Theses and dissertations have long been viewed as material that contributes to overall academic scholarly output and carries unique status within the academy. In August 2016, OpenDOAR registered 1,600 institutional repositories with ETDs,40 up from 1,100 institutions as reported in 2012 by grey literature specialist Schoepfel.41 Academic libraries have, in large part, facilitated the transition from paper to ETD with widespread adoption of institutional repository deposit programs, and along with that adoption comes a range of long-term preservation issues.
Yet as Ohio State’s Strategic Digital Initiatives Working Group pointed out, “Even in digital library communities, preservation all too often stands in for or is used interchangeably with byte level backup of content.”42 For long-term access, focus can productively be shifted to offset the immediate threat of incompleteness and inadequate capture.43 Not much has changed since Hedstrom wrote back in 1997: “With few exceptions, digital library research has focussed on architectures and systems for information organization and retrieval, presentation and visualization, and administration of intellectual property rights … The critical role of digital libraries and archives in ensuring the future accessibility of information with enduring value has taken a back seat to enhancing access to current and actively used materials.”44 Our understanding and discussion of digital preservation must be broadened, and attention turned to this key area of responsibility in the preservation life-cycle. The authors maintain that ETD content and link preservation is an editorial, not individual, imperative. Encouraging individual authors to perform their own archiving is doomed to fall short of even reasonable expectations. Measures such as Perma, a distributed, redundant method of capturing and archiving website content as part of the citation process, must be pro-actively pursued and built into library, and hence repository, workflows.45 Browser plugins and automated solutions which use the Memento protocol for capturing and archiving website content as part of the citation process do exist,46 but naturally have to be implemented before they can take effect. Either way, efforts to operationalize existing mechanisms which are designed to reduce future loss would be extremely productive. Responsibility for ensuring not only current but continuing future access to ETD content rests with those who maintain the curatorial function of the repository.
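For readers unfamiliar with the Memento protocol mentioned above (RFC 7089), it retrieves archived snapshots by datetime negotiation: a client asks a TimeGate for the version of a URL nearest a given time via the Accept-Datetime header. A minimal sketch, constructed but not sent here, against the Internet Archive's TimeGate endpoint:

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

def timegate_request(url: str, when: datetime) -> Request:
    """Build a Memento TimeGate request (RFC 7089): the Accept-Datetime
    header asks the archive for the snapshot closest to `when`. The
    Internet Archive's Wayback TimeGate is used here for illustration."""
    req = Request("https://web.archive.org/web/" + url)
    req.add_header("Accept-Datetime", format_datetime(when))
    return req

req = timegate_request(
    "http://example.org/", datetime(2013, 6, 1, tzinfo=timezone.utc)
)
print(req.get_header("Accept-datetime"))  # Sat, 01 Jun 2013 00:00:00 +0000
```

Sending such a request would redirect to the memento nearest the requested datetime, which is the mechanism the browser plugins and automated capture tools cited above build on.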
Academic librarians have assumed a prominent and de facto role as curators, facilitating the role of university publication and emphasizing its break away from previous ties with commercial entities. We collectively bear greater responsibility for this body of scholarly work, and need to move forward from a position of benign neglect to one of informed curation and pro-active preservation of an important collection of scholarly output which is at risk.

REFERENCES

1. Thomas H. Teper and Beth Kraemer, “Long-Term Retention of Electronic Theses and Dissertations,” College & Research Libraries 63, no. 1 (January 1, 2002), 64, https://doi.org/10.5860/crl.63.1.61.
2. The term “reference rot” was introduced by the Hiberlink team. “Hiberlink – About,” accessed March 31, 2016, http://hiberlink.org/about.html.
3. LOCKSS: Lots of Copies Keep Stuff Safe, accessed December 6, 2016, http://www.lockss.org/about/what-is-lockss/.
4. Mark Edward Phillips, Daniel Gelaw Alemneh, and Brenda Reyes Ayala, “Analysis of URL References in ETDs: A Case Study at the University of North Texas,” Library Management 35, no. 4/5 (June 3, 2014), 294, https://doi.org/10.1108/LM-08-2013-0073.
5. Wallace Koehler, “An Analysis of Web Page and Web Site Constancy and Permanence,” Journal of the American Society for Information Science 50, no. 2 (January 1, 1999): 162–80, https://doi.org/10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B.
6. Wallace Koehler, “Web Page Change and Persistence—a Four-Year Longitudinal Study,” Journal of the American Society for Information Science & Technology 53, no. 2 (January 15, 2002): 162–71, http://doi.org/10.1002/asi.10018.
7. Wallace Koehler, “A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence,” Information Research 9, no. 2 (2004), http://www.informationr.net/ir/9-2/paper174.html.
8.
Fatih Oguz and Wallace Koehler, “URL Decay at Year 20: A Research Note,” Journal of the Association for Information Science and Technology 67, no. 2 (February 1, 2016): 477–79, https://doi.org/10.1002/asi.23561.

9. Mary F. Casserly and James Bird, “Web Citation Availability: Analysis and Implications for Scholarship,” College & Research Libraries 64, no. 4 (July 2003): 300–317, http://crl.acrl.org/content/64/4/300.full.pdf.

10. Diomidis Spinellis, “The Decay and Failures of Web References,” Communications of the ACM 46, no. 1 (January 2003): 71–77, https://doi.org/10.1145/602421.602422.

11. Carmine Sellitto, “A Study of Missing Web-Cites in Scholarly Articles: Towards an Evaluation Framework,” Journal of Information Science 30, no. 6 (December 1, 2004): 484–95, https://doi.org/10.1177/0165551504047822.

A CASE STUDY OF ELECTRONIC THESES AND DISSERTATIONS (ETDS) IN AN ACADEMIC LIBRARY | MASSICOTTE AND BOTTER | https://doi.org/10.6017/ital.v36i1.9598

12. Matthew E. Falagas, Efthymia A. Karveli, and Vassiliki I. Tritsaroli, “The Risk of Using the Internet as Reference Resource: A Comparative Study,” International Journal of Medical Informatics 77, no. 4 (April 2008): 280–86, https://doi.org/10.1016/j.ijmedinf.2007.07.001.

13. Cassie Wagner et al., “Disappearing Act: Decay of Uniform Resource Locators in Health Care Management Journals,” Journal of the Medical Library Association 97, no. 2 (April 2009): 122–30, https://doi.org/10.3163/1536-5050.97.2.009.

14. Robert Sanderson, Mark Phillips, and Herbert Van de Sompel, “Analyzing the Persistence of Referenced Web Resources with Memento,” arXiv:1105.3459 [cs], May 17, 2011, http://arxiv.org/abs/1105.3459.

15. Jonathan Zittrain, Kendra Albert, and Lawrence Lessig, “Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations,” Legal Information Management 14, no. 2 (June 2014): 88–99, https://doi.org/10.1017/S1472669614000255.

16.
“Hiberlink - About,” accessed March 31, 2016, http://hiberlink.org/about.html.

17. “Hiberlink - Our Research,” accessed March 31, 2016, http://hiberlink.org/research.html.

18. Martin Klein, Herbert Van de Sompel, Robert Sanderson, Harihar Shankar, Lyudmila Balakireva, Ke Zhou, and Richard Tobin, “Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot,” PLOS ONE 9, no. 12 (December 26, 2014), https://doi.org/10.1371/journal.pone.0115253.

19. Shawn M. Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, Richard Tobin, and Claire Grover, “Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content,” PLOS ONE 11, no. 12 (December 2, 2016): e0167475, https://doi.org/10.1371/journal.pone.0167475.

20. Martin Halbert, Katherine Skinner, and Matt Schultz, “Preserving Electronic Theses and Dissertations: Findings of the Lifecycle Management for ETDs Project” (August 6, 2015), 2, http://educopia.org/presentations/preserving-electronic-theses-and-dissertations-findings-lifecycle-management-etds.

21. For a recent overview, see Sarah Potvin and Santi Thompson, “An Analysis of Evolving Metadata Influences, Standards, and Practices in Electronic Theses and Dissertations,” Library Resources & Technical Services 60, no. 2 (March 31, 2016): 99–114, https://doi.org/10.5860/lrts.60n2.99.

22. Joy M. Perrin, Heidi M. Winkler, and Le Yang, “Digital Preservation Challenges with an ETD Collection — A Case Study at Texas Tech University,” The Journal of Academic Librarianship 41, no. 1 (January 2015): 98–104, https://doi.org/10.1016/j.acalib.2014.11.002.

23. Sanderson, Phillips, and Van de Sompel, “Analyzing the Persistence of Referenced Web Resources with Memento,” http://arxiv.org/abs/1105.3459.

24. Phillips, Alemneh, and Ayala, “Analysis of URL References,” https://doi.org/10.1108/LM-08-2013-0073.

25. Alfred S.
Sife and Ronald Bernard, “Persistence and Decay of Web Citations Used in Theses and Dissertations Available at the Sokoine National Agricultural Library, Tanzania,” International Journal of Education and Development Using Information and Communication Technology 9, no. 2 (2013): 85–94, http://eric.ed.gov/?id=EJ1071354.

26. “ETD2014 — University of Leicester,” University of Leicester, accessed January 27, 2016, http://www2.le.ac.uk/library/downloads/etd2014.

27. EDINA, University of Edinburgh, “Reference Rot: Threat and Remedy,” http://www.slideshare.net/edinadocumentationofficer/reference-rot-and-linked-data-threat-and-remedy.

28. Peter Burnhill, Muriel Mewissen, and Richard Wincewicz, “Reference Rot in Scholarly Statement: Threat and Remedy,” Insights: the UKSG Journal 28, no. 2 (July 7, 2015): 55–61, https://doi.org/10.1629/uksg.237.

29. Concordia University, Graduate Programs, accessed April 7, 2016, http://www.concordia.ca/academics/graduate.html.

30. Klein et al., “Scholarly Context Not Found,” https://doi.org/10.1371/journal.pone.0115253.

31. Ke Zhou, Richard Tobin, and Claire Grover, “Extraction and Analysis of Referenced Web Links in Large-Scale Scholarly Articles,” in Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’14 (Piscataway, NJ: IEEE Press, 2014), 451–52, http://dl.acm.org/citation.cfm?id=2740769.2740863.

32. pdftohtml v0.38 win32, by meshko (Mikhail Kruk), accessed September 20, 2015, http://pdftohtml.sourceforge.net/ (actual download at http://sourceforge.net/projects/pdftohtml/).

33. Give Me Text!, Open Knowledge International, accessed October 26, 2015–March 7, 2016, http://givemetext.okfnlabs.org/.

34. Phillips, Alemneh, and Ayala, “Analysis of URL References,” https://doi.org/10.1108/LM-08-2013-0073.

35. “In Search of the Perfect URL Validation Regex,” accessed December 7, 2015, https://mathiasbynens.be/demo/url-regex. We selected “@gruber v2” for our extraction.

36.
cURL v7.45.0, “command line tool and library for transferring data with URLs,” accessed October 18, 2015, http://curl.haxx.se/.

37. We have used the term “memento” in lowercase to denote a snapshot souvenir page, to distinguish it from an automated service utilizing the Memento protocol.

38. For a good overview of the types of problems, see Michael L. Nelson, Scott G. Ainsworth, Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, and Michele Weigle, “Assessing the Quality of Web Archives,” Computer Science Presentations, Book 8 (Old Dominion University, ODU Digital Commons, 2014), http://digitalcommons.odu.edu/computerscience_presentations/8.

39. Jones et al., “Scholarly Context Adrift,” https://doi.org/10.1371/journal.pone.0167475.

40. OpenDOAR search of institutional repositories with theses at http://www.opendoar.org/find.php, accessed August 26, 2016.

41. Joachim Schöpfel, “Adding Value to Electronic Theses and Dissertations in Institutional Repositories,” D-Lib Magazine 19, no. 3 (2013): 1, https://doi.org/10.1045/march2013-schopfel.

42. Strategic Digital Initiatives Working Group, Implementation of a Modern Digital Library at The Ohio State University (April 2014), https://library.osu.edu/documents/SDIWG/sdiwg_white_paper.pdf.

43. Tim Gollins, “Parsimonious Preservation: Preventing Pointless Processes! (The Small Simple Steps That Take Digital Preservation a Long Way Forward),” in Online Information Proceedings (UK National Archives, 2009), http://www.nationalarchives.gov.uk/documents/information-management/parsimonious-preservation.pdf.

44. Margaret Hedstrom, “Digital Preservation: A Time Bomb for Digital Libraries,” Computers and the Humanities 31, no. 3 (1997): 189–202, https://doi.org/10.1023/A:1000676723815.

45.
Zittrain, Albert, and Lessig, “Perma,” https://doi.org/10.1017/S1472669614000255.

46. Herbert Van de Sompel, Michael L. Nelson, Robert Sanderson, Lyudmila L. Balakireva, Scott Ainsworth, and Harihar Shankar, “Memento: Time Travel for the Web,” arXiv:0911.1112 [cs], November 5, 2009, http://arxiv.org/abs/0911.1112.
Editorial Board Thoughts: Metadata Training in Canadian Library Technician Programs

Sharon Farnel

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

The core metadata team at my institution is small but effective. In addition to myself as Coordinator, we include two librarians and two full-time metadata assistants. Our metadata assistant positions are considered similar, in some ways, to other senior assistant positions within the organization that require, or at least prefer, that individuals have a library technician diploma. However, neither of our metadata assistants has such a diploma. Their credentials, in fact, are quite different. In part, this difference is driven by the nature of the work that our metadata assistants do. They work regularly with different metadata standards such as MODS, DC, and DDI in addition to MARC. They perform operations on large batches of metadata using languages such as XSLT or R. This is quite different in many ways from the work of their colleagues who work with the ILS, many of whom do have a library technician diploma. As we prepare for an upcoming short-term leave of one of our team members, I have been thinking a great deal about the work our metadata assistants do and whether we would find an individual who came through a library technician program with the skills and knowledge we need a replacement to have. And I have also been reminded of conversations I have had with recently graduated library technicians who felt their exposure to metadata standards, practices, and tools beyond RDA and MARC had been lacking in their programs. This got me thinking about the presence or absence of metadata courses in library technician programs in Canada. I reached out to two colleagues from MacEwan University, Norene Erickson and Lisa Shamchuk, who are doing in-depth research into library technician education in Canada.
They kindly provided me with a list of Canadian institutions that offer a library technician program so I could investigate further. Now, I must begin with two caveats. First, this is very much a surface-level scan rather than an in-depth examination, although it is simply the first step in what I hope will be a longer-term investigation. Second, although several Francophone institutions in Canada offer library technician programs, I did not review their programs; I was concerned that my lack of fluency in French could lead to inadvertent misrepresentations.

Sharon Farnel (sharon.farnel@ualberta.ca), a member of the ITAL Editorial Board, is Metadata Coordinator, University of Alberta Libraries, Edmonton, Alberta.

EDITORIAL BOARD THOUGHTS | FARNEL https://doi.org/10.6017/ital.v35i4.9601

Canadian institutions offering a library technician program (by province) are:

Alberta
● MacEwan University (http://www.macewan.ca/wcm/SchoolsFaculties/Business/Programs/LibraryandInformationTechnology/)
● Southern Alberta Institute of Technology (http://www.sait.ca/programs-and-courses/full-time-studies/diplomas/library-information-technology)

British Columbia
● Langara College (http://langara.ca/programs-and-courses/programs/library-information-technology/)
● University of the Fraser Valley (http://www.ufv.ca/programs/libit/)

Manitoba
● Red River College (http://me.rrc.mb.ca/catalogue/ProgramInfo.aspx?ProgCode=LIBIF-DP&RegionCode=WPG)

Nova Scotia
● Nova Scotia Community College (http://www.nscc.ca/learning_programs/programs/plandescr.aspx?prg=LBTN&pln=LIBINFTECH)

Ontario
● Algonquin College (http://www.algonquincollege.com/healthandcommunity/program/library-and-information-technician/)
● Conestoga College (https://www.conestogac.on.ca/parttime/library-and-information-technician)
● Confederation College (http://www.confederationcollege.ca/program/library-and-information-technician)
● Durham College
(http://www.durhamcollege.ca/programs/library-and-information-technician)
● Seneca College (http://www.senecacollege.ca/fulltime/LIT.html)
● Mohawk College (http://www.mohawkcollege.ca/ce/programs/community-services-and-support/library-and-information-technician-diploma-800)

Quebec
● John Abbott College (http://www.johnabbott.qc.ca/academics/career-programs/information-library-technologies/)

Saskatchewan
● Saskatchewan Polytechnic (http://saskpolytech.ca/programs-and-courses/programs/Library-and-Information-Technology.aspx)

My method was quite simple. Using the program websites listed above, I reviewed the course listings, looking for “metadata” either in the title or in the description when one was available. Of the fourteen (14) programs examined, nine (9) had no course with metadata in the title or description. Two (2) programs had courses where metadata was listed as part of the content but not the focus: Langara College as part of “Special Topics: Creating and Managing Digital Collections” and Seneca College as part of “Cataloguing III,” which has a partial focus on metadata for digital collections. Three (3) of the programs had a course with metadata in the title or description; all are a variation on “Introduction to Metadata and Metadata Applications.” (Importantly, the three institutions in question, Conestoga College, Confederation College, and Mohawk College, are all connected and share courses online.) So, what do these very preliminary and impressionistic findings tell us? It seems that there is little opportunity for students enrolled in library technician programs in Canada to be exposed to the metadata standards, practices, and tools that are increasingly necessary for positions involving work with digital collections, research data management, digital preservation, and the like. Admittedly, no program can include courses on all potentially relevant topics.
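The manual scan described above amounts to a case-insensitive keyword check over course titles and descriptions. A minimal sketch of the same check (the catalogue entries shown are hypothetical, not drawn from the programs surveyed):

```python
def courses_mentioning(keyword, courses):
    """Return titles of courses whose title or description mentions keyword.

    `courses` maps course title -> description (description may be None);
    matching is case-insensitive.
    """
    kw = keyword.lower()
    return [
        title
        for title, desc in courses.items()
        if kw in title.lower() or kw in (desc or "").lower()
    ]

# Hypothetical catalogue entries for illustration only.
catalogue = {
    "Cataloguing III": "Advanced RDA practice; includes metadata for digital collections.",
    "Introduction to Metadata and Metadata Applications": "Survey of non-MARC standards.",
    "Reference Services": "Core reference interview skills.",
}
print(courses_mentioning("metadata", catalogue))
# → ['Cataloguing III', 'Introduction to Metadata and Metadata Applications']
```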
In addition, formal course work is only one aspect of the training and education that can prepare graduates for their careers; practica, work placements, and other more informal activities during a program are crucial, as are the skills and knowledge that can only be developed once hired and on the job. Nevertheless, based on the investigation above, one would be justified in asking whether we are disadvantaging students by not working to incorporate additional coursework focused on metadata standards, applications, and tools, as well as on basic skills in manipulating metadata in large batches.

scripting languages or equivalent combination of education and experience. Master’s desirable.” I edited our statement to more clearly allow a combination of factors that would show sufficient preparation: “Bachelor’s degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a Master’s degree is preferred,” followed by a separate description of technical skills needed. This increased the number and quality of our applications, so I’ll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind. Meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. First, they’re wondering how to show clearly enough that they meet requirements like “three years of work experience” or “experience with user testing” so that their application is seriously considered. Second, they ask about possibilities to formalize skills. Recently, I’ve gotten questions about a certificate program in UX and whether there is any formal certification to be a systems librarian.
Surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master’s degrees to learning scripting as a technical services librarian to pre-MLS work experience—doesn’t suggest any standard method for substantiating technical knowledge. Once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don’t necessarily signal that. Some advice from the tech industry about how to be more inviting to candidates applies to libraries too; for example, avoiding “rockstar”/“ninja” descriptions, emphasizing the problem space over years of experience,1 and designing interview processes that encourage discussion rather than “gotcha” technical tasks. At Penn Libraries, for example, we’ve been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. This gives us concrete code to discuss in a far more realistic and relaxed context. While it may be helpful to express requirements better to encourage applicants to see more clearly whether they should respond to a posting, this is a small part of the question of preparing new MLS grads for library technology jobs. The new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. Others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they’ve just completed a whole new degree and may not have the time or resources to pursue additional training. Even if we make efforts to narrow the gap between employers and job-seekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment.
Library school provides a relatively brief window to instill in students the fundamentals and values of the profession, and it can’t be repurposed as a coding academy. There persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies.

REFERENCES

1. Erin Kissane, “Job Listings That Don’t Alienate,” https://storify.com/kissane/job-listings-that-don-t-alienate.
President’s Message: Focus on Information Ethics

Aimee Fifarek

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

Just a few weeks ago we held yet another successful LITA Forum,1 this time in Fort Worth, TX. Tight travel budgets and time constraints mean that only a few hundred people get to attend Forum each year, but that is one of the things that make it a great conference. Because of its size, you have a realistic chance of meeting everyone there, whether at Game Night, at one of the many networking dinners, or just during hallway chitchat after a session. And the sessions really do give you something to talk about. This year I couldn’t help but notice a theme. Among all the talk about makerspace technologies, analytics, and specific software platforms, the one bubble that kept rising to the surface was information ethics. Why are you doing what you are doing with the information you have, and should you really be doing it? Have you stopped to think what impact collecting, posting, or sharing that information is going to have on the world around you? In a post-election environment replete with talk of fake news and other forms of deliberate misinformation, LITA Forum presenters seem to have tapped into the zeitgeist. Tara Robertson, in her closing keynote,2 talked about the harm digitizing analog materials can do when what is depicted is sensitive to individuals and communities. Waldo Jaquith of US Open Data talked about how a government decision to limit options on a birth certificate to either “white” or “colored” effectively wiped the native population out of political existence in Virginia. And Sam Kome from Claremont Colleges talked about how well-meaning librarians can facilitate privacy invasion merely by collecting operational statistics.3
There were many other examples brought out by Forum speakers, but these in particular emphasized the serious consequences the use of data, intentional or not, can have on people. I think it is time for librarians4 to get more vocal about information ethics and the role we play in educating the population about humane information use. Our profession has always been forward thinking about information literacy and is traditionally known for helping our communities make judgements about the information they consume. But we have not done enough to declare our expertise in the information economy, to stand up and say “we’re librarians – this is what we do.” Now, more than ever, people need the skills to think critically about the information they are consuming via all kinds of media, understand the consequences of allowing algorithms to shape their information universe, and make quality judgments about trading their personal information for goods and services.

Aimee Fifarek (aimee.fifarek@phoenix.gov) is LITA President 2016-17 and Deputy Director for Customer Support, IT and Digital Initiatives at Phoenix Public Library, Phoenix, AZ.

PRESIDENT’S MESSAGE | FIFAREK https://doi.org/10.6017/ital.v35i4.9602

To quote from UNESCO:

Changes brought about by the rapid development of information and communication technologies (ICT) not only open tremendous opportunities to humankind but also pose unprecedented ethical challenges. Ensuring that information society is based upon principles of mutual respect and the observance of human rights is one of the major ethical challenges of the 21st century.5

I challenge all librarians to make a commitment to propagating information ethics, both personally and professionally. Make an effort to get out of your social media echo chamber6 and engage with uncomfortable ideas. When you see biased information being shared, consider it a “teachable moment” and highlight the spin or present more neutral information.
And if your library is not actively making information literacy and information ethics part of its programming and instruction, then do what you can to change that. Offer to be on a panel, create a curriculum, or host a program that includes key concepts relating to information “ownership, access, privacy, security, and community.”7 The focus of the Libraries Transform campaign this year is all about our expertise: “Because the best search engine in the Library is the Librarian.”8 It’s our time to shine.

REFERENCES

1. http://forum.lita.org/home/
2. http://forum.lita.org/speakers/tara-robertson/
3. http://forum.lita.org/sessions/patron-activity-monitoring-and-privacy-protection/
4. As always, when I use the term “librarian” my intention is to include any person who works in a library and is skilled in information and library science, not to limit the reference to those who hold a library degree.
5. http://en.unesco.org/themes/ethics-information
6. https://www.wnyc.org/story/buzzfeed-echo-chamber-online-news-politics/
7. https://en.wikipedia.org/wiki/Information_ethics
8. http://www.ilovelibraries.org/librariestransform/
The Impact of Information Technology on Library Anxiety: The Role of Computer Attitudes

Qun G. Jiao and Anthony J. Onwuegbuzie

Information Technology and Libraries 23, no. 4 (December 2004): 138.

Over the past two decades, computer-based technologies have become dominant forces that shape and reshape the products and services the academic library has to offer. The application of library technologies has had a profound impact on the way library resources are being used. Although many students continue to experience high levels of library anxiety, it is likely that the new technologies in the library have led to them experiencing other forms of negative affective states that may be, in part, a function of their attitude towards computers. This study investigates whether students’ computer attitudes predict levels of library anxiety.

Computers and information technologies have experienced considerable growth over the past two decades. As such, familiarity with computers is rapidly becoming a basic skill and a prerequisite for many tasks. Although not every college student is equally prepared for the rising demand for computer skills in the information age, computer literacy is increasingly becoming a gatekeeper for students’ academic success.1 Gaps in computer literacy and skills can leave many students behind not only in their academic achievement but also in their future job-market success. The unprecedented pace of technological change in the development of digital information networks and electronic services in recent years has helped to expand the role of the academic library.
Once only a storehouse of printed materials, it is now a technology-laden information network where students can conduct research in a mixed print and digital-resource environment, experience the use of advanced information technologies, and hone their computer skills. Yet many students are struggling to cope with the changes brought on by the rapid advances of information technologies. Academic libraries of various sizes have spent a large percentage of their material budget on electronic commercial content, and the trend will continue.2 These days, college students are faced with the choices of ever-changing modes of electronic accessing tools, interfaces, and protocols along with the traditional print resources in the library. The fact that the same journal article may be available in multiple vendors’ aggregator sites (such as EBSCOhost and Gale Group) makes navigation through these bibliographic databases more complex and challenging. Relevant sources must be identified and navigation protocols must be learned before appropriate information and contents can be found. Furthermore, having located a citation, students still have to search the library online catalog to find out if the journal or book is available in the library and, if not, know how to make an interlibrary loan request either on paper or electronically.3 Anxiety levels can be high and patience levels can be low at varying times of conducting library research.4 That students experience various levels of apprehension when using academic libraries is not a new phenomenon.

Qun G. Jiao (gerryjiao@baruch.cuny.edu) is Reference Librarian and Associate Professor at Newman Library, Baruch College, City University of New York, and Anthony J. Onwuegbuzie (Tony_Onwuegbuzie@aol.com) is Associate Professor at the College of Education, University of South Florida, Tampa.
Indeed, the phenomenon is prevalent among college students in the United States and many other countries, and is widely known as library anxiety. Mellon first coined the term in her study, in which she noted that 75 percent to 85 percent of undergraduate students described their initial library experiences in terms of anxiety.5 According to Mellon, feelings of anxiety stem from either the relative size of the library; a lack of knowledge about the location of materials, equipment, and resources of the library; how to initiate library research; or how to proceed with a library search.6 Library anxiety is an unpleasant feeling or emotional state with physiological and behavioral concomitants that come to the fore in library settings. Typically, library-anxious students experience negative emotions, including ruminations, tension, fear, and mental disorganization, which prevent them from using the library effectively.7 A student who experiences library anxiety usually undergoes either emotional or physical discomfort when faced with any library or library-related task.8 Library anxiety may arise from a lack of self-confidence in conducting research, lack of prior exposure to academic libraries, the inability to see the relevance of libraries to one’s field of interest, and lack of familiarity with library equipment and technologies. Library anxiety is often accorded special attention because of its debilitating effects on students’ academic achievement.9 Although many students continue to experience high levels of library anxiety, it is likely that the new technologies and electronic databases in libraries have led to students experiencing other forms of negative affective states. In particular, it is likely that library anxiety experienced by students is, in part, a function of their attitudes toward computers.
Consistent with this assertion, Mizrachi and Shoham and Mizrachi reported a statistically significant relationship between library anxiety and computer attitudes.10 They noted in their research that home and work usage of computers, computer games, word processors, computer spreadsheets, and the Internet are all related to the dimensions of library anxiety found among Israeli students to varying degrees. Similarly, Jerabek, Meyer, and Kordinak found levels of computer anxiety to be related to levels of library anxiety for both men and women.11 These studies focused exclusively on undergraduate students. However, no study has examined this relationship among graduate students, a population that uses the academic library more than any other student population.

Over the past fifteen years, a large body of research literature on computer attitudes has been generated. In particular, many researchers have studied the relationship between computer attitudes and computer use.12 The importance of beliefs and attitudes towards computers and technologies is widely acknowledged.13 Students’ computer attitudes arguably impact their willingness to engage in computer-related activities in colleges and universities, where effectively using library electronic resources represents an increasingly important part of college education. Negative computer attitudes may inhibit students’ interest in learning to use library resources and thereby weaken their academic performance levels, while at the same time elevating levels of library anxiety. McInerney, McInerney, and Sinclair observed that negative perceptions about computers among student teachers may accompany feelings of anxiety, including worries about being embarrassed, looking foolish, and even damaging the computer equipment.14 Further, there is often a negative relationship between prior experience with computers and computer anxiety experienced by individuals.15

Until recently, library anxiety has only been interpreted in the context of the library setting; that is, a phenomenon that occurs while students are undertaking library tasks. Jiao, Onwuegbuzie, and Lichtenstein defined library anxiety as “an uncomfortable feeling or emotional disposition, experienced in a library setting, which has cognitive, affective, physiological, and behavioral ramifications.”16 At the same time, unprecedented technological advancement has had a profound impact on the products and services offered by academic libraries. Students now are able to conduct sophisticated library searches from the comfort of their homes. It is clear that the construct of library anxiety needs to be expanded in the new library and information environment, incorporating into its definition other variables that are relevant for the changing library and information context. Because many library users spend a significant portion of their time using computer-based technologies to conduct information searches, it is natural to ask, to what extent does library anxiety stem from students’ prior attitudes and experiences with computers and library technologies? However, with the exception of the studies conducted by Mizrachi and Shoham and Mizrachi on Israeli undergraduate students, this link has not been examined.17 Thus, the present study investigated the relationship between computer attitudes and library anxiety in the rapidly changing library and information environment. As such, the current inquiry replicated the works of Mizrachi, Shoham and Mizrachi, and Jerabek, Meyer, and Kordinak by examining the degree to which computer attitudes predict levels of library anxiety among graduate students in the United States.18 It was expected that findings from this study would help to increase understanding of the construct of library anxiety. Indeed, research in this area has become critical in higher education, where educators are responsible for graduating students with the skills necessary to thrive and to lead in a rapidly changing technological environment in the twenty-first century.

Method

Participants

Participants were ninety-four African American graduate students enrolled in the College of Education at a historically Black college and university in the eastern U.S. All participants were solicited in either a statistics or a measurement course at the time that the investigation took place. In order to participate in the study, students were required to sign an informed-consent document that was given during the first class session of the semester. The majority of the participants were female. Ages of the participants ranged from twenty-two to sixty-two years (Mean = 30.40, SD = 8.75).

Instruments and Procedure

All participants were administered two scales, namely, the Computer Attitude Scale (CAS) and the Library Anxiety Scale (LAS). The CAS, developed by Loyd and Gressard, contains forty Likert-type items that assess individuals’ attitudes toward computers and the use of computers.19 This instrument consists of the following four scales, which can be used separately: (1) anxiety or fear of computers; (2) confidence in the ability to use computers; (3) liking or enjoying working with computers; and (4) computer usefulness. Loyd and Gressard reported coefficient alpha reliability coefficients of .86, .91, .91, and .95 for scores pertaining to computer anxiety, computer confidence, computer liking, and total scales, respectively.
For the present study, the score reliabilities were as follows:

• computer anxiety, .84 (95 percent confidence interval [CI] = .79, .88);
• computer confidence, .81 (95 percent CI = .75, .86);
• computer liking, .89 (95 percent CI = .85, .92); and
• computer usefulness, .76 (95 percent CI = .68, .83).

THE IMPACT OF INFORMATION TECHNOLOGY ON LIBRARY ANXIETY | JIAO AND ONWUEGBUZIE 139

The LAS, developed by Bostick, contains forty-three 5-point Likert-format items that assess levels of library anxiety experienced by college students.20 It also contains the following five subscales:

1. barriers with staff;
2. affective barriers;
3. comfort with the library;
4. knowledge of the library; and
5. mechanical barriers.

A high score on any subscale represents high levels of anxiety in that area. Jiao and Onwuegbuzie, in their examination of the score reliability reported on the LAS in the extant literature, found that it has typically been in the adequate to high range for the subscale and total-scale scores.21 Based on their analysis, Onwuegbuzie, Jiao, and Bostick concluded that "not only does the [LAS] produce scores that yield extremely reliable estimates, but also these estimates are remarkably consistent across samples with different cultures, nationalities, ages, years of study, gender composition, educational majors, and so forth."22 For the current investigation, the subscales generated scores for the combined sample that had a classical theory alpha reliability coefficient of .89 (95 percent CI = .85, .92) for barriers with staff, .84 (95 percent CI = .79, .88) for affective barriers, .53 (95 percent CI = .37, .66) for comfort with the library, .62 (95 percent CI = .48, .73) for knowledge of the library, and .70 (95 percent CI = .58, .79) for mechanical barriers.
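The coefficient alpha reliabilities reported above can be computed directly from item-level responses. Below is a minimal sketch of the classical-test-theory alpha calculation; the function name and the toy data are illustrative and not drawn from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha; `items` is a list of respondent rows of equal length."""
    k = len(items[0])                                    # number of items in the scale
    columns = list(zip(*items))                          # item-wise response columns
    item_var = sum(variance(col) for col in columns)     # sum of item variances
    total_var = variance([sum(row) for row in items])    # variance of scale totals
    return (k / (k - 1)) * (1 - item_var / total_var)

# Illustrative check: four perfectly parallel items yield alpha of 1.0.
data = [[1, 1, 1, 1], [2, 2, 2, 2], [4, 4, 4, 4]]
print(round(cronbach_alpha(data), 3))  # → 1.0
```

Real scale data, with item-level noise, produces values like the .84 to .89 figures reported for the CAS subscales.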
Analysis

A canonical correlation analysis was conducted to identify a combination of library-anxiety dimensions (barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers) that might be simultaneously related to a combination of computer-attitude dimensions (computer anxiety, computer liking, computer confidence, and computer usefulness). Canonical correlation analysis is used to examine the relationship between two sets of variables whereby each set contains more than one variable.23 In the present investigation, the five dimensions of library anxiety were treated as the dependent multivariate set of variables, and the four dimensions of computer attitudes formed the independent multivariate profile. The number of canonical functions (factors) that can be produced for a given dataset is equal to the number of variables in the smaller of the two variable sets. Because the library-anxiety set contained five dimensions and the computer-attitude set contained four variables, four canonical functions were generated.

For any significant canonical coefficient, the standardized canonical-function coefficients and structure coefficients were then interpreted. Standardized canonical-function coefficients are computed weights that are applied to each variable in a given set in order to obtain the composite variate used in the canonical correlation analysis. As such, standardized canonical-function coefficients are equivalent to factor-pattern coefficients in factor analysis or to beta coefficients in a regression analysis.24 Conversely, structure coefficients represent the correlations between a given variable and the scores on the canonical composite (latent variable) in the set to which the variable belongs.25 Thus, structure coefficients indicate the degree to which each variable is related to the canonical composite for the variable set.
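The canonical correlation analysis just described can be sketched with a small function. This is one standard computation (orthonormal bases via QR, then a singular value decomposition), not the authors' own software; the simulated data merely mirror the study's shape of ninety-four cases, five anxiety variables, and four attitude variables.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two multivariate sets (rows = cases)."""
    X = X - X.mean(axis=0)                  # center each variable
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)                 # orthonormal basis for each set
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)            # correlations, largest first

# Illustrative data shaped like the study: the number of canonical
# functions equals the size of the smaller variable set.
rng = np.random.default_rng(1)
anxiety = rng.normal(size=(94, 5))
attitude = rng.normal(size=(94, 4))
print(len(canonical_correlations(anxiety, attitude)))  # → 4
```

Because the smaller set has four variables, exactly four canonical correlations come back, matching the four functions reported in the analysis.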
Indeed, structure coefficients are essentially bivariate correlation coefficients that range in value between -1.0 and +1.0, inclusive.26 The square of the structure coefficient yields the proportion of variance that the original variable shares linearly with the canonical variate.

Results

Table 1 presents the intercorrelations among the five dimensions of library anxiety and the four dimensions of computer attitude. Of particular interest were the twenty correlations between the library-anxiety subscale scores and the computer-attitude subscale scores. It can be seen that, after applying the Bonferroni adjustment, four of these relationships were statistically significant. Specifically, computer liking was statistically significantly related to affective barriers, knowledge of the library, and comfort with the library. Using Cohen's criteria of .1, .3, and .5 for small, medium, and large relationships, respectively, the first two relationships (involving affective barriers and knowledge of the library) were medium, and the third relationship (between computer liking and comfort with the library) was large.27 In addition to these three relationships, the association between computer usefulness and knowledge of the library also was statistically significant, with a medium effect size.

The correlation matrix in table 1 was used to examine the multivariate relationship between library anxiety and computer attitudes. This relationship was assessed via a canonical correlation analysis. The canonical analysis revealed that the four canonical correlations combined were statistically significant (p < .0001). Also, when the first canonical root was removed, the remaining three canonical roots were not statistically significant. In fact, removal of subsequent canonical roots did not lead to statistical significance.
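The Bonferroni screening and Cohen benchmarks applied to the table 1 correlations can be expressed compactly. The function names below are illustrative; the thresholds and the twenty-test adjustment are as described above.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests

def cohen_label(r):
    """Cohen's rough benchmarks for the magnitude of a correlation."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    return "small" if size >= 0.1 else "trivial"

# Twenty cross-set correlations were screened, so each is tested at .05 / 20.
print(bonferroni_alpha(0.05, 20))  # → 0.0025
print(cohen_label(-0.55))          # → large
print(cohen_label(-0.37))          # → medium
```

The two example correlations correspond to the large liking-comfort and medium liking-affective-barriers effects described in the results.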
Together, these results suggested that only the first canonical function was statistically significant; the remaining three roots were not. This first canonical root also was practically significant (Rc1 = .63), contributing 40.8 percent (Rc1²) to the shared variance, which represents a large effect size.28

Data pertaining to the first canonical root are presented in table 2, which provides both standardized function coefficients and structure coefficients. Using a cutoff correlation of 0.3, the standardized canonical-function coefficients revealed that affective barriers, comfort with the library, and knowledge of the library made important contributions to the library-anxiety set, with affective barriers and comfort with the library making similarly large contributions.29 With regard to the computer-attitude set, computer anxiety, computer liking, and computer confidence made noteworthy contributions, with the latter two dimensions making the most noteworthy contributions.

The structure coefficients revealed that all five dimensions of library anxiety made important contributions to the first canonical variate. The square of the structure coefficient indicated that barriers with staff, affective barriers, comfort with the library, and knowledge of the library made similarly large contributions, explaining 67.2 percent, 72.3 percent, 72.3 percent, and 60.8 percent of the variance, respectively. With regard to the computer-attitude set, computer liking and computer usefulness made important contributions. These variables explained 64.0 percent and 16.8 percent of the variance, respectively.
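The variance percentages above are simply the squared structure coefficients from table 2. A short sketch, using the library-anxiety coefficients as reported:

```python
# Structure coefficients for the first canonical variate (library-anxiety
# set), as reported in table 2; squaring each gives the percentage of
# variance the variable shares with the canonical composite.
structure = {
    "barriers with staff": 0.82,
    "affective barriers": 0.85,
    "comfort with the library": 0.85,
    "knowledge of the library": 0.78,
    "mechanical barriers": 0.39,
}
for name, coefficient in structure.items():
    print(f"{name}: {100 * coefficient ** 2:.1f}%")
```

Squaring reproduces, to rounding, the 67.2, 72.3, 72.3, and 60.8 percent figures reported above, plus 15.2 percent for mechanical barriers.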
Comparing the standardized and structure coefficients indicated that computer anxiety and computer confidence served as suppressor variables, because the standardized coefficients associated with these variables were large whereas the corresponding structure coefficients were relatively small.30 Suppressor variables are variables that assist in the prediction of dependent variables due to their correlation with other independent variables.31 Thus, the inclusion of computer anxiety and computer confidence in the canonical correlation model strengthened the multivariate relationship between library anxiety and computer attitudes.

Discussion

The purpose of this study was to investigate the relationship between computer attitudes and library anxiety among African American graduate students. Specifically, the multivariate link between these two constructs was examined. A canonical correlation analysis revealed a strong multivariate relationship between library anxiety and computer attitudes. The library-anxiety subscale scores and computer-attitude subscale scores shared 40.82 percent of the common variance. Specifically, computer liking and computer usefulness were related simultaneously to the following five dimensions of library anxiety: barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers. Computer anxiety and computer confidence served as suppressor variables. Thus, computer attitudes predict levels of library anxiety.

As such, the present findings are consistent with those of Mizrachi and Shoham and Mizrachi, who found a statistically significant relationship between computer attitudes and the following seven dimensions of the Hebrew Library-Anxiety Scale, a modified version of the LAS developed by the authors for their Israeli sample:32

1. Staff;
2. Knowledge;
3. Language;
4. Physical Comfort;
5. Library Computer Comfort;
6. Library Policies and Hours; and
7. Resources.

According to its authors, the Staff factor refers to students' attitudes towards librarians and library staff and their perceived accessibility. The Knowledge factor pertains to how students rate their own library expertise. The Language factor relates to the extent to which using English-language searches and materials yields discomfort. Physical Comfort evaluates how much the physical facility negatively affects students' satisfaction and comfort with the library. Library Computer Comfort assesses the perceived trustworthiness of library computer facilities and the quality of directions for using them. Library Policies and Hours concerns students' attitudes toward library rules, regulations, and hours of operation. Finally, Resources refers to the perceived availability of the desired material in the library collection. The correlations between the dimensions of library anxiety and computer attitudes ranged from .11 (physical comfort) to .47 (knowledge). The current results also replicate those of Jerabek, Meyer, and Kordinak, who found levels of computer anxiety to be related to levels of library anxiety for both men and women.33

Nevertheless, caution should be exercised in generalizing the current findings to all graduate students. Though the present study examined the association between library anxiety and computer attitudes among African American graduate students, it should not be assumed that this relationship would hold for other racial groups. Jiao, Onwuegbuzie, and Bostick found that African American students attending a research-intensive institution reported statistically significantly lower levels of library anxiety associated with barriers with staff, affective barriers, and comfort with the library than did Caucasian American graduate students enrolled at a doctoral-granting institution, with effect sizes ranging from moderate to large.34 In a follow-up study, Jiao and Onwuegbuzie compared African American and Caucasian American students with respect to library anxiety, controlling for educational background by selecting both racial groups from the same institution.35 No statistically significant racial differences were found in library anxiety for any of the five dimensions of the LAS. However, across all five library-anxiety measures, the African American sample reported lower scores than did the Caucasian American sample. In fact, using the test of trend by Onwuegbuzie and Levin, they found that the consistency with which the African American graduate students had lower levels of library anxiety than did the Caucasian American students was both statistically and practically significant.36 Thus, Jiao and Onwuegbuzie's results, alongside those of Jiao, Onwuegbuzie, and Bostick, suggest that racial differences in library anxiety prevail.37 Thus, future research should investigate whether the relationship between library anxiety and computer attitudes found in the present study among African American graduate students also exists among Caucasian American graduate students, as well as among other racial groups.

Table 1. Intercorrelations among the Library-Anxiety Subscales and Computer-Attitude Subscales

Subscale                       2     3     4     5     6     7     8     9
1. Barriers with Staff        .64*  .63*  .49*  .46*  -.02   .05  -.27  -.09
2. Affective Barriers               .56*  .52*  .40*  -.05   .02  -.37* -.23
3. Comfort with the Library               .56*  .44*  -.19  -.20  -.55* -.16
4. Knowledge of the Library                     .39*  -.21  -.11  -.37* -.32*
5. Mechanical Barriers                                -.13  -.01  -.18   .04
6. Computer Anxiety                                          .77*  .48*  .46*
7. Computer Confidence                                             .67*  .36*
8. Computer Liking                                                       .43*
9. Computer Usefulness

*Indicates a statistically significant relationship after the Bonferroni adjustment.

Table 2. Canonical Solution for First Function: Relationship between Library-Anxiety Subscales and Computer-Attitude Subscales

Theme                          Standardized Coefficient   Structure Coefficient   Structure² (%)
Library-Anxiety Subscale
  Barriers with Staff                   .17                      .82*                 67.2
  Affective Barriers                    .40*                     .85*                 72.3
  Comfort with the Library              .39*                     .85*                 72.3
  Knowledge of the Library              .31*                     .78*                 60.8
  Mechanical Barriers                  -.12                      .39*                 15.2
Computer-Attitude Subscale
  Computer Anxiety                     -.31*                    -.22                   4.8
  Computer Confidence                   .98*                    -.13                   1.7
  Computer Liking                     -1.25*                    -.80*                 64.0
  Computer Usefulness                  -.13                     -.41*                 16.8

*Loadings with effect sizes larger than .3.

Further, the causal direction of the relationship found in the current study should be investigated. That is, future studies should investigate whether library anxiety places a person more at risk for experiencing poor computer attitudes, or whether the converse is true. More research also is needed to determine how computer attitudes might play a role in the library context. Notwithstanding, it appears that the construct of library anxiety can be expanded to include the construct of computer attitudes. Indeed, one implication of the findings is that Bostick's LAS should be modified to include dimensions of computer attitudes.38 Such a modification likely would facilitate the identification of library-anxious students.
By identifying students with high levels of library anxiety and poor computer attitudes, library educators and others could help them improve their dispositions and provide them with the skills necessary to negotiate the rapidly changing technological environment, thereby putting them in a better position to be lifelong learners.

References

1. Susan M. Piotrowski, Computer Training: Pathway from Extinction (ERIC Document Reproduction Service, ED 348955, 1992).
2. Thomas H. Hogan, "Drexel University Moves Aggressively from Print to Electronic Access for Journals (Interview with Carol Hansen Montgomery, Dean of Libraries)," Computers in Libraries 21, no. 5 (May 2001): 22-27.
3. M. Claire Stewart and H. Frank Cervone, "Building a New Infrastructure for Digital Media: Northwestern University Library," Information Technology and Libraries 22, no. 2 (June 2003): 69-74.
4. Carol C. Kuhlthau, "Longitudinal Case Studies of the Information Search Process of Users in Libraries," Library and Information Science Research 10 (July 1988): 257-304; Carol C. Kuhlthau, "Inside the Search Process: Information Seeking from the User's Perspective," Journal of the American Society for Information Science 42, no. 5 (June 1991): 361-71; Carol C. Kuhlthau, Seeking Meaning: A Process Approach to Library and Information Services (Norwood, N.J.: Ablex, 1993); Carol C. Kuhlthau, "Students and the Information Search Process: Zones of Intervention for Librarians," Advances in Librarianship 18 (1994): 57-72; Carol C. Kuhlthau et al., "Validating a Model of the Search Process: A Comparison of Academic, Public, and School Library Users," Library and Information Science Research 12, no. 1 (Jan.-Mar. 1990): 5-31.
5. Constance A. Mellon, "Library Anxiety: A Grounded Theory and Its Development," College & Research Libraries 47, no. 2 (Mar. 1986): 160-65.
6. Ibid.
7. Qun G. Jiao, Anthony J. Onwuegbuzie, and Art Lichtenstein, "Library Anxiety: Characteristics of 'At-Risk' College Students," Library and Information Science Research 18 (spring 1996): 151-63.
8. Constance A. Mellon, "Attitudes: The Forgotten Dimension in Library Instruction," Library Journal 113 (Sept. 1, 1988): 137-39; Constance A. Mellon, "Library Anxiety and the Non-Traditional Student," in Reaching and Teaching Diverse Library User Groups, ed. Teresa B. Mensching (Ann Arbor, Mich.: Pierian, 1989), 77-81; Anthony J. Onwuegbuzie, "Writing a Research Proposal: The Role of Library Anxiety, Statistics Anxiety, and Composition Anxiety," Library and Information Science Research 19, no. 1 (1997): 5-33.
9. Anthony J. Onwuegbuzie and Qun G. Jiao, "Information Search Performance and Research Achievement: An Empirical Test of the Anxiety-Expectation Model of Library Anxiety," Journal of the American Society for Information Science and Technology (JASIST) 55, no. 1 (2004): 41-54; Anthony J. Onwuegbuzie, Qun G. Jiao, and Sharon L. Bostick, Library Anxiety: Theory, Research, and Applications (Lanham, Md.: Scarecrow, 2004).
10. Diane Mizrachi, "Library Anxiety and Computer Attitudes among Israeli B.Ed. Students" (master's thesis, Bar-Ilan University, Israel, 2000); Snunith Shoham and Diane Mizrachi, "Library Anxiety among Undergraduates: A Study of Israeli B.Ed. Students," Journal of Academic Librarianship 27, no. 4 (July 2001): 305-11.
11. Ann J. Jerabek, Linda S. Meyer, and Thomas S. Kordinak, "'Library Anxiety' and 'Computer Anxiety': Measures, Validity, and Research Implications," Library and Information Science Research 23, no. 3 (2001): 277-89.
12. Muhamad A. Al-Khaldi and Ibrahim M. Al-Jabri, "The Relationship of Attitudes to Computer Utilization: New Evidence from a Developing Nation," Computers in Human Behavior 9, no. 1 (Jan. 1998): 23-42; Margaret Cox, Valeria Rhodes, and Jennifer Hall, "The Use of Computer-Assisted Learning in Primary Schools: Some Factors Affecting Uptake," Computers in Education 12, no. 1 (1988): 173-78; Gayle V. Davidson and Scott D. Ritchie, "Attitudes toward Integrating Computers into the Classroom: What Parents, Teachers, and Students Report," Journal of Computing in Childhood Education 5, no. 1 (1994): 3-27; Donald G. Gardner, Richard L. Dukes, and Richard Discenza, "Computer Use, Self-Confidence, and Attitudes: A Causal Analysis," Computers in Human Behavior 9, no. 4 (winter 1993): 427-40; Robin H. Kay, "Predicting Student Teacher Commitment to the Use of Computers," Journal of Educational Computing Research 6, no. 3 (1990): 299-309.
13. Deborah Bandalos and Jeri Benson, "Testing the Factor Structure Invariance of a Computer Attitude Scale over Two Grouping Conditions," Educational and Psychological Measurement 50, no. 1 (spring 1990): 49-60; Frank M. Bernt and Alan C. Bugbee Jr., "Factors Influencing Student Resistance to Computer Administered Testing," Journal of Research on Computing in Education 22, no. 3 (spring 1990): 265-75; Michel Dupagne and Kathy A. Krendl, "Teachers' Attitudes toward Computers: A Review of the Literature," Journal of Research on Computing in Education 24, no. 3 (spring 1992): 420-29; Elizabeth Mowrer-Popiel, Constance Pollard, and Richard Pollard, "An Analysis of the Perceptions of Preservice Teachers toward Technology and Its Use in the Classroom," Journal of Instructional Psychology 21, no. 2 (June 1994): 131-38; Jennifer D. Shapka and Michel Ferrari, "Computer-Related Attitudes and Actions of Teacher Candidates," Computers in Human Behavior 19, no. 3 (May 2003): 319-34.
14. Valentina McInerney, Dennis M. McInerney, and Kenneth E. Sinclair, "Student Teachers, Computer Anxiety, and Computer Experience," Journal of Educational Computing Research 11, no. 1 (1994): 27-50.
15. Susan E. Jennings and Anthony J. Onwuegbuzie, "Computer Attitudes as a Function of Age, Gender, Math Attitude, and Developmental Status," Journal of Educational Computing Research 25, no. 4 (2001): 367-84.
16. Jiao, Onwuegbuzie, and Lichtenstein, "Library Anxiety," 152.
17. Mizrachi, "Library Anxiety and Computer Attitudes"; Shoham and Mizrachi, "Library Anxiety among Undergraduates."
18. Mizrachi, "Library Anxiety and Computer Attitudes"; Shoham and Mizrachi, "Library Anxiety among Undergraduates"; Jerabek, Meyer, and Kordinak, "'Library Anxiety' and 'Computer Anxiety.'"
19. Brenda H. Loyd and Clarice Gressard, "The Effects of Sex, Age, and Computer Experience on Computer Attitudes," AEDS Journal 18, no. 2 (1984): 67-77.
20. Sharon L. Bostick, "The Development and Validation of the Library Anxiety Scale" (Ph.D. diss., Wayne State University, 1992).
21. Qun G. Jiao and Anthony J. Onwuegbuzie, "Reliability Generalization of the Library Anxiety Scale Scores: Initial Findings" (unpublished manuscript, 2002).
22. Onwuegbuzie, Jiao, and Bostick, Library Anxiety, 22.
23. Norman Cliff and David J. Krus, "Interpretation of Canonical Analyses: Rotated versus Unrotated Solutions," Psychometrika 41, no. 1 (Mar. 1976): 35-42; Richard B. Darlington, Sharon L. Weinberg, and Herbert J. Walberg, "Canonical Variate Analysis and Related Techniques," Review of Educational Research 42, no. 4 (fall 1973): 131-43; Bruce Thompson, "Canonical Correlation: Recent Extensions for Modeling Educational Processes" (paper presented at the annual meeting of the American Educational Research Association, Boston, Mass., Apr. 7-11, 1980) (ERIC, ED 199269); Bruce Thompson, Canonical Correlation Analysis: Uses and Interpretations (Newbury Park, Calif.: Sage, 1984); Bruce Thompson, "Canonical Correlation Analysis: An Explanation with Comments on Correct Practice" (paper presented at the annual meeting of the American Educational Research Association, New Orleans, La., Apr. 5-9, 1988) (ERIC, ED 295957); Bruce Thompson, "Variable Importance in Multiple Regression and Canonical Correlation" (paper presented at the annual meeting of the American Educational Research Association, Boston, Mass., Apr. 16-20, 1990) (ERIC, ED 317615).
24. Margery E. Arnold, "The Relationship of Canonical Correlation Analysis to Other Parametric Methods" (paper presented at the annual meeting of the Southwest Educational Research Association, New Orleans, La., Jan. 1996) (ERIC, ED 395994).
25. Thompson, "Canonical Correlation: Recent Extensions."
26. Ibid.
27. Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences (New York: Wiley, 1988).
28. Ibid.
29. Zarrel V. Lambert and Richard M. Durand, "Some Precautions in Using Canonical Analysis," Journal of Marketing Research 12, no. 4 (Nov. 1975): 468-75.
30. Anthony J. Onwuegbuzie and Larry G. Daniel, "Typology of Analytical and Interpretational Errors in Quantitative and Qualitative Educational Research," Current Issues in Education 6, no. 2 (Feb. 2003). Accessed Nov. 13, 2003, http://cie.ed.asu.edu/volume6/number2/.
31. Barbara G. Tabachnick and Linda S. Fidell, Using Multivariate Statistics, 3rd ed. (New York: Harper, 1996).
32. Mizrachi, "Library Anxiety and Computer Attitudes"; Shoham and Mizrachi, "Library Anxiety among Undergraduates."
33. Jerabek, Meyer, and Kordinak, "'Library Anxiety' and 'Computer Anxiety.'"
34. Qun G. Jiao, Anthony J. Onwuegbuzie, and Sharon L. Bostick, "Racial Differences in Library Anxiety among Graduate Students," Library Review 53, no. 4 (2004): 228-35.
35. Qun G. Jiao and Anthony J. Onwuegbuzie, "Library Anxiety: A Function of Race?" (unpublished manuscript, 2003).
36. Anthony J. Onwuegbuzie and Joel R. Levin, "A Proposed Three-Step Method for Assessing the Statistical and Practical Significance of Multiple Hypothesis Tests" (paper presented at the annual meeting of the American Educational Research Association, San Diego, Calif., Apr. 12-16, 2004).
37. Jiao, Onwuegbuzie, and Bostick, "Racial Differences in Library Anxiety."
38. Bostick, "The Development and Validation of the Library Anxiety Scale."
Beyond Information Architecture: A Systems Integration Approach to Web-site Design

Krisellen Maloney and Paul J. Bracke

Information Technology and Libraries 23, no. 4 (Dec. 2004): 145

Users' needs and expectations regarding access to information have fundamentally changed, creating a disconnect between how users expect to use a library Web site and how the site was designed. At the same time, library technical infrastructures include legacy systems that were not designed for the Web environment. The authors propose a framework that combines elements of information architecture with approaches to incremental system design and implementation. The framework allows for the development of a Web site that is responsive to changing user needs, while recognizing the need for libraries to adopt a cost-effective approach to implementation and maintenance.

The Web has become the primary mode of information seeking and access for users of academic libraries. The rapid acceptance of Web technologies is due, in part, to the ubiquity of the Web browser, which presents a user interface that is recognized and understood by a broad range of users. As libraries increase the amount of content and broaden the range of services available through their Web sites, it is becoming evident that it will take more than a well-designed user interface to completely support users' information-seeking and access needs. The underlying technical infrastructure of the Web site must also be organized to logically support users' tasks. Library technical infrastructures, largely designed to support traditional library processes, are being adapted to provide Web access.
As part of this adaptation process, they are not necessarily being reorganized to meet the changing expectations of Web-savvy users, particularly younger users who are not familiar with traditional library organization methods such as the card catalog, print indexes, or other legacy tools.

Libraries must harness the power of the highly structured information systems that have long been a part of libraries and integrate these systems in new ways to support users' goals and objectives. Part of this challenge will be answered by the development of new systems and technical standards, but these are only a partial solution to the problem. An important part of making library systems and Web sites function as powerful discovery tools is to modernize the systems that provide existing services and content to support the changing needs and expectations of the user. Emerging concepts of information architecture (IA) describe the system requirements from the user perspective but do not provide a mechanism to conceptually integrate existing functions and content, or to inform the requirements necessary to modernize and integrate the current system architecture.

The authors propose a framework for approaching a comprehensive Web-site implementation that combines components of IA and system modernization that have been successful in other industries. Within this framework, those components are tailored for the unique aspects of information provision that characterize a library. The proposed framework expands the concept of IA to include functional and content requirements for the Web site. This expansion identifies points within the conceptual and physical design where user requirements are constrained by the existing infrastructure. Identification of these constraints begins an iterative design process in which some user requirements inform changes to the underlying system architecture.
Conversely, when the required changes to the underlying system architecture cannot be achieved, the constraints inform the conceptual design of the Web site. The iterative nature of this approach acknowledges the usefulness of much of the existing infrastructure but provides an incremental approach to modernizing installed systems. This framework describes aspects of the conceptual and physical-design elements that must be considered together and balanced to produce a Web site that supports the goals and objectives of the user but is cost-effective and practical to implement.

Information Architecture and the Problem of Libraries

IA is both a characteristic of a Web site and an emerging discipline. A number of authors have attempted to develop a formal definition of IA. Wodtke presents a simple task-based definition, stating that an information architect "creates a blueprint for how to organize the Web site so that it will meet all (business, end user) these needs."1 Rosenfeld and Morville present a four-part definition in which two parts focus on the practice and two parts define IA as a characteristic. The first characteristic defines IA as a combination of "organization, labeling, and navigation schemes," while the second describes it as "the structural design of an information space to facilitate task description and intuitive access to content."2

Krisellen Maloney (maloneyk@u.library.arizona.edu) is Director of Technology at the University of Arizona Libraries, Tucson. Paul J. Bracke (paul@ahsl.arizona.edu) is Head of Systems and Networking at the Arizona Health Sciences Library, Tucson.

BEYOND INFORMATION ARCHITECTURE | MALONEY AND BRACKE 145

There is general agreement that IA provides a specification of the Web site from the perspective of the user. The specification usually describes the organization, navigational elements,
and labeling required to completely structure a user's Web-site experience. IA is not synonymous with Web-site design, but rather provides the conceptual foundation upon which a presentation design is based. Web-site design adds presentation and graphical elements to IA to create the user experience. Library Web sites provide a display platform by which library content and services can be accessed through a common user interface. Most of the tools and services have been available for decades and, in response to user demand, are increasingly being made Web-accessible in digital formats (virtual reference, full-text databases). Despite this new access medium and format, the conceptual design of the underlying systems has not changed much. The library technical infrastructure is made up of many loosely coupled systems optimized to perform a single function or to support the work of a library department. Library Web sites do not present a sufficiently unified interface design or level of technical integration to match current users' mental models of information seeking and access.3 The systems have not been integrated to support users' overarching goals or meet the expectation of seamless access that they have developed when using other Web sites (such as Google or Amazon). In many cases, users are still expected to understand aspects of the library that are now obsolete (card catalogs) in order to navigate the library's Web site. For example, the process of finding a journal article using a typical library Web site is based on a print paradigm and has changed little despite the advent of online discovery tools. In a print environment, users first looked at an index to identify an article of interest, then wrote down the citation, went to the card catalog, and there looked up the journal containing the article. If the library owned the journal, the user would then write down the call number and go to the shelves to find the article.
This process has not necessarily changed much for many libraries, even though indexes, card catalogs, and journals are often available online. Even more confusing is that the end result of some search processes within a library Web site is not necessarily content, but a metadata representation of content that must be entered into another search box. Although the first search is representative of the search of a traditional index and the second search is representative of the search of the card catalog, many of our users have no mental model for this multistep search process. Users accustomed to the simple keyword search available through Internet search engines may have great difficulty in understanding the need for the many steps involved in library use. There is an expectation that search systems and online content will be linked, regardless of the economic, legal, and technical factors that make these links difficult. While linking options in vendor databases and OpenURL resolvers have begun to simplify the electronic version of the process by automating some of the steps, the multistep process is still valid in many instances in most libraries. It is clear that library Web sites must undergo a fundamental change in order to be responsive to the needs of the user. Because library Web sites appear to be similar to conventional Web sites, it is tempting to adopt a general approach to IA to address users' needs. There are, however, several areas in which the general approach to IA does not adequately support the design needs for library Web sites. Generalized IA approaches, such as those provided by Rosenfeld and Morville, do not provide adequate guidance regarding the organization and display of content from external sources. There is an unstated assumption that external sources will provide information in the format specified by the Web-site architect.
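The linking step that OpenURL resolvers automate can be sketched as the construction of a resolver URL from citation metadata: the resolver, not the user, performs the "second search." This is a minimal illustration using OpenURL 0.1-style key/value pairs; the resolver base URL and the citation values are hypothetical.

```python
from urllib.parse import urlencode

def build_openurl(base_url: str, citation: dict) -> str:
    """Build an OpenURL 0.1-style link from citation metadata.

    Given this URL, a resolver can check holdings and route the user
    to full text, the catalog record, or an ILL request form, hiding
    the multistep index-then-catalog process described above.
    """
    return base_url + "?" + urlencode(citation)

# Hypothetical resolver and citation values, for illustration only.
link = build_openurl(
    "https://resolver.example.edu/openurl",
    {
        "genre": "article",
        "issn": "1234-5678",
        "volume": "23",
        "issue": "4",
        "spage": "145",
        "date": "2004",
    },
)
print(link)
```

A vendor database would embed such a link next to each citation, so the user clicks once instead of re-keying the journal title into the catalog.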
IA approaches suggest methods to completely describe the user experience, from the time a user first accesses a site to the point at which a user task is complete, regardless of the origin of the content or service accessed. For example, the content from each of Amazon.com's commercial partners is packaged to operate like a part of the Amazon.com site. In contrast, libraries often only have control of a user's experience up to the point at which they leave a library's servers. Libraries guide users not only to local services and digitized collections, but to databases, journals, and more that are licensed from external sources and the appearances of which are controlled by external sources. Even when using a technical standard such as Z39.50 to provide a local look and feel to remote resources, libraries do not necessarily have full control over the data format or elements of the content that is returned. This lack of local control over content is a limitation to libraries adopting common definitions of IA. Another design area that is not well supported by generalized approaches to IA is the integration of previously installed systems, such as library catalogs. These legacy systems provide important services that represent decades of development and collaboration, and are essential to the future of libraries. For example, libraries provide access to unique resources and systems ranging from online catalogs to abstracting and indexing databases to interlibrary loan (ILL) networks. Libraries are using Web technologies to provide new access methods to library content and services. These technologies provide a thin veneer on systems that function in a manner unfamiliar to many users. The challenge then becomes to change what lies beneath the surface, the underlying functionality of the site, to support the needs of the user.
Using a generalized approach to IA, as applied in other settings, libraries would assess the needs of the user and develop a new, complete system that supports those needs. Such an approach ignores the extensive, existing infrastructure of legacy systems in libraries that is still useful and that serves purposes beyond the user's Web interface.

146 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

What is needed is a standard reference model for library services that provides a framework for access to services and content. This is a long-term goal that requires cooperation and agreement among libraries, and that would allow legacy systems to be repackaged in ways that are more flexible, meet changing user needs, and can be integrated into changing technology environments. Because there are currently no such reference models, librarians need to develop other approaches to integrate existing legacy systems into a modernized Web site.

Extending the IA Framework

In this paper, the general definition of IA that has been proposed by several authors has been extended to incorporate the additional constraints that characterize library Web sites.4 Extended Information Architecture (EIA) is the first half of the framework, and provides a complete conceptual design of the Web site from the users' perspective. Figure 1 depicts the elements and relationships within EIA. The coordinating structure provides an overarching framework for the integration of the multiple service elements that provide much of the underlying functionality of the Web site. The relationship between the coordinating structure and the service elements is iterative, with service elements constraining the coordinating structure and the coordinating structure informing the design of the service elements.
The Coordinating Structure

The coordinating structure contains many of the design elements that are found in generalized approaches to IA, including the organization, navigational structure, and labeling. These are the elements of a Web site that, in concert, define the structure of the user interface without specifying the functionality and content underlying that interface. The framework emphasizes aspects of the generalized approaches that are most relevant to libraries and places them in relation to the service elements that specify the content and functionality of the site. The first element of the coordinating structure is the organization of the Web site. Organization refers to the logical groupings of the content and services that are available to the user. These groupings are not necessarily representative of physical-system implementations, but may be task- or subject-based instead. For example, many academic library Web sites have primary groupings that include information resources, services, and user guides. Although the information resources may include information from a range of systems (for instance, the catalog, abstracting and indexing databases, full-text databases, locally developed exhibits), the logical grouping of information resources unifies the concept for the user. A site's organization scheme will often serve as the foundation for the primary navigational choices on a site's main menu or primary navigational bar. Another component of the coordinating structure is the navigational structure of the site. Navigational structures define the relationships between content and service elements of a site, and between groupings in the site's organization. These structures also include search tools and other link-management tools that help users locate needed content and services. There are usually two types of relationships that form a navigational structure.
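The distinction between logical groupings and physical systems can be made concrete with a small sketch: one user-facing grouping spans several back-end systems, and the primary navigation is derived from the groupings, not the systems. The grouping and system names below are illustrative, not taken from any particular site.

```python
# Logical site organization: each user-facing grouping unifies
# content and services that live in several physical systems.
organization = {
    "Information Resources": [
        "online catalog",
        "abstracting and indexing databases",
        "full-text databases",
        "locally developed exhibits",
    ],
    "Services": ["interlibrary loan", "virtual reference", "course reserves"],
    "User Guides": ["subject guides", "tutorials", "FAQs"],
}

# The primary navigation bar comes from the top-level groupings;
# the underlying systems never appear in the menu directly.
primary_navigation = list(organization)
print(primary_navigation)
```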
First is the definition of a global relationship scheme that outlines the primary navigational structure of the site. These often define relationships between sections of a site's organization, but may also provide access to key pieces of functionality from any point within a site. In addition to the overarching global relationship scheme, there are often several locally or functionally defined relationship schemes that are used throughout the site. These local relationship schemes are usually located within a service or content grouping and provide logical connections within their defined grouping. Both sets of relationships are designed to support a task and provide pathways for the user to move among the various elements of the site. Other relationship schemes may be topic oriented, allowing the user to move easily among similar content sources. These logical relationships are later implemented within a user interface as tools such as menus, navigation bars, and navigation tabs when combined with labels and a visual design. Customization and personalization are navigational structures that have gained a fair amount of attention in the library literature. Both strategies allow a Web site to be displayed differently, based on user characteristics. Customization allows the user to create the relationships most suitable for his or her needs. This strategy has been explored by a number of libraries, although there is little convincing evidence that users implement such strategies in an intense or repeated manner.5 Personalization allows a system designer to bring together a set of pages in a relationship that is meaningful for a user or a user group. Labels, the third element of the coordinating structure, provide signposts that communicate an integrated view of a Web site's design to those who use it. It is important to define a labeling system that consistently and clearly communicates the meaning of the site to the user.
Accordingly, the labels should be constructed in the user's language, not the librarian's. For example, a user may not understand that an abstracting and indexing database will provide them with information regarding journal articles that are relevant to a topic of interest. In that case, the label "Find an article" is more useful than "Indexes."

Figure 1. An Extended Information Architecture for Developing a Conceptual Design of Library Web Sites

Coordinating Structure
• Organization: The grouping and specification of the function and content that is necessary to support the site.
• Navigational Structure: The associations among the service and content elements of the site. These relationships provide the conceptual foundation for navigation and include global and local navigational concepts, site index and search, and customizable and personalized structures.
• Labeling: A consistent naming scheme that presents options and choices to users in terms that they will understand.

Service Elements
• Functional Requirements: The description of the functional elements that are necessary to support the user.
• Content Requirements: The description of the content elements that are necessary to support the user.
• Content Specifications: The description of the content elements that are already available to support the user.
• Functional Specifications: The description of the functional elements that are present in a previously installed system.

Labels are used to describe individual service or content units, but may also be used as headings to provide structural elements to augment the navigational scheme. The consistent use of labels as headings within the site not only increases user understanding of the site, but may also be explicitly constructed to support user tasks.
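The principle of labeling in the user's language rather than the librarian's can be sketched as an explicit mapping that is applied consistently wherever a system is surfaced in the interface. The entries below are illustrative, not drawn from any particular site.

```python
# Map internal, librarian-facing system names to task-based,
# user-facing labels, so every page presents the same vocabulary.
labels = {
    "indexes": "Find an article",
    "opac": "Find a book",
    "ill": "Request an item we don't own",
}

def label_for(system_name: str) -> str:
    # Fall back to the raw name so a missing entry is visible in review.
    return labels.get(system_name, system_name)

print(label_for("indexes"))
```

Centralizing the mapping is the point: a label changed once changes everywhere, which is what makes the labeling system consistent.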
An example of labeling to support tasks can be seen on the University Libraries Web site of the University of Louisville where, under the main heading for Articles, the first subheading is Step 1: Search article databases; and the second subheading is Step 2: Search (the catalog) by journal title.6

Service Elements

Service elements are the second major component of Extended Information Architecture, and represent the content and functionality of the Web site. In this framework, the service elements serve a dual purpose. The definition of service elements involves defining both the ideal requirements for functionality and content as well as the specifications of what is currently available. The definition process can then be used to identify points in the Web site where new functions and content need to be added, or where existing functionality must be modernized. These additions and modifications may be achievable immediately, but in many cases an incremental plan for change may need to be developed. The service-element requirements, labeled as Functional Requirements and Content Requirements in figure 1, express the users' needs and expectations for the functional or content elements of the Web site. The purpose of the requirements definitions is to describe the service elements that are necessary to allow a user to meet his or her goals or objectives in using the site. These requirements are a representation of the ideal composition of a Web site, and inform not only the immediate implementation of the site but also the development of future systems and the modernization of existing systems. It is also important to note that the requirements should be developed to express user needs, not a particular implementation option. For example, it might be tempting to specify the implementation of a particular vendor's OpenURL resolver. This does not, however, describe how the system would function ideally from a user perspective.
Instead, an appropriate requirement would be that users should be able to link to full text from all citations in an abstracting and indexing database. More specifically, content requirements describe the content that is necessary to meet the users' goals and objectives. Access to content is often the primary emphasis of a library Web site, and the content requirements describe the intellectual content that should be accessible through a Web site. Examples of content that might be required are article citations, full-text articles, and multimedia objects. Normally, these requirements will be closely connected with library-wide collection-development policies and priorities, and should be driven by subject specialists rather than systems personnel. These requirements inform the development of systems to meet the needs of the users. The content specifications describe the content that is available within the current systems. There are many reasons why content requirements and content specifications do not match, including the inability or choice of a library to acquire a particular piece, the unavailability of specified content, or technical incompatibilities between content and the library's infrastructure. Although content is sometimes viewed as the core component of a library Web site, there is also a great deal of additional functionality that is provided to users. The functional requirements describe the users' needs and expectations of the functionality in the context of completing tasks on the Web site. For example, ILL forms found on many sites are easy for the user to fill out, although the most effective interface to ILL for the user might not involve a form-based user interface at all. It might be a direct system-to-system interface from an OpenURL resolver to the ILL software in which all citation data are transmitted for the user.
This requirement is not necessarily obvious when considering ILL in isolation, but is evident when considering it in the larger context of the users' goals and objectives for the entire Web site. The functional specifications describe the functions as they exist in the installed base of systems and expose the functionality that is available to the user. When the specifications do not match the requirements, the users' expectations regarding the system will not be fully achieved. The economic and technical limitations of system implementation and modernization often reduce the speed at which the large base of previously installed systems can be modified to meet users' changing needs and expectations. It is thus critical to identify gaps between existing systems and desired systems and discover areas where a Web site will have characteristics that are not completely aligned with what the user needs or expects. When the service-element requirements do not match the service-element specifications of existing systems, an iterative design process begins. This process will be intertwined with the evaluation of the system architecture. Gaps that can be addressed immediately should be incorporated into an implementation plan for the new Web site. Longer-term migration or development plans can be developed to fill gaps that cannot be addressed immediately. It is also important to acknowledge that developing and meeting service-element requirements is an iterative process. They will need to be revisited over time as user needs change, and requirements that are met now become the specifications that are evaluated in the future.
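The comparison of service-element requirements against the specifications of installed systems can be sketched as a simple set difference; the resulting gaps then feed either the immediate implementation plan or a longer-term migration plan. The element names below are illustrative.

```python
# Requirements: what users need. Specifications: what installed
# systems currently provide.
requirements = {
    "link citations to full text",
    "renew loans online",
    "search all databases from one box",
}
specifications = {
    "link citations to full text",
    "renew loans online",
}

# Unmet requirements constrain the conceptual design until filled;
# met requirements become the specifications evaluated next cycle.
gaps = requirements - specifications
met = requirements & specifications

print(sorted(gaps))
```

The iterative step in the text corresponds to re-running this comparison each planning cycle with an updated `specifications` set.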
Interrelationships within EIA

When the service-element requirements cannot be used to modify the service-element specifications, the service elements constrain the design of the Web site and influence the design of the coordinating structure. The upward arrow in figure 1 labeled Constrains indicates that the user experience is constrained by the specifications of content or functional elements that are not currently changeable. In such situations, the coordinating structure must be designed to provide additional context for the user to understand the purpose of the existing service elements. This explanatory role can be seen in the implementation of many Web sites as formal parts of the organizational structure designed to explain the idiosyncrasies of the Web site to the user. For example, many academic library Web sites have tutorials, FAQs, or sections labeled "How do I . . . ?" that provide tips on using aspects of the site that are not always evident to users. It is necessary to acknowledge the usefulness of the explanatory role of the coordinating structure in the iterative and incremental processes of Web-site design. Just as bibliographic instruction and adequate signage have allowed the user to navigate aspects of the traditional library that were not intuitive, the coordinating structure provides the conceptual signposts and other guidance required for users to effectively navigate the Web site. At the same time, it is important to realize that the explanatory role would not be necessary if the Web site's architecture and design were intuitive to the user. As the design of the service elements changes to accommodate the larger goals of the user, the explanatory function of the coordinating structure will be diminished. The main goal of library Web site design should be to reduce the explanatory role of the coordinating structure and to develop service elements that seamlessly support the goals and objectives of the user.
Until all service elements have been modernized to meet the needs of the user, the conceptual design of Web sites will represent a compromise between what users require and what it is possible for users to do within the current legacy information infrastructure.

System Architecture

While the conceptual design of the Web site describes the needs of the user apart from the technical details of the implementation, the system architecture is the description of the system as it exists. In the case of library Web sites, the system architecture is not limited to the functionality and data on the library's Web server. Instead, it is also inclusive of all core infrastructure, individual systems, and data access and storage mechanisms that provide the blueprint of the Web site's backend as it has been built. The individual systems in the architecture may include locally controlled ones (for instance, an online catalog), but will also include remote systems such as abstracting and indexing databases mounted by a vendor. A definition of the design of the existing system plays a key role in the evolutionary specification of the system because it provides developers with a greater understanding of the possibilities and constraints of the existing infrastructure. In describing a system architecture, several formal representations can be used that capture various aspects of the system's capabilities at different levels of granularity. These include module views that provide static specifications of individual components; component and connector views that provide dynamic views of processes; and deployment views that incorporate hardware elements.7 The selection of representations is beyond the scope of this paper. Typical elements of a system architecture can be seen in figure 2. For this paper, three classes of components are being considered, although more may be introduced if applicable locally.
The core-infrastructure components are fundamental services and information that support one or more systems or subsystems. In a typical library environment this includes authentication services, Web platforms, and the network. In some library environments, external units may maintain some or all of these components. For example, many college campuses maintain an authentication infrastructure in the campus computing office. Overall, core infrastructure provides the glue for tying together the many applications that libraries attempt to integrate in their Web sites. The system architecture should include details regarding the standards and interfaces that are used within the library technical infrastructure. Many of the applications in the library environment are off-the-shelf components that have been developed by external vendors. These off-the-shelf components may include the catalog, ILL modules, electronic-course reserves, and virtual-reference systems. Although individual libraries may have some control over configuration options in these applications, they are likely to have little influence over the basic functionality or data formats provided by these systems. Core functionality tends to change based on the demands of many libraries looking for similar functionality. Despite the lack of functional control over these systems, components developed by external vendors may provide standards-based system interfaces to their functionality. These usually take the form of industry-supported standards or vendor-supplied application programming interfaces and give libraries some flexibility in working with these components. Explicit descriptions of the available standard and proprietary interfaces should be included within the system architecture.
Other applications may have been developed within the library and so can be changed more easily. Examples of locally developed applications typically include subject pages, information about the library, and digital Web exhibits and collections. Although local development does provide more control over the appearance and functionality of a piece of software, it is not without problems. Local development is often conducted using a bricolage approach, solving specific problems singularly, without giving consideration to the larger networks of systems in which the solutions operate. When such approaches do not take into account larger issues of systems architecture, opportunities to solve a broader range of problems may be missed and subsequent repackaging of these solutions may be limited or impossible. Libraries frequently also have a limited number of programmers, often remedied by pulling librarians or staff from other duties. While this certainly can allow libraries to meet some user needs, the lack of software-engineering skills in libraries may result in local solutions that are inflexible and that do not support standards for data storage or interchange. Because the internal design of these applications is accessible and modifiable, the system architecture should include more extensive descriptions of the internal features and relationships that they contain. Although this will not completely alleviate the problems of software maintenance, it will provide a better foundation for decisions regarding future migration.

Figure 2. Elements of a System Architecture

Applications (off-the-shelf and locally developed)
Specification of the access mechanisms and standards for previously installed systems including:
• Catalog
• Interlibrary Loan
• Electronic Reserves
• Abstracting and Indexing Databases
• Content Management Systems
• Legacy Web Content

Core Infrastructure
• Authentication: The validation of a user's identity based on credentials. Increasingly a part of a campus-wide infrastructure.
• Web Platforms: Operating systems, server software, and application software that provide the general foundation for the Web site.
• Network: The communication infrastructure within the library system and connecting to the Internet.

Information Storage and Access
• Storage: The definition of storage structures including relational or hierarchical schema. Character format specifications.
• Standards: Standards available for access to the data. These include formats like MARC and Dublin Core and mechanisms like Z39.50 and ODBC.

Finally, typical library architectures consist of links to resources that are licensed or organized on behalf of the user. These include abstracting and indexing databases, full-text content provided by publishers outside of the library, and general vetted Internet sites. Linking the user to the system usually provides access to these systems, and libraries have no control over the technical implementations of such resources. Newer federated search technologies are integrating into the library infrastructure the users' access to the site and to results from the sites, and linking tools make the interrelationships between these systems more easily understood. Nevertheless, integrating these resources into a Web site in a manner that makes sense to library users is a challenge. The access mechanisms and information formats required to communicate with the site should be clearly documented within this system architecture.

Interrelationship of the Information and System Architectures

Reacting to the rapid pace of change can result in an ad hoc or haphazard approach to Web-site design. The sections above describe a systematic approach to include and evaluate changes to the Web site. In order to implement the changes and create a Web site that is scalable and made of reusable components, it is necessary to evaluate, plan, and document all changes to the system. Figure 3 graphically depicts the interrelationship between EIA and system architecture. User needs, as described by IA, should inform the development of technical infrastructure. The Informs arrow indicating that EIA informs the design and development of the system architecture depicts this interrelationship. The Constrains arrow designates the reality that some aspects of the existing infrastructure cannot be changed within this planning cycle and will limit the library's ability to immediately change the underlying content and function of the Web site. When mapping the conceptual design to the physical design, there will be gaps that represent functionality that cannot be supported, either fully or in part, by the current system architecture and thus constrain the full implementation of the conceptual design. If IA is then to be implemented as fully as possible, these gaps identify the modifications and additions that must be carefully evaluated, designed, and implemented within the underlying system architecture. Gaps can be addressed in a variety of ways. If there is a total gap in functionality, a system can be developed or implemented to provide the desired functionality as part of the larger system architecture. This may result in a complete development project or in the specification of an off-the-shelf application to meet the newly identified demand. In the case where an existing system has some of the required functionality but is not completely suitable for the users' goals and objectives, an incremental approach of modernization can be adopted.
Modernization surrounds "the legacy system with a software layer that hides the unwanted complexity of the old system and exports a modern interface." This is done to provide integration with a modern operating environment while retaining the data and exposing the functions of the existing system, if desired. Techniques range from screen scraping to the implementation of Web services to export access to functions that are still relevant within the new context. All of these changes become part of the system architecture for future iterations of change. Gaps that cannot be immediately added or changed to meet the specified requirements become constraints in the next iteration of conceptual design. In the absence of a plan, the underlying systems will continue to undergo constant evolutionary changes, ostensibly to meet the changing needs and workflows of both users and staff. Change comes from many sources, including local implementations and modifications, external vendors, and industry-wide changes in standards. This rapid but incremental change can produce a system that is very difficult to maintain and that provides few reusable modules.

Figure 3. The Interrelationship between the Conceptual and Physical Design of the Library Web Site (Extended Information Architecture; System Architecture; Core Infrastructure: Authentication, Web Platforms, Network)

Having a well-documented implementation and integration plan will not guarantee that the library will not experience the negative effects of technological change, but it does allow a library to better manage change in meeting the needs of its users. The more explicitly and clearly the modifiable features are documented within the system architecture, the easier it will be to plan to fill the gaps.
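The modernization technique quoted above, surrounding a legacy system with a software layer that hides its complexity and exports a modern interface, is essentially a wrapper or adapter. A minimal sketch, assuming a legacy catalog that only returns raw screen text; all class names, field tags, and values here are hypothetical.

```python
import re

class LegacyCatalog:
    """Stand-in for a legacy system that returns raw screen text,
    as a screen-scraping wrapper would receive it."""
    def lookup(self, title: str) -> str:
        return f"TI: {title}\nCN: Z674.75 .M35 2004\nST: AVAILABLE"

class CatalogService:
    """Modernization layer: hides the screen format and exports
    structured records that a modern Web site can consume."""
    def __init__(self, legacy: LegacyCatalog):
        self._legacy = legacy

    def find(self, title: str) -> dict:
        screen = self._legacy.lookup(title)
        # Parse "TAG: value" lines out of the legacy screen dump.
        fields = dict(re.findall(r"^(\w+): (.+)$", screen, re.MULTILINE))
        return {
            "title": fields["TI"],
            "call_number": fields["CN"],
            "available": fields["ST"] == "AVAILABLE",
        }

record = CatalogService(LegacyCatalog()).find("Beyond Information Architecture")
print(record)
```

The same structured interface could later be re-exported as a Web service, so the legacy system's data and functions survive while its presentation does not.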
I Conclusion

Library users' mental models of library processes have fundamentally changed, creating a serious disconnect between how users expect to use a library Web site and how the site was designed. In particular, user expectations regarding the number of steps that must be completed have changed. At the same time, library technical infrastructures are composed, in part, of legacy systems that provide great value and facilitate interlibrary resource sharing, but were not designed for the Web environment. It is essential that libraries develop new approaches to the conceptual design of Web sites that support current and future changes to both user behaviors and to library systems architectures. In the long run, these approaches should contribute to the development of a reference model for the description of library services. The authors have proposed a complete framework for conceptual design and physical implementation that is responsive to changing user needs while recognizing the need for libraries to adopt an efficient and cost-effective approach to Web-site design, implementation, and maintenance. Functional and content needs of the user are identified and molded into a conceptual design based on a broadened perspective of the users' objectives. Mapping conceptual requirements to physical architectures is an important part of this framework, using an architectural representation in combination with descriptions of integration elements that have been developed to support the incremental and iterative change.

BEYOND INFORMATION ARCHITECTURE | MALONEY AND BRACKE 151

The ability to respond is essential, necessitated by the rapid change in the technical and user environments in which libraries operate.
The framework is designed to allow logical and informed decisions to be made throughout the process regarding when to create new systems, when to replace or modernize existing systems, and when to improve the conceptual signage of the Web site.

References

1. Christina Wodtke, Information Architecture: Blueprints for the Web (Indianapolis: New Riders, 2003).
2. Louis Rosenfeld and Peter Morville, Information Architecture for the World Wide Web, 2nd ed. (Cambridge, Mass.: O'Reilly, 2002), 4.
3. Bob Gerrity, Theresa Lyman, and Ed Tallent, "Blurring Services and Resources: Boston College's Implementation of MetaLib and SFX," Reference Services Review 30, no. 3 (2002): 229-41; Barbara J. Cockrell and Elaine Anderson Jayne, "How Do I Find an Article? Insights from a Web Usability Study," Journal of Academic Librarianship 28, no. 3 (May 2002): 122-32.
4. Jesse James Garrett, Elements of User Experience (Indianapolis: New Riders, 2002); Rosenfeld and Morville, Information Architecture.
5. James S. Ghaphery and Dan Ream, "VCU's My Library: Librarians Love It ... Users? Well, Maybe," Information Technology and Libraries 19, no. 4 (Dec. 2000): 186-90; James S. Ghaphery, "My Library at Virginia Commonwealth University: Third Year Evaluation," D-Lib Magazine 8, no. 7/8 (July/Aug. 2002). Accessed July 16, 2003, www.dlib.org/dlib/july02/ghaphery/07ghaphery.html.
6. University of Louisville Libraries Web site (2003). Accessed July 16, 2003, http://library.louisville.edu.
7. Craig Larman, Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design (New Jersey: Prentice Hall PTR, 1998); Martin Fowler, Analysis Patterns: Reusable Object Models (Boston: Addison-Wesley, 1997); James Rumbaugh, Ivar Jacobson, and Grady Booch, The Unified Modeling Language Reference Manual (Boston: Addison-Wesley, 1999); Robert C. Seacord, Daniel Plakosh, and Grace A. Lewis, Modernizing Legacy Systems: Software Technologies, Engineering Processes, and Business Practices (Boston: Addison-Wesley, 2003).
8. Seacord, Plakosh, and Lewis, Modernizing Legacy Systems, 9.

152 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004
Policies Governing Use of Computing Technology in Academic Libraries

Vaughan, Jason. Information Technology and Libraries; Dec. 2004; 23, 4; ProQuest pg. 153

The networked computing environment is a vital resource for academic libraries. Ever-increasing use dictates the prudence of having a comprehensive computer-use policy in force. Universities often have an overarching policy or policies governing the general use of computing technology that helps to safeguard the university equipment, software, and network against inappropriate use. Libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. Having computer-use policies at the university and library level helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

For clients of academic libraries, the computing environment and access to online information is an essential part of everyday service, every bit as vital as having a printed collection on the shelf. The computing environment has grown in positive ways: higher-caliber hardware and software, evolving methods of communication, and large quantities of accurate online information content. It has also grown in many negative ways: the propagation of worms and viruses, other methods of hacking and disruption, and inaccurate informational content. As the computing environment has grown, it has become essential to have adequate and regularly reviewed policies governing its use. Often, if not always, overarching policies exist at a broad institutional or even larger systemwide level.
Such policies can govern the use of all university equipment, software, and network access within the library and elsewhere on campus, such as campus computer labs. A single policy may encompass every easily conceivable computing-related topic, or there may be several individual policies. Apart from any document drafted and enforced at the university level, various public laws exist that also govern appropriate computer-use behavior, whether in academia or on the beach. Many institutions have separate policies governing employee use of computer resources; this paper focuses on student use of computing technologies. In some cases, the library and the additional campus student-computer infrastructure (for example, campus labs and dormitory computer access) are governed by the same organizational entity, so the higher-level policy and the library policy are de facto the same. In many instances, libraries have enacted additional computer-use policies. Such policies may emphasize or augment certain points found in the institution-level policy(s), address concerns specific to the library environment, or both. This paper surveys the scope of what are most commonly referred to as "computer-use policies," specifically, those geared toward the student-client population. Common elements found in university-level policies (and often later emphasized in the library policy) are identified. A discussion on additional topics generally more specific to the library environment, and often found in library computer-use policies, follows. The final section takes a look at the computer-use environment at the University of Nevada, Las Vegas (UNLV), the various policies in force, and identifies where certain elements are spelled out: at the university level, the library level, or both.

I Policy Basics

Purpose and Scope

Policies can serve several purposes. A policy is defined as: a plan or course of action ...
intended to influence and determine decisions, actions, and other matters. A course of action, guiding principle, or procedure considered expedient, prudent, or advantageous.1 Any sound university has a comprehensive computer-use policy readily available and visible to all members of the university community: faculty, staff, students, and visitors. Some institutions have drafted a universal policy that seeks to cover all the pertinent bases pertaining to the use of computing technology. In some cases, these broad overarching policies have descriptive content as well as references to other related or subsidiary policies. In this way, they provide content and serve as an index to other policies. In other cases, no illusions are made about having a single, general, overarching policy; the university has multiple policies instead. Policies can define what is permitted (use of computers for academic research) or not permitted (use of computers for nonacademic purposes, such as commercial or political interests). A policy is meant to guide behavior and the use of resources as they are meant to be used. In addition, policies can delve into procedure. For example, most policies contain a section on how to report suspected abuse and how suspected abuse is investigated, and outline potential penalties. Policies buried in legalese may serve some purpose, but they may not do a good job of educating users on what is acceptable and not acceptable. Perhaps the best approach is an appropriate balance between legalese and language most users will understand.

Jason Vaughan (jvaughan@ccmail.nevada.edu) is Head of the Library Systems Department at the University of Nevada, Las Vegas.

POLICIES GOVERNING USE OF COMPUTER TECHNOLOGY IN ACADEMIC LIBRARIES | VAUGHAN 153
In addition, policies can also serve to help educate individuals on important topics, rather than merely stating what is allowed and what will get one in trouble. For example, a general policy statement might read, "You must keep your password confidential." Taken a step further, the policy could include recommendations pertaining to passwords, such as the minimum password length, inclusion of nonalphabetic characters, the recommendation to change the password regularly, and the mandate to never write down the password.

Characteristics of a Policy: Visibility, Prominence, Easily Identifiable

A policy is most useful when it is highly visible and clearly identified as a policy that has been approved by some authoritative individual or body. Students often sign a form or agree online to terms and conditions when their university accounts are established. Web pages may have a disclaimer stating something to the effect of "use of (institution's) resources is governed by ..." and provide a hyperlink to the various policies in place. Or, a simple policies link may appear in the footer of every Web page at the institutional site. Some universities have gone a bit further. At the University of Virginia, for example, students must complete an online quiz after reviewing the computer-use guidelines.2 In addition, they can choose to view an optional video. Such components serve to enhance awareness of the various policies in place. A review of the library literature failed to uncover any articles focusing on computer-use policies in academic libraries. The author then selected several similar-sized (but not necessarily peer) institutions to UNLV, doctoral-granting universities with a student population between twenty thousand and thirty thousand, and thoroughly examined their library Web sites to see what, if any, policy components were explicitly highlighted.
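The educational password recommendations discussed earlier (a minimum length, inclusion of nonalphabetic characters) are straightforward to express as a validation routine. A hypothetical sketch; the specific thresholds are illustrative assumptions, not drawn from any actual university policy:

```python
# Hypothetical password check reflecting the kinds of recommendations a
# policy might make: a minimum length and at least one nonalphabetic
# character. The exact threshold (8) is an illustrative assumption.

def password_problems(password: str, min_length: int = 8) -> list[str]:
    """Return a list of policy violations (an empty list means acceptable)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if password.isalpha():
        problems.append("contains no nonalphabetic characters")
    return problems


print(password_problems("library"))      # fails both checks
print(password_problems("g4rden#2024"))  # passes: []
```

Returning a list of specific violations, rather than a bare pass/fail, matches the educational spirit the passage describes: the user learns which recommendation was not met.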
It quickly became evident that many libraries do not have a centrally visible, specifically titled, inclusive computer-use policy document. Most, but not all, of the library Web sites provided a link to the institutional-level computer-use policy. In some cases, library policies were not consolidated under a central page titled "Policies and Procedures" or "Guidelines," and, where they did appear, the context did not imply or state authoritatively that this was an official policy. There was no statement of who drafted the policy (which can lend some level of authority or credence), as well as no indicated creation or revision date. Granted, many libraries have paper forms one must sign to obtain a library card, or they may state the rules in hardcopy posted within prominent computer-dense locations. Still, with so much emphasis given to licensed database and Internet resources, and with such heavy use of the computing environment, such policies should appear online in a prominent location. Where better to provide a computer-use policy than online? Perhaps all the libraries reviewed did have policies posted somewhere online. If the author could not easily find them, chances are a student would have difficulties as well. In sum, the location of the policy information and how it is labeled can make a tremendous difference.

Revisions

Policies should be reviewed on a regular basis. Often, the initial policy likely goes through university counsel, the president's administrative circles, and, perhaps, a board of regents or the equivalent. Revisions may go through such avenues, or may be more streamlined. A frequent review of policies is mandated by evolving information technology. For example, cell phones with built-in cameras or Internet-browsing capabilities, nonexistent a few years ago, are now becoming mainstream. With such an inconspicuous device, activities such as taking pictures of an exam or finding simple answers online are now possible.
Similarly, regularly installed critical updates are a central concept within Windows' latest version of operating-system software. Such functionality failed to attract much attention until the increase in security exploits and associated media coverage. Some policies, recently updated, now make mention of the need to keep operating systems patched.

I Why Have a Library Policy?

While some libraries link to higher-level institutional policies and perhaps have a few rules stated on various scattered library Web pages, other libraries have quite comprehensive policies that serve as an adjunct to (and certainly comply with) higher institutional policies. There are several reasons to have a library policy. First, it adds visibility to whatever higher-level policy may be in place. A central feature of a library policy is that it often provides links (and thus, additional visibility) to other higher-level policies. A computer-use policy can never appear in too many places. (Some libraries have the link in the footer of every Web page.) A computer-use policy can be thought of as a speed limit sign. Presumably, everyone knows that unless otherwise posted, the speed limit inside the city is thirty-five miles per hour, and outside it is fifty-five miles per hour. Nevertheless, numerous speed-limit signs are in place to remind drivers of this. Higher-level institutional policies often take a broad stroke, in that they pertain to and address computing technology in general, without addressing specific systems in detail. A second reason to have a local library policy is to reflect rules governing local library resources that are housed and managed by the library. Such systems
often include virtual reference, electronic reserves, laptop-checkout privileges, and the mass of electronic databases and full-text resources purchased and managed by libraries. Such library-based systems do not necessarily make the radar of higher-level policies, yet have important considerations, such as copyright issues in the electronic age or privacy as it relates to e-mail and chat reference. In addition, libraries often have two large user groups that other campus entities do not have: university affiliates (faculty, staff, students) and nonuniversity affiliates (community users). While broader university policies generally apply to all users of computing technology, local library policies can work to address all users of the library PCs, and make distinctions as to when, where, and what each group can use.

I Common Computer-Use Policy Elements

The following section outlines broad topics that are usually addressed within high-level, institutional policies. Often, some or many of these same elements are later reemphasized or adapted by libraries, focusing on the library environment. In many cases, the policy is presented in a manner somewhat like breaking the seal on a new piece of software packaging. Essentially, if someone is using the university equipment or network, that person agrees to abide by all policies governing such use. An overarching policy frequently may end with a bulleted summary of the important points in the document. An important first part of the policy is a clear indication of who the policy applies to. This may be as broad as "anyone who sits down in front of university equipment or connects to the network," or as specific as spelling out individual user groups (undergraduates, graduates, alumni, K-12 students). Appendix A summarizes elements found in the various end-user computer policies in force at UNLV and the UNLV University Libraries.
Network and Workstation Security

Network security is a universal topic addressed in computer-use policies. Under this general aegis one often finds prohibitions against various forms of hacking, as well as recommendations for steps individual users should take to help better secure the overall network. There are also such policies as the prohibition of food and drink near computer workstations or on the furniture housing computer workstations. Typical components related to network and workstation security include:

1. Disruption of other computer systems or networks; deliberately altering or reconfiguring system files; use of FTP servers, peer-to-peer file sharing, or operation of other bandwidth-intensive services
2. Creation of a virus; propagation of a virus
3. Attempts at unauthorized access; theft of account IDs or passwords
4. Password information: individual users need to maintain a strong, confidential password
5. Intentionally viewing, copying, modifying, or deleting other users' files
6. A requirement to secure restrictions to files stored on university servers
7. Recommendation or requirement to back up files
8. Statement of ownership regarding equipment and software: the university, not the student, owns the equipment, network, and software
9. Intentional physical damage: tampering, marking, or reconfiguring equipment or infrastructure, such as unplugging network cables
10. Food and drink policies

Personal Hardware and Software

Many universities allow students to attach their own laptops to the campus wired or wireless network(s). In addition to network connections, a growing number of consumer devices such as floppy disks, zip disks, and rewritable CD/DVD media have the potential to connect to university computers for the purpose of data transfer. Today, the list has grown to include portable flash drives, digital cameras and camcorders, and MP3 players, among others.
The attaching of personal equipment to university hardware may or may not be allowed. Similarly, users may often try to install software on university-owned equipment. Typical examples may include a game brought from home or any of the myriad pieces of software easily downloaded from the Internet. Some of the policy elements dealing with the use of personal hardware and software include:

1. Connecting personal laptops to the university wired or wireless network(s)
2. Use of current and up-to-date patched operating systems and antivirus programs running on personal equipment attached to the network
3. Connecting, inserting, or interfacing such personal hardware as floppy disks, CDs, flash drives, and digital cameras with university-owned hardware; liability regarding physical damage or data loss
4. Limit access to and mandate immediate reporting of stolen personal equipment (to deactivate registered MAC addresses, for example)
5. Downloading or installing personal or otherwise additional software onto university equipment
6. Use of personal technology (cell phones, PDAs) in classroom or test-taking environments

E-mail

E-mail privileges figure prominently in computer-use policies. Some topics deal with security and network performance (sending a virus), while many deal with inappropriate use (making threats or sending obscene e-mails). Other topics deal with both (such as sending spam, which is unsolicited, annoying, and consumes a lot of bandwidth). Among the activities covered are prohibitions or statements regarding:

1. Hiding identity, forging an e-mail address
2. Initiating spam
3. Subscribing others to mailing lists
4. Disseminating obscene material or Web links to such material
5. General guidelines on e-mail privileges, such as the size of an e-mail account, how long an account can be used after graduation, and e-mail retention
6. Basic education regarding e-mail etiquette

Printing

With the explosion of full-text resources, libraries and other student-computing facilities have experienced a tremendous growth in the volume of pages printed on library printers. At UNLV Libraries, for example, the printing volume for July 2002 to June 2003 was just shy of two million pages; the following year that had jumped to almost 2.4 million pages. Various policies helping to govern printing may exist, such as honor-system guidelines ("don't print more than ten pages per day"). Some institutions or libraries have implemented cost-recovery systems, where students pay fixed amounts per black-and-white and color pages printed through networked printers. Standard policies regarding printer use cover:

1. Mass printing of flyers or newsletters
2. Tampering with or trying to load anything into paper trays (such as trying to load transparencies in a laser printer)
3. Per-sheet print costs (color and black-and-white; by paper size)
4. Refund policies
5. Additional commonsense guidelines, such as "use print preview in browser"

Personal Web Sites

Many universities allow students to create personal Web sites, hosted and served from university-owned equipment. Customary policy items focusing on this privilege include:

1. General account guidelines: space limitations, backups, secure FTP requirements
2. Use of school logo on personal Web pages
3. Statement of content responsibility or institutional disclaimer information
4. Requirement to provide personal contact information
5. Posting or hosting of obscene, questionable, or inappropriate content

Intellectual Property, Copyright, or Trademark

Abuse of copyright, clearly a violation of federal law, is something that libraries and universities were concerned about long before computers hit the mainstream.
Widespread computing has introduced new avenues to potentially break copyright laws, such as peer-to-peer file sharing and DVD-movie duplication, to mention only two. A computer-use policy covering copyright will generally include:

1. General discussion of copyright and trademark law; links to comprehensive information on these topics
2. Concept of educational "fair use"
3. Copying or modification of licensed software, use of software as intended, use of unlicensed software
4. Specific rules pertaining to electronic theses and dissertations
5. Specific mention of the illegality of downloading copyrighted music and video files

Appropriate- and Priority-Use Guidelines

Appropriate use is often covered in association with topics such as network security or intellectual property. However, appropriate- and priority-use rules can be an entire policy and would include:

1. Mention of federal, state, and local laws
2. Use of resources for theft or plagiarism
3. Abuse, harassment, or making threats to others (via e-mail, instant messaging, or Web page)
4. Viewing material that may offend or harass others
5. Legitimate versus prohibited use; use for nonacademic purposes such as commercial, advertising, or political purposes, or games
6. Academic freedom, Internet filtering

Privacy, Data Security, and Monitoring

Privacy and data security are tremendous issues within the computing environment. Networking protocols and components of many software programs and operating systems by default keep track of many activities (browser history files and cache, Dynamic Host Configuration Protocol logs, and network account login logs, to mention a few). Additional specialized tools can track specific sessions and provide additional information. Just as credit-card companies, banks, and hospitals provide a privacy policy to their clients, so do many academic computer-use policies. Such statements often address what logs are kept, how they are maintained, how they may be used, and who has access. In addition to the legitimate use of maintaining information, there is the general concept of questionable or outright malicious collection of information, through cookies, spybots, or browser hijacks. The following are concepts often addressed under the general heading of privacy:

1. Cookies, spybots, other malicious software
2. What information is collected for evaluative system management and/or statistical purposes; use of cookies for this; how such information is used and reported
3. Statement on routine monitoring or inspection of accounts or use; reasons information may be accessed (routine system maintenance, official university business, investigation of abuse, irregular usage patterns)
4. Security of information stored on or transmitted by various campus resources
5. Statement on general lack of security of public, multiuser workstations (browser cache, search history, recent documents)
6. Disposition of information under certain circumstances (for example, if a student dies while enrolled, any personal university e-mail and stored files can be turned over to the executor of the will or to parents)

Abuse Violations, Investigations, and Penalties

As policies generally are a statement of what is or is not permitted, or what is considered abuse, a clearly defined mechanism for reporting suspected abuse and policy violations can often be found. Obviously, some abuse issues violate not only university policy, but also local, state, or federal law. Investigations of suspected abuse are by their nature tied into the privacy and monitoring category. Policy items detailing suspected abuse usually include:

1. How one can report suspected abuse
2. How requests for content, logging, or other account information are handled; how and by what entities abuse investigations are handled
3. Potential penalties
4. How to appeal potential penalties; rights and responsibilities one may have in such a situation

Other Computer- or Network-based Services Affecting the Broad Student Population

Universities operate any number of other computer- or network-based services for the broad academic community. Such services may include provisioning of ISP accounts, courseware, online registration, and digital institutional repositories. Depending on the broad nature of these services, policy information particular to such systems can be specified at the broad policy level, especially if they have unique avenues of potential exploitation or abuse not covered in the general topics included elsewhere in the policy.

I Additional Library-Specific Computer-Use Policy Elements

Many libraries elect to have their own, additional computer-use policies that serve as an adjunct to the larger university-level policy that generally governs the use of all computing resources on campus. Libraries that have a formalized library computer-use policy often start with a statement of other policies governing the use of the library equipment and network: references to the university policies in place. The library policy may choose to include or paraphrase parts of the university policy deemed especially important or otherwise applicable to the specific library environment. Important concepts governing university policies apply equally to library policies: purpose and comprehensiveness, visibility, and frequent review. Libraries that have formalized computer-use policies often link them under common library Web-site sections such as "information about the libraries" or "about the libraries."
Library policies can help address items unique, special, or otherwise worthy of elaboration, such as specific systems in place or situations that may arise. They can also help provide guidelines and strategies to aid staff in policy enforcement. As an example of a library computer-use policy, appendix B provides the main UNLV Libraries computer-use policy.

Public versus Student Use: Allowances and Priority Use

Many of the other entities on a university campus do not daily deal with the community at large (the nonuniversity affiliates) as do academic libraries. This applies to most if not all public institutions, as well as many private institutions. The degree to which academic libraries embrace community users varies widely; often, a statement on which user groups are the primary clients is stated in a policy. Such policy statements may discuss who may use what computers, what software components they have access to, and when access is allowed. In some cases, levels of access for students and the community are basically the same. Community users may be allowed to use all software installed on the PC. More often, separate PCs with smaller software sets have been configured for community users or for specific access to government documents. In some cases, libraries allow some or all PCs to be used by anyone, student or nonstudent, but have technically configured the PC or network to prevent the community at large from using the full software set. For example, community users may be limited from using the productivity software (such as Microsoft Word) found on these PCs. They may be restricted from using PCs on upper floors, or those reserved for special purposes, such as high-end graphics-development workstations.
In addition, during crunch time (midterms and final exams), community users are often restricted to the few PCs set up and configured to allow access only to the library Web page (not the Web at large) and the online catalog. In addition, only students and staff can plug their personal laptops into the library and campus network. Regardless of whether it is crunch time, nonstudent users can be asked to leave if all PCs are in use and students are waiting. An in-house-authored program identifies accounts and whether particular users are students or nonstudents.

More and more government information is available online. For libraries serving as government document repositories, all users have the right to freely access information distributed by the government. In 2005, the UNLV Libraries will begin limiting full Web access to community users; they will be permitted access only to a limited set of Web-based resources, such as government document Web sites and library-licensed databases. On another note, many libraries have special adaptive workstations with additional software and hardware to facilitate access to library resources by disabled citizens. Disabled individuals, enrolled at the university or not, are allowed to use these adaptive workstations.

Laptop Checkout Privileges

Many libraries today check out laptops for student use. At UNLV Libraries, faculty, staff, and students may check out LCD projectors and library-owned laptops and plug them into the network at any of the hundreds of available locations within the main library. More details on these privileges can be found in the article "Bringing Them In and Checking Them Out: Laptop Use in the Modern Academic Library."3 As the university does not otherwise check out laptops to users or allow students to plug their own laptops into the wired university network, the Libraries had to come up with these additional specific policies.

Licensed Electronic Resources: Terms and Conditions

Academic libraries are generally the gatekeepers to many citation and full-text databases and electronic journals. Each of the myriad subscription vendors has terms of use, violations of which can carry harsh penalties. For example, the UNLV Libraries had an incident where a vendor temporarily cut off access to its resource due to potential abuse detected from a single student. In this case, the user was downloading multiple PDF full-text files in an automated manner. This illustrates the need for some statement in a library policy outlining the existence of such additional terms of use. Vendors generally place a link to these terms at the top page of each of their resources. For greater visibility and potential compliance, libraries should at least point out the existence of such terms of use. In addition, some electronic resources have licensing agreements that simply do not permit community-user access. In these cases, library policy can simply state that some licensed resources may be accessed only by university affiliates.

Electronic Reserves

Many libraries have set up electronic reserves systems to help distribute electronic full-text documents and streaming media content, among other things. Additional policies may govern the use of such systems, such as making the system available only to currently enrolled students and providing some boundaries on what is acceptable for mounting on such a system. In addition, there is the whole area of copyright. E-reserve systems often have built-in methods to help better enforce copyright compliance in the electronic arena.
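The vendor incident described above, automated bulk downloading of full-text PDFs, is the kind of behavior vendors typically detect with a rate threshold over a sliding time window. A rough sketch of that idea follows; the thresholds, log format, and function name are assumptions for illustration, not any vendor's actual mechanism:

```python
from collections import deque

# Illustrative sliding-window detector: flag any user who makes more than
# `limit` download requests within any `window_seconds` span.
def find_abusers(events, limit=50, window_seconds=60):
    """events: iterable of (timestamp_seconds, user_id), time-ordered."""
    windows = {}      # user_id -> deque of that user's recent timestamps
    flagged = set()
    for ts, user in events:
        q = windows.setdefault(user, deque())
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > window_seconds:
            q.popleft()
        if len(q) > limit:
            flagged.add(user)
    return flagged
```

A human researcher rarely exceeds such a threshold, so a single flagged account is a strong signal of scripted downloading, which is what triggered the cutoff in the incident above.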
Additional policy statements can help educate faculty members on particulars related to copyright and e-reserves.

Offsite Access to Licensed Electronic Resources

Many libraries provide offsite access to their licensed resources to legitimate users via proxy servers or other methods. The policy regarding such access may address things such as who is permitted to access resources from offsite (such as students, staff, and faculty) and the requirement that the user be in good standing (such as no outstanding library-book fines). In some instances, universities have implemented broad authentication systems that, once the user has logged on from an offsite location, allow the user into a range of university resources, including, potentially, library-licensed electronic resources. If such is the case, information pertaining to offsite access may be found in a higher-level policy.

158 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

Electronic Reference Transactions

Many libraries have installed (or plan to install) virtual-reference systems or, at a minimum, have a simple e-mail reference service ("Ask a Librarian"). In addition, many collect library feedback or survey information through simple forms. In all cases, a record exists of the transaction. With virtual-reference systems, the record can include chat logs, e-mail reference inquiries, and URLs of Web pages accessed during the transaction. A policy governing the use of electronic-reference systems may address such things as which clientele may use the system, a statement on the confidentiality of the transaction, or a statement on whether the library maintains the electronic-transaction details. Items such as hours of operation and response time to an e-mail question could be considered more procedural or informational than policy issues.
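The offsite-access conditions described above (legitimate affiliation plus good standing) translate directly into an authorization check in the proxy or authentication layer. A minimal sketch, in which the affiliation names and the fines rule are assumptions for illustration:

```python
# Hypothetical offsite-access check combining the two policy conditions
# discussed above: eligible affiliation and good standing (no outstanding fines).

ELIGIBLE_AFFILIATIONS = {"student", "staff", "faculty"}  # illustrative

def may_access_offsite(affiliation, outstanding_fines):
    """Permit offsite access to licensed resources only for affiliated
    users in good standing."""
    if affiliation not in ELIGIBLE_AFFILIATIONS:
        return False  # community users may still have onsite options
    return outstanding_fines <= 0.0
```

Whether this check lives in the library's proxy server or in a campus-wide authentication system is exactly the policy-placement question raised above: if the campus system performs it, the library policy need only point to the higher-level document.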
Statements on Information Literacy

While perhaps not a policy per se, many libraries have a computer-use policy statement to the effect that while the library may provide links to certain information, this does not serve as an endorsement or guarantee that the information is accurate, up to date, or has been verified. (Such a statement posted on the library Web site may provide additional exposure to the maxim that all that glitters is not gold.) Statements that libraries do not regulate, organize, or otherwise verify the general mass of information on the Internet may be included. Obviously, many libraries have separate instruction sessions, awareness programs, and overall mission goals geared toward information literacy.

Principles on Intellectual Freedom and Internet Filtering

Statements by the American Library Association (ALA) on intellectual freedom and Internet filtering may well appear in an institutional policy and often are included in library policies. Filtering is more likely to affect public and school libraries than academic libraries. Still, underage children can and do use academic libraries. In such an environment, they may be intentionally or unintentionally exposed to questionable or obscene material. Thus, a library computer-use policy can express the general concepts behind the following:

1. intellectual freedom (freedom of speech; free, equal, unrestricted access);
2. the fact that academic libraries provide a variety of information expressing a variety of viewpoints;
3. the fact that this information is not filtered; and
4. the responsibility of parents to be aware of what their children may be viewing on library PCs.

Some libraries have provided policy links to various sets of information from the Office of Intellectual Freedom at ALA's Web site, such as:

1. ALA Code of Ethics
2. ALA Bill of Rights
3. Intellectual Freedom Principles for Academic Libraries: An Interpretation of the Library Bill of Rights
4. Access to Electronic Information, Services, and Networks: An Interpretation of the Library Bill of Rights

Some libraries also provide references to ALA information pertaining to the USA Patriot Act and how law-enforcement inquiries are handled.

Summary

Computing is a vitally important tool in the academic environment. University and library computing resources receive constant and growing use for research, communication, and synthesizing information. Just as computer use has grown, so have the dangers in the networked computing environment. Universities often have an overarching policy or policies governing the general use of computing technology that help to safeguard university equipment, software, and networks against inappropriate use. Libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. Having computer-use policies at the university and library levels helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

References

1. The American Heritage College Dictionary, 3rd ed. (Boston: Houghton, 1997), 1058.
2. Board of Visitors of the University of Virginia, "Responsible Computing at U.Va.: A Handbook for Students." Accessed June 2, 2004, www.itc.virginia.edu/pubs/docs/RespComp/rchandbook03.html.
3. Jason Vaughan and Brett Burnes, "Bringing Them In and Checking Them Out: Laptop Use in the Modern Academic Library," Information Technology and Libraries 21 (2002): 52-62.

Appendix A.
Systemwide, Institutional, and Library Computing Policies at UNLV

The appendix is a matrix indicating which of six policies address each item: the SCS NevadaNet Policy,* the UCCSN Computing Resources Policy,** the UNLV Student Computer-Use Policy,*** the UNLV Policy for Posting Information on the Web,† the UNLV Libraries Guidelines for Library Computer Use,†† and the UNLV Libraries Additional Policies.††† The items compared, grouped by category:

General
- Direct, evident link or references to higher-level institutional/system computer-use policy
- Author/authority information included
- Approved/revised date included

Network and Workstation Security
- Disruption of other computer systems/networks; deliberately altering or reconfiguring system files; FTP servers/peer-to-peer file sharing/operation of other bandwidth-intensive services
- Creation of a virus; propagation of a virus
- Attempts at unauthorized access/theft of account IDs or passwords
- Password information: individual user's need to maintain a strong, confidential password
- Intentionally viewing, copying, modifying, or deleting other users' files
- Requirement to secure restrictions on stored files
- Recommendation/requirement to back up files
- Statement of ownership regarding equipment/software
- Intentional physical damage: tampering with or marking, reconfiguring equipment or infrastructure
- Food and drink policies

Personal Hardware and Software
- Connecting personal laptops, etc., to university wired or wireless network(s)
- Use of current and up-to-date patched operating systems and antivirus programs running on personal equipment attached to the network
- Connecting/inserting/interfacing personal hardware with existing university equipment; liability regarding physical damage or data loss
- Limiting access to personal equipment/reporting immediately if stolen
- Downloading or installation of personal or otherwise additional software onto university equipment
- Use of personal technology in classroom/test-taking environments

Printing
- Mass printing of flyers or newsletters
- Tampering with or trying to load anything into paper trays
- Per-sheet print costs
- Refund policies
- Additional common-sense guidelines

E-mail
- Hiding identity/forging an e-mail address
- Initiating spam
- Subscribing others to mailing lists
- Dissemination of obscene material or Web links to such material
- General guidelines on e-mail privileges, such as the size of an e-mail account, how long an account can be used after graduation, etc.

Personal Web Site Specific
- General account guidelines
- Use of school name and logo
- Statement of content responsibility/institutional disclaimer information
- Requirement to provide personal contact information
- Posting and hosting of obscene, questionable, or inappropriate material

Intellectual Property, Copyright, and Trademark
- General discussion of copyright and trademark law; links to comprehensive information on these topics
- The concept of educational fair use
- Copying or modifying licensed software/use of software as intended/use of unlicensed software
- Specific rules pertaining to electronic theses and dissertations

Appropriate- and Priority-Use Guidelines
- Mention of federal, state, and local laws
- Use of resources for theft/plagiarism
- Abuse, harassment, or making threats to others (via e-mail, instant messaging, Web page, etc.)
- Viewing material which may offend others
- Legitimate versus prohibited use; use for nonacademic purposes (commercial, advertising, political purposes, games, etc.)
- Academic freedom; Internet filtering

Privacy
- Cookies, spybots, other malicious software
- What information is collected for evaluative/system-management/statistical purposes; use of cookies for this
- Statement on routine monitoring or inspection of accounts or use; reasons information may be accessed
- Security of information stored on or transmitted by various campus resources
- Statement on general lack of security of public, multiuser workstations
- Disposition of information under certain circumstances

Abuse Violations, Investigations, and Penalties
- How one can report suspected abuse
- How requests for content, logging, or other account information are handled; how and by what entities abuse investigations are handled
- Potential penalties
- How to appeal potential penalties; rights/responsibilities you may have in such a situation

Other Computer/Network-based Services Affecting the Broad Student Population: Library-Specific
- Public versus student use: allowances and priority use
- Right to access government information
- Assistance for persons with disabilities
- Laptop, LCD projector, etc., checkout privileges
- Licensed electronic resources: terms and conditions
- Offsite access to licensed electronic resources: who can access from offsite
- Electronic reference transactions
- Statements on information literacy
- ALA principles on academic freedom/Internet filtering
- Electronic reserves; copyright as it pertains to electronic reserves

Notes

* The Systems Computing Services NevadaNet Policy. Among other responsibilities, SCS provides and maintains the general Internet connectivity for Nevada's higher education institutions, including UNLV. The complete document can be accessed at www.scs.nevada.edu/nevadanet/nvpolicies.html.
** The University and Community College System of Nevada Computing Resources Policy. UCCSN is the system of higher education institutions in the state of Nevada, governed by an elected board of regents. The complete document can be accessed at www.scs.nevada.edu/about/policy061899.html.
*** The complete document can be accessed at www.unlv.edu/infotech/itcc/SCUP.html.
† The complete document can be accessed at www.unlv.edu/infotech/itcc/WWW_Policy.html.
†† The primary UNLV Libraries policy governing student computer use. Provided in Appendix B, the complete document can also be accessed at www.library.unlv.edu/services/policies/computeruse.html.
††† Various other policies are in effect at the UNLV Libraries. Some of these can be accessed at www.library.unlv.edu/services/policies/computeruse.html.

Appendix B. UNLV University Libraries Guidelines for Library Computer Use

In pursuit of its goal to provide effective access to information resources in support of the university's programs of teaching, research, and scholarly and creative production, the university libraries have adopted guidelines governing electronic access and use of licensed software.
All those who use the libraries' public computers must do so in a legal and ethical manner that demonstrates respect for the rights of other users and recognizes the importance of civility and responsibility when using resources in a shared academic environment.

Authorized Users

To gain authenticated access to the libraries' computer network, all users of the university libraries' public computers must be officially registered as a library borrower, a library computer user, or a guest user. A photo ID is required. (Exceptions may be made as needed when access to Federal Depository electronic resources is required.) Priority use is granted to UNLV students, faculty, and staff. As need arises, access restrictions may be imposed on nonuniversity users. In accordance with licensing and legal restrictions, nonuniversity users are restricted from using word-processing, spreadsheet, and other productivity and high-end multimedia software. During high-demand times, all users may have time restrictions placed on their computer use. If requested by library staff, all users must be prepared to show photo ID to confirm their user status.

Authorized and Unauthorized Use

Public computers are to be used for academic research purposes only. Electronic information, services, software, and networks provided directly or indirectly by the university libraries shall be accessible, in accordance with licensing or contractual obligations and in accordance with existing UNLV and University and Community College System of Nevada (UCCSN) computing services policies (UCCSN Computing Resources Policy, www.scs.nevada.edu/about/policy061899.html; UNLV Faculty Computer Use Policy, www.unlv.edu/infotech/itcc/FCUP.html; Student Computer Use Policy, http://ccs.unlv.edu/scr/computeruse.asp).
Users are not permitted to:

1. Copy any copyrighted software provided by UNLV. It is a criminal offense to copy any software that is protected by copyright, and UNLV will treat it as such.
2. Use licensed software in a manner inconsistent with the licensing arrangement. Information on licenses is available through your instructor.
3. Copy, rename, alter, examine, or delete the files or programs of another person or UNLV without permission.
4. Use a computer with the intent to intimidate, harass, or display hostility toward others (sending offensive messages or prominently displaying material that others might find offensive, such as vulgar language, explicit sexual material, or material from hate groups).
5. Create, disseminate, or run a self-replicating program ("virus"), whether destructive in nature or not.
6. Use a computer for business purposes.
7. Tamper with switch settings; move, reconfigure, or do anything that could damage terminals, computers, printers, or other equipment.
8. Collect, read, or destroy output other than your own work without the permission of the owner.
9. Use the computer account of another person with or without their permission unless it is designated for group work.
10. Use software not provided by UNLV.
11. Access or attempt to access a host computer, either at UNLV or through a network, without the owner's permission, or through use of log-in information belonging to another person.

Internet and Web Use

The university libraries cannot control the information available over the Internet and are not responsible for its content. The Internet contains a wide variety of material, expressing many points of view. Not all sources provide information that is accurate, complete, or current, and some may be offensive or disturbing to some viewers. Users should properly evaluate Internet resources according to their academic and research needs.
Links to other Internet sites should not be construed as an endorsement by the libraries of the content or views contained therein. The university libraries respect the First Amendment and support the concept of intellectual freedom. The libraries also endorse ALA's Library Bill of Rights, which supports access to information and opposes censorship, labeling, and restricting access to information. In accordance with this policy, the university libraries do not use filters to restrict access to information on the Internet or Web. As with other library resources, restriction of a minor's access to the Internet or Web is the responsibility of the parent or legal guardian.

Printing

Users are charged for printing no matter who supplies the paper. Mass production of club flyers, newsletters, or posters is strictly prohibited. If multiple copies are desired, users need to go to an appropriate copying facility such as Campus Reprographics. Contact a staff member when using the color laser printer to avoid costly mistakes. The university libraries reserve the right to restrict user printing based on quantity and content (such as materials related to running an outside business).

Copyright Alert

Many of the resources found on the Internet or Web are copyright protected. Although the Internet is a different medium from printed text, ownership and intellectual property rights still exist. Check documents for appropriate statements indicating ownership. Most of the electronic software and journal articles available on library servers and computers are also copyrighted. Users shall not violate the legal protection provided by copyrights and licenses held by the university libraries or others. Users shall not make copies of any licensed or copyrighted computer program found on a library computer.
Use of Personal Laptops and Other Equipment

Students, faculty, and staff of the university are welcome to bring laptops with network cards and use them with our data drops to gain access to our network. The laptop must be registered in our laptop authentication system, and a valid library barcode is also required. Users are responsible for notifying the library promptly if their registered laptop is lost or stolen, since they may be held responsible if their laptop is used to access and damage the network. Users taking advantage of this service are required to abide by all UCCSN and UNLV computer policies.

The libraries allow the use of the universal serial bus (USB) connections located in the front of the workstations. This includes use with portable USB-based devices such as flash-based memory readers (memory sticks, secure digital) and digital camera connections. The patron assumes all responsibility in attaching personal hardware to library workstations. The libraries are not responsible for any damage done to patron-owned items (hardware, software, or personal data) as a result of connecting such devices to library workstations. As with any use of library workstations, patrons must adhere to all UCCSN, UNLV, and university libraries' computing and network-use policies. Patrons are responsible for the security of their personal hardware, software, and data.

Inappropriate Behavior

Behavior that adversely affects the work of others and interferes with the ability of library staff to provide good service is considered inappropriate. It is expected that users of the libraries' public computers will be sensitive to the perspective of others and responsive to library staff's reasonable requests for changes in behavior and compliance with library and university policies.
The university libraries and their staff reserve the right to remove any user(s) from a computer if they are in violation of any part of this policy and may deny further access to library computers and other library resources for repeat offenders. The libraries will pursue infractions or misconduct through campus disciplinary channels and law enforcement as appropriate.

Revised: March 3, 2004
Updated: Thursday, May 13, 2004
Content Provider: Wendy Starkweather, Director of Public Services
The Impact of Web Search Engines on Subject Searching in OPAC
Yu, Holly; Young, Margo
Information Technology and Libraries; Dec 2004; 23, 4; ProQuest pg. 168

The Impact of Web Search Engines on Subject Searching in OPAC

Holly Yu and Margo Young

Holly Yu (hyu3@calstatela.edu) is Library Web Administrator and Reference Librarian at the University Library, California State University, Los Angeles. Margo Young (margo.e.young@jpl.nasa.gov) is Manager of the Library, Archives and Records Section at the Jet Propulsion Laboratory, California Institute of Technology, Pasadena.

This paper analyzes the results of transaction logs at California State University, Los Angeles (CSULA) and studies the effects of implementing a Web-based OPAC along with interface changes. The authors find that user success in subject searching remains problematic. A major increase in the frequency of searches that would have been more successful in resources other than the library catalog is noted over the time period 2000-2002. The authors attribute this increase to the prevalence of Web search engines and suggest that metasearching, relevance-ranked results, and relevance feedback ("more like this") are now expected in user searching and should be integrated into online catalogs as search options.

In spite of many studies and articles on Online Public Access Catalogs (OPACs) over the last twenty-five years, many of the original ideas about improving user success in searching the library catalog have yet to be implemented. Ironically, many of these techniques are now found in Web search engines. The popularity of the Web appears to have influenced users' mental models and thus their expectations and behavior when using a Web-based OPAC interface. This study examines current search behavior using transaction-log analysis (TLA) of subject searches when zero hits are retrieved. It considers some of the features of Web search engines and online bookstores and suggests future enhancements for OPACs.

Literature Review

Many studies have been published since the 1980s centering on the OPAC. Seymour and Large and Beheshti provide in-depth overviews of OPAC research from the mid-1980s through the mid-1990s.1 Much of this research has addressed system design and user behavior, including:

• user demographics,
• search behavior,
• knowledge of system,
• knowledge of subject matter,
Much of this research has addressed system design and user behavior including: • user demographic s, • search behavior, • knowledge of system, • knowledge of subject matter, Holly Yu (hyu3@calstatela.edu) is Library Web Administrator and Reference Librarian at the University Library, California State University, Los Angeles. Margo Young (Margo.e.young@jpl. nasa.gov) is Manager of the Library, Archives and Records Sec- tion at the Jet Propulsion Laboratory, California Institute of Technology, Pasadena. • library settings, • search strategies, and • OPAC systems 2 OPAC research has employed a number of data-col- lection methodologies: experiment, interviews, question- naires, observation, think aloud, and transaction logs. ' Transaction logs have been used extensively to study the use of OPACs, and library literature reflects this. While the exact details of TLA vary greatly, Peters et al. define it simply as "the study of electronically recorded interac- tions between online information retrieval systems and the persons who search for the information found in those systems."' This section reviews the TLA literature relevant to the study. I Number of Hits TLA cannot portray user intention or actual satisfaction since relevance, success, or failure are subjectively deter- mined and require the user to decide. Peters recommends combining TLA with another technique such as observa- tion, questionnaire or survey, interview, or focus group. 5 In spite of the limit ations of TLA, many studies (including this one) rely on it alone. Typically, these studies define failure as zero hits in response to a search. Generalizing from several studies, approximately 30 percent of all searches result in zero hits.6 The failure rate is even higher for subject searches: Peters reported that about 40 percent of subject searches failed by retrieving zero hits. 7 Some researchers also define an upper number of results for a successful sea rch. 
Buckland found that the average retrieval set was 98 items.8 Blecic reported that Cochrane and Markey found that OPAC users retrieve too much (15 percent of the time).9 Wiberly, Daugherty, and Danowski (as reported in Peters) found that the median number of postings considered to be too many was fifteen, although when fifteen to thirty postings were retrieved, more users displayed them all than abandoned the search.10

Subject Searching

Some studies have specifically looked at subject searching. Hildreth differentiated among various types of searches and defined one hundred items as the upper limit for keyword searches and ninety as the upper limit for subject searches.11 Larson defined reasonable subject retrieval as between one and twenty items and found that only 12 percent of subject searches retrieved the appropriate number.12

Larson is not the only researcher to have reported poor results in subject searching. For more than twenty years, research has demonstrated that subject or topical searches are both popular and problematic. Tolle and Han found that subject searching is the most frequently used and the least successful.13 Moore reported that 30 percent of searches were for subject, and Matthews et al. found that 59 percent of all searches were for subject information.14 Hunter found that 52 percent of all searches were subject searches and that 63 percent of these had zero hits.15 Van Pulis and Ludy referred to Alzofon and Van Pulis's earlier work in 1984, where they reported that 42 percent of all searches were subject searches.16 Hildreth found that 62.1 percent of subject searches and 35.4 percent of keyword searches failed.
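Criteria like these (zero hits as failure, Larson's one-to-twenty range as reasonable retrieval) are straightforward to operationalize against a transaction log. A small sketch of the computation follows; the (search_type, hit_count) log format is an assumption for illustration, since real transaction-log formats vary by system:

```python
# Classify logged searches by result-set size, using the thresholds discussed
# above: zero hits = failure, 1-20 = reasonable retrieval, >20 = too many.

def summarize(log, reasonable_max=20):
    """log: iterable of (search_type, hit_count) pairs. Returns per-type counts."""
    stats = {}
    for search_type, hits in log:
        s = stats.setdefault(search_type, {"zero": 0, "reasonable": 0, "too_many": 0})
        if hits == 0:
            s["zero"] += 1
        elif hits <= reasonable_max:
            s["reasonable"] += 1
        else:
            s["too_many"] += 1
    return stats

def zero_hit_rate(log, search_type):
    """Fraction of searches of a given type that retrieved nothing."""
    counts = summarize(log).get(search_type)
    if not counts:
        return 0.0
    return counts["zero"] / sum(counts.values())
```

Tabulations of this kind are how the failure percentages quoted throughout this section were derived, though, as noted above, hit counts alone cannot capture whether the retrieved records actually satisfied the user.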
Larson categorized the major problems with online catalogs as follows:

• users' lack of knowledge of Library of Congress Subject Headings (LCSH),
• users' problems with mechanical and conceptual aspects of query formulation,
• searches that retrieve nothing,
• searches that retrieve too much, and
• searches that retrieve records that do not match what the user had in mind.18

During an eleven-year longitudinal study, Larson found that subject searching was being replaced by keyword searching.19 No consistent pattern in the number of search terms has emerged in the literature. Van Pulis and Ludy reported that user searches were typically single words.20 Markey contended that users' search terms frequently matched standardized vocabulary in large catalogs.21 None of Markey's searchers consulted LCSH, and only 11 percent of Van Pulis and Ludy's did so, notably in spite of their library's user-education programs. Peters reported that Lester found that the average search was less than two words and fewer than thirteen characters.22 Hildreth found that more than two-thirds of keyword searches included two or more words and that 42 percent of these multiple-word searches resulted in zero hits.23 The proportion of zero-hit keyword searches rose with the increasing number of words in the search. Subject headings have been a matter of considerable study. Gerhan examined catalog records and surmised their accessibility in an online catalog. He contended that when a keyword from the title only is accessed, only 50 percent of all relevant books would be found, and that title keywords would lead a user to subject-relevant records in 55 percent of cases while LCSH would lead a user successfully in 85 percent of the cases.
24 In contrast, Cherry found that 42 percent of zero-hit subject searches would have been more fruitful as keyword or title searches than by fol- lowing cross references retrieved from the subject field.25 She recommended converting zero-hit subject queries to other types of subject searches (keyword). Thorne and Whitlatch recommended that subject searchers should select keyword rather than subject headings as their first access strategy. 26 Types of Problems in Subject Searches Numerous studies have categorized reasons for search failure (typically in zero-hit situations), but Peters reports that a standard categorization has not yet been estab- lished .27 Tn cases where more than one error is made in a search (and Hunter reported this to be frequent), there is no consistency in how that is assigned. Nonetheless, some major categories of problems stand out: • misspelling and typographical errors-Peters found that these errors accounted for 20.8 percent of all unsuccessful keyword searches, while Henty (reported by Peters) concluded that 33 percent of such searches could be attributed to this.28 Hunter found that 9.3 per- cent of subject searches had typographical and spelling errors. 29 • keyword search-Hunter found 52.6 percent of zero- hit searches used uncontrolled vocabulary terms. 30 • wrong source or field-Hunter concluded that 4.5 percent of searches should have been done in a source other than the catalog, while 1.3 percent of searches were of the wrong type (an author search in the subject-search option). 31 • items not in the database-Peters found that searches for items not held in the database accounted for 39.1 percent of unsuccessful searches, while Hunter found that problem in only 2.5 percent of the problem cases. 32 In addition to these problems, Hunter also found that index display and rules relating to the systems accounted for 27 percent of errors. 
33 I Resulting Recommendations for Change While Hildreth stated, "There has been little research on most components of the OPAC interface" in 1997, he pro- posed two options to improve user success: increased user training or improved design based on information- seeking behavior. 34 Wallace pointed out that there is a very short window of opportunity when searchers are amenable to instruction and that successful screen designs should therefore focus on presenting the quick- searching options employed by the majority of users first. 35 Large and Beheshti observed "that too many options simply caused confusion, at least for less experi- enced OPAC users," and they summarized that OPAC- THE IMPACT OF WEB SEARCH ENGINES ON SUBJECT SEARCHING IN OPAC I YU AND YOUNG 169 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. interface research focuses on menu sequence, browsing, and querying .3'; Menu Sequence In terms of menu sequence, Hancock-Beaulieu indicated that "the menu sequence in which search options are offered will influence user selection." 37 Ballard found that the amount of keyword searching was affected by its posi - tion on the menu. 38 Scott reported that both keyword- and subject-search success improved when the keyword was plac ed at the top of the menus .39 Thorne and Whitlach used a combination of methods in their study and concluded that several interface changes should be implemented : • strongly encourage novi ce users to start with key- word (list keyword above subject heading), • relabel "keyword" to "subject or title words," and • relabel II subject heading" to "Library of Congress Subject Heading."' 0 Blecic et al. studied tran sactio n logs over six months to track th e impact of "simplifying and clarifying" OPAC introductory screens. After moving the keyword option to th e top, keyword searching incr ease d from 13.30 per- cent to 15.83 percent of all sea rch statements. Blecic et al. 
found her original tally of 35.05 p ercent of correct searches having zero hits decre ased to 31.35 percent after screen changes. 41 Querying OPAC-interface design has been based on an assumption that us ers come to the catalog knowing what the y need to know . In either text-bas ed OPAC or Web-based OPAC, query-based searches are still mainstream. Searchers are required to have knowledge of title, author, or subject. Ortiz-Repiso and Moscoso observed that Web-based cata- logs, like all library catalogs, basi cally fulfill two functions: locating works based on known details and identifying which documents in the databas e cover a given subject. 42 Natural-language input has long been considered a desi r- able way to overcome this shortcoming. Browsing Relevance-ranked output and hypertext were considered by Hildr eth to be promising in 1997.43 OPACs have not been conceived within a true h ypertext environment, but rather they maintain the structure of their original for- mats, principally machine-readable cataloging (MARC), and therefore impede the generation of a structure of nodes and links. 44 In addition to continuing to employ MARC format as its underlying structure, the concept of main entry and added entr y, field label, and displa y logic all reflect cataloging rules . Amazon.com and Barnes and Noble have completel y mo ve d away from this century- old structure to pro vi d e easy access to book information . In the Web environment , th e concept of main ent ry loses its meaning to multiple-acces s points and linking capabil- ities of author, subject, and call number. Another prominent drawback of Web-based OP A Cs is that they have not taken advantage of thesaurus structure and utilized the thesaurus for sea rching feedback. The hierarc hical relationship in LCSH is underutilized in terms of the relationship betw een terms and associations through related terms. Web-based OPACs have failed to make use of this important access. 
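The thesaurus feedback described above can be sketched as a small query-expansion step: when a subject search runs, broader, narrower, and related terms from the controlled vocabulary are offered back to the user or folded into the search. This is a minimal illustration, not an actual OPAC feature; the tiny vocabulary below is an invented stand-in for LCSH's term relationships.

```python
# Sketch of thesaurus-driven search feedback: expand a subject term
# using broader/narrower/related relationships. The vocabulary here is
# a hypothetical stand-in for LCSH, purely for illustration.

THESAURUS = {
    "automobiles": {
        "broader": ["motor vehicles"],
        "narrower": ["sports cars", "trucks"],
        "related": ["automobile industry"],
    },
}

def expand_query(term):
    """Return the user's term plus any thesaurus relatives, deduplicated."""
    entry = THESAURUS.get(term.lower(), {})
    expanded = [term.lower()]
    for relation in ("broader", "narrower", "related"):
        for t in entry.get(relation, []):
            if t not in expanded:
                expanded.append(t)
    return expanded

print(expand_query("Automobiles"))
```

A real system would draw these relationships from the authority file rather than a hard-coded table, but the feedback loop — show the searcher where their term sits in the vocabulary — is the same.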
The persistence of these drawbacks in OPAC-interface design is rooted deeply in cataloging rules that were derived from the manual environment more than a century ago. It reflects the gap between "concepts typically held by nonprofessional users and those used in library practices."45 In her article "Why Are Online Catalogs Still Hard to Use?" Borgman concludes:

Despite numerous improvements to the user interface of online catalogs in recent years, searchers still find them hard to use. Most of the improvements are in surface features rather than in the core functionality. We see little evidence that our research on searching behavior studies has influenced online catalog design.46

Catalog Content

Users misunderstand the scope of the catalog. In questionnaire responses, 80 percent of Van Pulis and Ludy's participants indicated they had considered looking elsewhere than the library catalog, as in periodical indexes.47 Blazek and Bilal reported a request for inclusion of journal-article titles in one response to their questionnaire.48 Libraries responded to these requests by acquiring databases on CD-ROM, loading them locally (sometimes using the catalog system to mount a separate database), and, most recently, providing access to databases over the Internet. However, seldom have libraries responded to these requests by integrating search access through a single front end as the default search.

Impact of Web Search Engines

Blecic et al. found that keyword searching increased from 13.3 percent to 28.3 percent over their four-year series of logs. At the same time, zero hits in keyword searching increased from 8.71 percent to 20.78 percent, while subject zero hits dropped from 23 percent to 13.69 percent. They surmised that the influence of Web interfaces might have affected the regression: fluctuation in search syntax, initial articles, and author order.49

. . . automatically scouts the Web for pages that are related to its results, so it can find a large number of resources very quickly without requiring the user to select the right keywords. Teoma structures the appropriate communities of interest on the fly and ranks the results on a range of factors including authorities and hubs (good resources pointing to related resources). Google offers an option of "similar pages." While the subject-redirect function in a Web-based OPAC emulates this, it succeeds only if the user's initial search term yielded the right result. OPAC users have the option of clicking on hyperlinked headings (author, title, subject headings) but cannot ask the system to perform a more sophisticated search on their behalf.

User-Popularity Tracking

The Amazon and Barnes and Noble Web sites present enhanced information about items by user-popularity tracking. Circulation statistics or user comments could serve as a form of "recommender system" to help novices narrow their selections. Messages such as "other students who checked this book out also read these books" could be dynamically inserted in bibliographic records.
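The circulation-based "recommender system" idea sketched above amounts to co-occurrence counting over loan records: items frequently borrowed by the same patrons are suggested alongside one another. The loan data and titles below are invented for illustration; a production system would of course read from the circulation database and respect patron privacy by aggregating counts.

```python
# Sketch of "other students who checked this book out also read these
# books": count how often other items co-occur with a given item in
# patrons' loan histories, then rank by frequency. Data is hypothetical.
from collections import Counter

loans = {  # patron -> set of items checked out (invented)
    "p1": {"catalog history", "opac design", "web searching"},
    "p2": {"opac design", "web searching"},
    "p3": {"opac design", "metadata basics"},
}

def also_read(item, top=2):
    """Rank the items most often borrowed alongside the given item."""
    co = Counter()
    for items in loans.values():
        if item in items:
            for other in items - {item}:
                co[other] += 1
    return [title for title, _ in co.most_common(top)]

print(also_read("opac design"))
```

The ranked list could then be inserted dynamically into the bibliographic record display, exactly as the passage above envisions.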
Users could also be allowed to provide comments on materials in the catalog, thus providing an interactive experience for OPAC users.

Summary of Web Features

There are positive and negative impacts of Web search engines and online bookstores on Web-based OPAC users. Users who find Web pages to be comfortable, easy, and familiar may make greater use of Web-based OPACs. While they bring with them their knowledge of search engines, they also bring their misperceptions. The possibility of using tools similar to those found on Web search engines can greatly "reinforce the usefulness of the catalog as well as the positive perception that the end user has of it."61 Given the diversity of the errors that users experience, a combination of approaches is necessary to improve their search success. Automatic mapping of free text to thesaurus terms, translation of common spelling mistakes, and links to related pages are tools already in use in the Web search engines. "See similar pages," extensive use of relevance feedback, and popularity tracking along with natural language are less common.

Recommendations for Web-based OPACs

The authors' TLA revealed a continuing problem with subject-heading searches and showed a trend toward searching topics that are not typically answered in a book catalog. The former problem has a well-documented history, while the authors believe the latter problem stems from the influence of the Web and Web search engines. Several changes to typical OPACs are recommended to address the trends observed in the course of this study.

Metasearching

The recent trend of incorporating databases and OPACs into a single search reflects the necessity of expanding information resources and simplifying access to resources. This study's empirical results clearly indicate a need to expand this integration into one search.
While some argue that this metasearching will further augment the syntax digression and prevent users from becoming information literate, others believe that metasearching, along with the option of searching each individual database, is an ultimate goal for online search. Like it or not, the metasearch technology, also known as federated or broadcast search, "creates a portal that could allow the library to become the one-stop shop their users and potential users find so attractive."65 One-search-for-all cannot solve all problems; however, guiding users to where they are most likely to find results quickly (the quick search) should satisfy the needs of the majority of users.

Menu Sequence

Effective screen design has a positive effect on user success. The menu sequence for search options plays a significant role in user selection. This research and others have demonstrated that users choose an option higher rather than lower in a list. Too many options "simply cause confusion, at least for less experienced OPAC users."66

Browsing Feature

Browsing is a natural and effective approach to many information-seeking problems and requires less effort and knowledge on the part of the user. The literature suggests that a great deal of the use of the Web relies on known Web sites, recommended sites, or return visits to sites recently visited, thus relying on browsing rather than on searching. Jenkins, Corritore, and Widenbeck found that domain novices seldom clicked very deep (out and back), while Web experts explored more deeply.67 Holscher and Strube note that Hurtienne and Wandtke claim that only minimal training is necessary for browsing an individual Web site, while Pollock and Hockley claim that considerably more experience is required for querying and navigating among sites.68 Hancock-Beaulieu found that between 30 percent and 45 percent of all online searches, regardless of the type of search, are concluded with browsing the library shelves.69

. . . to implement user help through tips or tactics selected and accumulated from a collection of common user-search mistakes. In such a case, the system would play a more active role by generating relevant search tips on the fly and using zero-hit search results as a basis for generating a spell check or suggesting alternate wording.

An ideal scenario is that the OPAC allows the user to pursue multiple avenues of an inquiry by entering fragments of the question, exploring vocabulary choices, and reformulating the search with the assistance of various specialized intelligent assistants. Borgman suggests that an OPAC should be judged by whether the catalog answers questions rather than merely matches queries. She suggests the need to design systems that are based on behavioral models of how people ask questions, arguing that users still need to translate their question into what a system will accept.73

User Instruction

On-site training and online documentation can help make the OPAC easier to use. With the advent of information literacy, the shift in library instruction from procedure-based query formulation to the question being answered has taken place. At CSULA, instruction for entry-level classes focuses on formulating a research statement and then identifying keywords and alternate terms.
The instruction sessions that follow the initial-concept formulation are short and focus on how to enter keyword or subject, author, and title, and the use of Boolean operators. This approach may improve success until the systems provide the tools to improve search strategies or accept an untrained user's input.

As an increasing number of users access online library catalogs remotely, assistance needs to be embedded into intuitive systems. "Time invested in elaborate help systems often is better spent in redesigning the user interface so that help is no longer needed."74 Users are not willing to devote much of their time to learning to use these systems. They just want to get their search results quickly, and they expect the catalog to be easy to use with little or no time invested in learning the system.

Conclusion

The empirical study reported in this paper indicates that progress has been made in terms of increasing search success by improving the OPAC search interface. The goal is to design Web-based OPAC systems for today's users, who are likely to bring a mental model of Web search engines to the library catalog. Web-based OPACs and Web search engines differ in terms of their systems and interface design. However, in most cases, these differences do not result in different search characteristics by users. Research findings on the impact of Web search engines and on user searching expectations and behavior should be adequately utilized to guide interface design. Web users typically do not know how a search engine works. Therefore, fundamental features in the design of the next generation of the OPAC interface should include changing the search to allow natural-language searching with keyword search first, and a focus on meeting the quick-search need.
Such a concept-based search will allow users to enter the natural language of their chosen topic in the search box while the system maps the query to the structure and content of the database. Relevance feedback to allow the system to bring back related pages, spelling correction, and relevance-ranked output remain key goals for future OPACs.

References and Notes

1. Sharon Seymour, "Online Public-Access Catalog User Studies: A Review of Research Methodologies, March 1986-November 1989," Library and Information Science Research 13 (1991): 89-102; Andrew Large and Jamshid Beheshti, "OPACs: A Research Review," Library and Information Science Research 19 (1997): 2, 111-33.
2. Ibid., 113-16.
3. Ibid., 116-20.
4. Thomas A. Peters et al., "An Introduction to the Special Section on Transaction-Log Analysis," Library Hi Tech 11 (1993): 2, 37.
5. Thomas A. Peters, "The History and Development of Transaction-Log Analysis," Library Hi Tech 11 (1993): 2, 56.
6. Pauline A. Cochrane and Karen Markey, "Catalog Use Studies since the Introduction of Online Interactive Catalogs: Impact on Design for Subject Access," in Redesign of Catalogs and Indexes for Improved Subject Access: Selected Papers of Pauline A. Cochrane (Phoenix: Oryx, 1985), 159-84; Steven A. Zink, "Monitoring User Success through Transaction-Log Analysis: The WolfPAC Example," Reference Services Review 19 (Spring 1991): 449-56; Michael K. Buckland et al., "OASIS: A Front End for Prototyping Catalog Enhancements," Library Hi Tech 10 (1992): 7-22.
7. Thomas A. Peters, "When Smart People Fail: An Analysis of the Transaction Log of an Online Public-Access Catalog," Journal of Academic Librarianship 15 (1989): 5, 267.
8. Michael K. Buckland et al., "OASIS," 7-22.
9. Deborah D. Blecic et al., "Using Transaction-Log Analysis to Improve OPAC Retrieval Results," College and Research Libraries (Jan. 1998): 48.
10. Peters, "The History and Development of Transaction-Log Analysis," 2, 52.
11. Charles R. Hildreth, "The Use and Understanding of Keyword Searching in a University Online Catalog," Information Technology and Libraries 16 (1997): 6.
12. Ray R. Larson, "The Decline of Subject Searching: Long-Term Trends and Patterns of Index Use in an Online Catalog," Journal of the American Society for Information Science and Technology 42 (1991): 3, 210.
13. John E. Tolle and Sehchang Hah, "Online Search Patterns: NLM CATLINE Database," Journal of the American Society for Information Science 36 (Mar. 1985): 82-93.
14. Carol Weiss Moore, "User Reaction to Online Catalogs: An Exploratory Study," College and Research Libraries 42 (1981): 295-302; Joseph R. Matthews et al., Using Online Catalogs: A Nationwide Survey: A Report of a Study Sponsored by the Council on Library Resources (New York: Neal-Schuman, 1983), 144.
15. Rhonda N. Hunter, "Success and Failures of Patrons Searching the Online Catalog at a Large Academic Library: A Transaction-Log Analysis," RQ 30 (Spring 1991): 399.
16. Noelle Van Pulis and Lorne E. Ludy, "Subject Searching in an Online Catalog with Authority Control," College and Research Libraries 49 (1988): 526.
17. Hildreth, "The Use and Understanding of Keyword Searching," 6.
18. Ray R. Larson, "The Decline of Subject Searching," 3, 60.
19. Ibid.
20. Van Pulis and Ludy, "Subject Searching in an Online Catalog," 527.
21. Karen Markey, Research Report on the Process of Subject Searching in the Library Catalog: Final Report of the Subject Access Research Project (report no. OCLC/OPR/RR-83-1) (Dublin, Ohio: OCLC Online Computer Library Center, 1983), 529.
22. Peters, "The History and Development of Transaction-Log Analysis," 2, 43.
23. Hildreth, "The Use and Understanding of Keyword Searching," 8-9.
24. David R. Gerhan, "LCSH in vivo: Subject Searching Performance and Strategy in the OPAC Era," Journal of Academic Librarianship 15 (1989): 86-87.
25. Joan M. Cherry, "Improving Subject Access in OPACs: An Exploratory Study of Conversion of Users' Queries," Journal of Academic Librarianship 18 (1992): 2, 98.
26. Rosemary Thorne and Jo Bell Whitlatch, "Patron Online Catalog Success," College and Research Libraries 55 (1994): 496.
27. Peters, "The History and Development of Transaction-Log Analysis," 2, 48.
28. Ibid.
29. Hunter, "Success and Failures," 400.
30. Ibid., 399.
31. Ibid., 400.
32. Peters, "The History and Development of Transaction-Log Analysis," 2, 56.
33. Hunter, "Success and Failures," 400.
34. Hildreth, "The Use and Understanding of Keyword Searching," 6.
35. Patricia M. Wallace, "How Do Patrons Search the Online Catalog When No One's Looking? Transaction-Log Analysis and Implications for Bibliographic Instruction and System Design," RQ 33 (Winter 1993): 3, 249.
36. Large and Beheshti, "OPACs: A Research Review," 125.
37. M. M. Hancock-Beaulieu, "Online Catalogue: A Case for the User," in The Online Catalogue: Developments and Directions, C. Hildreth, ed. (London: Library Association, 1989), 25-46.
38. Terry Ballard, "Comparative Searching Styles of Patrons and Staff," Library Resources and Technical Services 38 (1994): 293-305.
39. Jane Scott et al., "@*@ This Computer and the Horse It Rode in On: Patron Frustration and Failure at the OPAC," in "Continuity and Transformation: The Promise of Confluence": Proceedings of the ACRL 7th National Conference (Chicago: ACRL, 1995), 247-56.
40. Thorne and Whitlatch, "Patron Online Catalog Success," 496.
41. Blecic et al., "Using Transaction-Log Analysis," 46.
42. Virginia Ortiz-Repiso and Purificacion Moscoso, "Web-Based OPACs: Between Tradition and Innovation," Information Technology and Libraries 18, no. 2 (June 1999): 68-69.
43. Hildreth, "The Use and Understanding of Keyword Searching," 6.
44. Ortiz-Repiso and Moscoso, "Web-Based OPACs," 71.
45. Ibid., 75.
46. Christine Borgman, "Why Are Online Catalogs Still Hard to Use?" Journal of the American Society for Information Science 47 (1996): 7, 501.
47. Van Pulis and Ludy, "Subject Searching in an Online Catalog," 53.
48. Blazek and Bilal, "Problems with OPAC: A Case Study of an Academic Research Library," RQ 28 (Winter 1988): 175.
49. Deborah D. Blecic et al., "A Longitudinal Study of the Effects of OPAC Screen Changes on Searching Behavior and User Success," College and Research Libraries 60, no. 6 (Nov. 1999): 524, 527.
50. Bernard J. Jansen and Udo Pooch, "A Review of Web Searching Studies and a Framework for Future Research," Journal of the American Society for Information Science and Technology 52 (2001): 3, 249-50.
51. Ibid., 250.
52. Blazek and Bilal, "Problems with OPAC: A Case Study," 175; Moore, "User Reaction to Online Catalogs," 295-302.
53. M. J. Bates, "The Design of Browsing and Berry-Picking Techniques for the Online Search Interface," Online Review 13 (1989): 5, 407-24.
54. Jansen and Pooch, "A Review of Web Searching Studies," 238.
55. Judy Luther, "Trumping Google? Metasearching's Promise," Library Journal 128 (2003): 16, 36.
56. Jack Muramatsu and Wanda Pratt, "Transparent Queries: Investigating Users' Mental Models of Search Engines," Research and Development in Information Retrieval (Sept. 2001). Accessed Mar. 10, 2003, http://citeseer.nj.nec.com/muramatsu01transparent.html.
57. Jansen and Pooch, "A Review of Web Searching Studies," 235.
58. Luther, "Trumping Google," 36.
59. Blecic et al., "A Longitudinal Study of the Effects of OPAC Screen Changes," 527.
60. Susan M. Colaric, "Instruction for Web Searching: An Empirical Study," College and Research Libraries News 64 (2003): 2.
61. A. G. Sutcliff, M. Ennis, and S. J. Watkinson, "Empirical Studies of End-User Information Searching," Journal of the American Society for Information Science and Technology 51 (2000): 13, 1213.
62. "All About Google," Google. Accessed Dec. 10, 2003, www.google.com.
63. G. Salton, Introduction to Modern Information Retrieval (New York: McGraw-Hill, 1983), 18.
64. Ortiz-Repiso and Moscoso, "Web-Based OPACs," 71.
65. Luther, "Trumping Google," 37.
66. Maaike D. Kiestra et al., "End-Users Searching the Online Catalogue: The Influence of Domain and System Knowledge on Search Patterns. Experiment at Tilburg University," The Electronic Library 12 (Dec. 1994): 335-43.
67. C. Jenkins et al., "Patterns of Information Seeking on the Web: A Qualitative Study of Domain Expertise and Web Expertise," IT and Society 1 (Winter 2003): 3, 74, 77. Accessed May 10, 2003, www.ItandSociety.org/.
68. C. Holscher and G. Strube, "Web Search Behavior of Internet Experts and Newbies," 9th International World Wide Web Conference (Amsterdam, 2000). Accessed Mar. 28, 2003, www9.org/w9cdrom/81/81.html; A. Pollock and A. Hockley, "What's Wrong with Internet Searching," D-Lib Magazine (Mar. 1997). Accessed May 10, 2003, www.dlib.org/dlib/march97/bt/03pollock.html.
69. M. M. Hancock-Beaulieu, "Online Catalogue: A Case for the User," 25-46.
70. Wilbert O. Galitz, The Essential Guide to User Interface Design: An Introduction to GUI Design Principles and Techniques (Chichester, England: Wiley, 1996).
71. Juliana Chan, "An Evaluation of Displays of Bibliographic Records in OPACs in Canadian Academic and Public Libraries," MIS Report, Univ. of Toronto, 1995. [025.3132 C454E]
72. Giorgio Brajnik et al., "Strategic Help in User Interfaces for Information Retrieval," Journal of the American Society for Information Science and Technology (JASIST) 53 (2002): 5, 344.
73. Borgman, "Why Are Online Catalogs Still Hard to Use?" 500.
74. Ibid.
Using a Native XML Database for Encoded Archival Description Search and Retrieval

Alan Cornish

Information Technology and Libraries; Dec. 2004; 23, 4; pg. 181

Communications

The Northwest Digital Archives (NWDA) is a National Endowment for the Humanities-funded effort by fifteen institutions in the Pacific Northwest to create a finding-aids repository. Approximately 2,300 finding aids that follow the Encoded Archival Description (EAD) standard are being contributed to a union catalog by academic and archival institutions in Idaho, Montana, Oregon, and Washington. This paper provides some information on the EAD standard and on search and retrieval issues for EAD XML documents. It describes native XML technology and the issues that were considered in the selection of a native XML database, Ixiasoft's TextML, to support the NWDA project.

Pitti, one of the founders of the EAD standard, noted the primary motivation behind the creation of EAD: "To provide a tool to help mitigate the fact that the geographic distribution of collections severely limits the ability of researchers, educators, and others to locate and use primary sources."1 Pitti expanded on this need for EAD in a 1999 D-Lib article:

The logical components of archival description and their relations to one another need to be accurately identified in a machine-readable form to support sophisticated indexing, navigation, and display that provide thorough and accurate access to, and description and control of, archival materials.2
In a more recent publication, Pitti and Duff noted a key advantage offered by EAD that relates to the focus of this article, the development of an EAD union catalog:

EAD makes it possible to provide union access to detailed archival descriptions and resources in repositories distributed throughout the world. . . . Libraries and archives will be able to easily share information about complementary records and collections, and to "virtually" integrate collections related by provenance, but dispersed geographically or administratively.3

In a 2001 American Archivist article, Roth examined EAD history and deployment methods used up to the 2001 time period. Importantly, two of the most prominent delivery systems described by Roth, DynaText (a server-side solution) and Panorama (a client-side solution), were, by 2003, obsolete products for EAD delivery. This is indicative of the rapid pace of change in EAD deployment, in part due to the migration from SGML to XML technologies. Roth described survey results obtained on EAD deployment that underscore the recognized need at that time for a "cost-effective server-side XML delivery system." The lack of such a solution motivated institutions to choose HTML as a delivery method for EAD finding aids.4

Articles like Roth's that describe specific EAD search-and-retrieval implementation options are in short supply. One such option, the University of Michigan DLXS XPAT software, is employed for the search and retrieval of EAD and other metadata in the University of Illinois at Urbana-Champaign (UIUC) Cultural Heritage Repository.5 Another option, harvesting EAD records into machine-readable cataloging (MARC) to establish search and retrieval access in an integrated library system, was described by Fleck and Seadle in a 2002 Coalition for Networked Information Task Force briefing.
Using an XML Harvester product created by Innova- tive Interfaces, MARC records are generated based upon MARC encod- ing analogs included in the EAD markup and loaded into an Innova- tive Interfaces INNOPAC system. 6 This product has been used to create access to EAD finding aids in the cat- alog for Michigan State University's Vincent Voice Library. In a 2001 article, Gilliland- Swetland recommended several desirable features for an EAD search- and-retrieval system. She emphasized the challenge of EAD search and retrieval by noting the nature of find- ing aids themselves: Archivists have historically been materials-centric rather than user-centric in their descriptive practices, resulting in the find- ing aid assuming a form quite unlike the concise bibliographic description with name and subject access most users are accustomed to using in other information systems such as library catalogs, abstracts, and indexes.' Without describing specific soft- ware tools, Gilliland-Swetland argued for a user-centric approach to the search and retrieval of finding aids by examining the needs of specific user communities such as genealogists, K-12 teachers, and historians. 8 Several initiatives similar to the NWDA effort are described in the professional literature. The Online Archive of California (OAC), which was founded in the mid-1990s, is a consortium of California special- collections repositories. A number of key consortium functions are central- ized, including "monitoring to ensure consistency of EAD encoding across all OAC finding aids" according to agreed-upon best practices, a critical need in the creation of a union cata- log.9 Brown and Schottlaender also describe the integration of the OAC into the California Digital Library, which enables linkages between EAD Alan Cornish (cornish@wsu.edu) is Sys- tems Librarian, Washington State Univer- sity Libraries, Pullman. 
USING A NATIVE XML DATABASE FOR ENCODED ARCHIVAL DESCRIPTION SEARCH AND RETRIEVAL | CORNISH 181

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

finding aids and digitized copies of original materials.10 Finally, one important development area is the possibility of integrating EAD documents into Open Archives Initiative (OAI) services in order to enhance resource discovery. A 2002 paper written by Prom and Habing, both of whom work with the UIUC Cultural Heritage Repository, explored the possibility of mapping EAD to OAI, the latter of which is based upon the fifteen-element Dublin Core Metadata Set (unqualified). While noting, "we do not propose that the full capabilities of EAD finding aids could be subsumed by OAI," Prom and Habing suggested that it is possible to map the top-level and component portions of EAD into OAI, resulting in multiple OAI records from a single EAD finding aid. In this scenario, a single OAI record is created from the collection-level information and multiple records from component-level information in an EAD document.11

Evaluation of EAD Search and Retrieval Products

In order to identify a software solution for supporting a union catalog of EAD finding aids, the consortium conducted a product evaluation. The strengths and weaknesses of the native XML technology employed by the consortium can be best understood by looking at alternative XML products and product categories. Table 1 shows the products considered during an evaluation period that consisted of both product research and actual trials. In approaching the evaluation, the consortium and its union-catalog host institution, the Washington State University Libraries, had several specific needs in mind. First, the licensing and support costs for the product needed to fit within the consortium's budget.
Second, the search-and-retrieval software had to support several basic functions: keyword searching across all union-catalog finding aids; specific field searching based upon elements or attributes in the EAD document; an ability to customize the look and feel of the interface and search-results screens; and the ability to display search term(s) in the context of the finding aid.

As noted in the table, three of the evaluated products are native XML databases. Cyrenne provides a definition of native XML as a database with these features:

• The XML document is stored intact: "the XML document is preserved as a separate, unique entity in its entirety."
• "Schema independence," that is, "any well-formed XML document can be stored and queried."
• The query language is XML-based: "native XML database vendors typically use a query language designed specifically for XML" as opposed to SQL.12

Of the three native XML products, only the licensing costs of Ixiasoft's TextML and the open-source XIndice software fell within the available project funding. Both packages were extensively tested, with TextML proving superior at handling the large (sometimes in the MB-size range) and structurally complex EAD documents created by consortium members. One key strength of TextML that met an NWDA consortium need involved field searching. In TextML, it is possible to map a search field to one or more XPath statements, enabling the creation of search fields based upon the precise use of an element or attribute in EAD documents. The importance of this capability is shown with one EAD element that can appear both at the collection level and at the subordinate component level in a document. With TextML, using its limited XPath support, it is possible to reference a specific, contextual use of that element.
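TextML's index definitions are proprietary, but the underlying idea of mapping a named search field to an XPath expression can be sketched with Python's standard library. The field names and the sample EAD document below are invented for illustration; the point is that the same element name indexes differently depending on its context in the document:

```python
import xml.etree.ElementTree as ET

# Simplified, invented EAD fragment: one collection-level title and
# two component-level titles, all using the same element name.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Papers of Jane Doe</unittitle></did>
    <dsc>
      <c01><did><unittitle>Correspondence, 1900-1910</unittitle></did></c01>
      <c01><did><unittitle>Photographs</unittitle></did></c01>
    </dsc>
  </archdesc>
</ead>
"""

# Hypothetical field definitions: each search field maps to an
# XPath-style path, so context decides which occurrences are indexed.
FIELD_MAP = {
    "collection_title": ".//archdesc/did/unittitle",
    "component_title": ".//c01/did/unittitle",
}

def field_search(ead_xml: str, field: str, term: str) -> list:
    """Return the text of every element reachable through the field's
    path expression whose text contains the search term."""
    root = ET.fromstring(ead_xml)
    hits = [el.text for el in root.findall(FIELD_MAP[field])]
    return [t for t in hits if term.lower() in t.lower()]
```

A search on "component_title" never touches the collection-level title, which is exactly the contextual distinction the article credits to TextML's XPath-based field mapping.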
182 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

In addition to the native XML solutions, several other product types were considered. An XML query engine, Verity Ultraseek, was tested and produced good results when used for the search and retrieval of consortium documents.13 Ultraseek can be used to search discrete XML files, supports the creation of custom interfaces for the search-and-retrieval system, and has strong documentation. Probably the most obvious limitation in this XML query-engine product concerned the creation of search fields. To contrast Ultraseek with a native XML solution: Ultraseek 5.0 (used during the product trial) lacked XPath support. Instead, it required a unique element-attribute combination for the creation of a database search field. Returning to the example, contextual uses of the element could not be indexed without recoding consortium documents to create a unique element-attribute combination on which to index.

An XML-enabled database, DLXS XPAT, has been successfully used in several EAD projects, including OAC. One disadvantage of this product is that it requires a UNIX operating system for the server. Additionally, XPAT, as a supporting toolset for digital-library collection building, provides functionality that duplicates other media tools at the host institution (specifically, OCLC/DiMeMa CONTENTdm).

The use of a Relational Database Management System (RDBMS) to establish search and retrieval for EAD XML documents was considered as well. The advantage to this approach is that it would enable the use of coding techniques built up through other Web-based media delivery projects at the host institution.
The most obvious negative issue is the need to map XML elements or attributes to tables and fields in an RDBMS, which, as Cyrenne notes, "is often expensive and will most likely result in the loss of some data such as processing instructions, and comments as well as the notion of element and attribute ordering."14

Table 1. NWDA project-evaluated search and retrieval products

Product             Vendor                 Product category                        License
MySQL/PHP           N/A                    Relational database management system   Open source
Tamino XML Server   Software AG            Native XML database                     Commercial
TextML              Ixiasoft               Native XML database                     Commercial
Ultraseek           Verity                 XML query engine                        Commercial
Xindice             N/A                    Native XML database                     Open source
XML Harvester       Innovative Interfaces  Integrated library system               Commercial
XPAT                DLXS                   XML-enabled database                    Commercial

The use of native XML avoids the task of exploding XML data into the table and field structures of an RDBMS.

Finally, another approach considered was the use of an integrated library system product. This was a realistic option for NWDA because consortium member institutions had decided to include MARC encoding analogs for selected elements in union-catalog finding aids. Innovative Interfaces produces an XML Harvester that can be used to generate MARC records from EAD finding aids that include MARC encoding analogs. For this project, a local (or self-contained) catalog could have been created and populated with MARC records containing metadata for the EAD documents, including a URL for online access. This approach offers important strengths and weaknesses. On the positive side, it is a relatively easy method for enabling search-and-retrieval access to EAD finding aids.
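The analog-based record generation this approach depends on can be sketched in miniature. The EAD fragment below is simplified and invented (the real XML Harvester is a commercial Innovative Interfaces product, and real analog mappings are far richer); the core idea is simply to collect every element that declares a MARC encoding analog:

```python
import xml.etree.ElementTree as ET

# Invented, simplified EAD fragment; the "encodinganalog" attributes
# name the MARC field each descriptive element corresponds to.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle encodinganalog="245">Vincent Voice Library recordings</unittitle>
      <origination encodinganalog="100">Vincent, G. Robert</origination>
      <physdesc encodinganalog="300">40,000 sound recordings</physdesc>
    </did>
  </archdesc>
</ead>
"""

def harvest_marc_analogs(ead_xml: str) -> dict:
    """Build a minimal {MARC tag: text} record from every element
    that carries an encodinganalog attribute."""
    root = ET.fromstring(ead_xml)
    record = {}
    for elem in root.iter():
        tag = elem.get("encodinganalog")
        if tag and elem.text:
            record[tag] = elem.text.strip()
    return record

record = harvest_marc_analogs(EAD_SAMPLE)
```

The resulting record carries only the analog-tagged metadata, which is why, as noted below, search and retrieval in this approach cannot display terms in the context of the full finding aid.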
In contrast to the interface coding requirements for TextML, the XML Harvester provided an almost turnkey approach to XML search and retrieval. On the negative side, two factors stood out during the evaluation. First, it would be difficult to fully customize search-and-retrieval interfaces as needed for the project. Second, using the XML Harvester, there is no ability to display search terms in the context of the finding aid. Search and retrieval is based upon the metadata extracted from the finding aid using the MARC analogs. In Michigan State's Voice Library implementation of this solution, the finding aid is an external resource with no highlighting of search terms.

Strengths and Weaknesses of the TextML Approach

Each project has its own specific needs; thus, there is no correct approach to establishing search and retrieval for EAD XML documents. In taking the needs and resources of the NWDA consortium into account, Ixiasoft's TextML, a native XML product, provided the best fit and was licensed for use. The use of TextML enables the creation of customized interfaces for an XML database (or Document Base, using the TextML terminology) and provides support for keyword and field searching of consortium documents. The qualified XPath support in TextML enables search fields to be built upon precise element or attribute combinations within EAD documents.

The existence of a major finding-aids Internet site employing TextML was a factor in the project's selection of the software. The Access to Archives (A2A) site, accessible from URL www.a2a.pro.gov.uk/, provides an excellent model for a publicly searchable finding-aid site.
The A2A site supports keyword searching and searching by archival facility; provides multiple views of search results (a summary records screen, search terms in context, and the full record); highlights search term(s) in the displayed finding aid; and supports the presentation of large finding-aid documents. While A2A uses General International Standard Archival Description, or ISAD(G), as opposed to EAD for its description standard, the similarities between the two standards make the A2A site a valuable example for development.15

One weakness of TextML is the implementation model supported by Ixiasoft, which assumes significant local development of the application or Web interface. The relationship between software capabilities and local development was considered with each of the products listed in table 1. As noted, the Innovative Interfaces solution was the most straightforward approach, assuming the existence of the MARC analogs in EAD markup, but provided the least flexibility in terms of customization and establishing a true linkage between the search system and the actual document. In contrast, while Ixiasoft makes available a base set of active server pages using visual basic script (ASP/VBScript) code for TextML application development and provides very good training and support services, the responsibility for that development rests with the local site. For the NWDA consortium, this development, using the code base, has been manageable. The current state of interface development for the NWDA project can be reviewed at http://nwda.wsulibs.wsu.edu/project_info/.
Conclusion

In selecting an EAD search-and-retrieval system, one important question for the consortium was, Which software solution had the best prospects for migration in the future? Because of the inherent strengths of native XML technology in comparison to the other product categories listed in table 1, a native XML database appeared to be the best approach, and TextML provided the best combination of licensing costs, software capabilities, and support. It is important to note that the distinctions between native XML databases and databases that support XML through extensions (XML-enabled databases) may become more difficult to discern over time, in part due to the existing expertise and investments in RDBMS technologies.16 Nevertheless, capabilities central to native XML, such as the use of an XML-based query language, are integral to the success of such hybrid systems.

References and Notes

1. Daniel Pitti, "Encoded Archival Description: The Development of an Encoding Standard for Archival Finding Aids," The American Archivist 60, no. 3 (Summer 1997): 269.
2. Daniel Pitti, "Encoded Archival Description: An Introduction and Overview," D-Lib Magazine 5, no. 11 (Nov. 1999). Accessed Nov. 2, 2004, www.dlib.org/dlib/november99/11pitti.html.
3. Daniel V. Pitti and Wendy M. Duff (eds.), "Introduction," in Encoded Archival Description on the Internet (Binghamton, N.Y.: Haworth, 2001), 3.
4. James M. Roth, "Serving Up EAD: An Exploratory Study on the Deployment and Utilization of Encoded Archival Description Finding Aids," The American Archivist 64, no. 2 (Fall/Winter 2001): 226.
5. Sarah L. Shreeves et al., "Harvesting Cultural Heritage Metadata Using the OAI Protocol," Library Hi Tech 21, no. 2 (2003): 161.
6.
Nancy Fleck and Michael Seadle, "EAD Harvesting for the National Gallery of the Spoken Word" (paper presented at the Coalition for Networked Information fall 2002 Task Force meeting, San Antonio, Tex., Dec. 2002). Accessed Nov. 2, 2004, www.cni.org/tfms/2002b.fall/handouts/H-EAD-FleckSeadle.doc.
7. Anne J. Gilliland-Swetland, "Popularizing the Finding Aid: Exploiting EAD to Enhance Online Discovery and Retrieval," in Encoded Archival Description on the Internet (Binghamton, N.Y.: Haworth, 2001), 207.
8. Ibid., 210-14.
9. Charlotte B. Brown and Brian E. C. Schottlaender, "The Online Archive of California: A Consortial Approach to Encoded Archival Description," in Encoded Archival Description on the Internet (Binghamton, N.Y.: Haworth, 2001), 99.
10. Ibid., 103-5. OAC available at: www.oac.cdlib.org/. Accessed Nov. 2, 2004.
11. Christopher J. Prom and Thomas Habing, "Using the Open Archives Initiative Protocols with EAD," in Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries (Portland, Ore., July 2002). Accessed Nov. 2, 2004, http://dli.grainger.uiuc.edu/publications/jcdl2002/p14prom.pdf.
12. Marc Cyrenne, "Going Native: When Should You Use a Native XML Database?" AIIM E-DOC Magazine 16, no. 6 (Nov./Dec. 2002), 16. Accessed Nov. 2, 2004, www.edocmagazine.com/article_new.asp?ID=25421.
13. Product category decisions based upon definitions and classifications available from: Ronald Bourret, "XML Database Products." Accessed Nov. 2, 2004, www.rpbourret.com/xml/XMLDatabaseProds.htm.
14. Cyrenne, "Going Native," 18.
15. Bill Stockting, "EAD in A2A," Microsoft PowerPoint presentation. Accessed Nov. 2, 2004, www.agad.archiwa.gov.pl/ead/stocking.ppt.
16. Uwe Hohenstein, "Supporting XML in Oracle9i," in Akmal B.
Chaudhri, Awais Rashid, and Roberto Zicari (eds.), XML Data Management: Native XML and XML-Enabled Database Systems (Boston: Addison-Wesley, 2003), 123-4.

Using GIS to Measure In-Library Book-Use Behavior
Jingfeng Xia

This article is an attempt to develop Geographic Information Systems (GIS) technology into an analytical tool for examining the relationships between the height of the bookshelves and the behavior of library readers in utilizing books within a library. The tool would contain a database to store book-use information and some GIS maps to represent bookshelves. Upon analyzing the data stored in the database, different frequencies of book use across bookshelf layers are displayed on the maps. The tool would provide a wonderful means of visualization through which analysts can quickly realize the spatial distribution of books used in a library. This article reveals that readers tend to pull books out of the bookshelf layers that are easily reachable by human eyes and hands, and thus opens some issues for librarians to reconsider the management of library collections.

Several years ago, when working as a library assistant reshelving books in a university library, the author noted that the majority of books used inside the library were from the mid-range layers of bookshelves. That is, by proportion, few books pulled out by library readers were from the top or bottom layers. Books on the layers that were easily reachable by readers were frequently utilized. Such a book-use distribution pattern made the job of reshelving books easy, but created some inquiries: how could book locations influence the choices of readers in selecting books? If this was not an isolated observation, it must have exposed an interesting
phenomenon that librarians needed to pay attention to. Then, by finding out the reasons, librarians might become capable of guiding, to some extent, users' selectiveness on library books by deliberately arranging collections at designated heights on bookshelves.

A research study was designed to develop Geographical Information Systems (GIS) into an analytical tool to examine the author's earlier casual observations. The study was conducted in the MacKimmie Library at the University of Calgary. This paper highlights the results of the study that aimed at assessing the behavior of library readers in pulling out books from bookshelves. These books, when not checked out, are categorized as "pickup books" because they are usually discarded inside a library after use and then picked up by library assistants for reshelving. Like many other libraries, the MacKimmie Library does not encourage readers to reshelve books themselves.

ArcView, a GIS software, was selected to develop the tool for this study because GIS has the functions of dynamically analyzing and displaying spatial data. The research on library readers pulling out books involves the measurements of bookshelf heights, and thus deals with spatial coordinates. With the capability of presenting bookshelves in different views on maps, GIS is able to provide readers with an easy understanding of the analytical results in visual forms that would otherwise require wordy textual descriptions. At the same time, some GIS products are available now in most academic libraries, thus giving developers convenient access to use.

Hypothesis

When library users decide to check books out of a library, these books are what they think of as useful.
People are usually hesitant to carry home books that are of little or uncertain use, not only because of the limit on the number of check-out books, but also because of the physical work required for carrying them. Moreover, some items, such as periodicals and multimedia materials, are either designated as "reference only" or have a very short loan period. It is reasonable to believe that users carefully select what they want from library collections and keep these books for handy use outside the library.

By contrast, in-library book use represents a different category of library readers' behavior. There are two general categories of in-library book use: readers bringing their own books into a library for use, and readers pulling out books from bookshelves inside a library. The former is commonly seen when students study textbooks for examinations (not the topic of this study), while the latter is a little more complex.1

As library users approach bookshelves to extract books, they may or may not have a definite target. When coming with call numbers, people will deliberately draw the books they want for reading, photocopying, or referencing. However, there are times when users only wander in bookshelf aisles of desired collections, uncertain about singling out specific books. They may simply shelf-shop to randomly select whatever is interesting to them, or they may locate a subject of need and go to the storage position(s) to look for whatever books are there. No matter what these readers' intentions are, they roam among collections, pick books for quick use, and leave them inside the library after use, although some materials may also be checked out.

Because of such arbitrary selections from library collections, physical convenience sometimes influences library users in taking books from bookshelves; they may look around for books on bookshelf layers that are at a reachable height.
The standard library bookshelf is higher than the average person's height and is structured to have five to eight layers. In academic libraries, "wood shelving is available in three heights: 82 in. (2050 mm), with a bottom shelf and six adjustable shelves; 60 in. (1500 mm), with a bottom shelf and four adjustable shelves; and 42 in. (1050 mm), with a bottom shelf and two adjustable shelves."2 For regular collections in most academic libraries, bookshelves are usually about eighty-two inches high and have seven layers. Books on the top layer are out of reach for many readers, requiring them to use a ladder to draw a book from it. Many users are hesitant to use ladders. Even worse, a reader will have to bend over or squat down to view the contents of books on the bottom layer of a bookshelf.

Hence, the hypothesis is that books used inside a library are primarily distributed among the mid-range layers of bookshelves. Specifically, if a bookshelf has seven layers, books placed on layers two through six are most frequently consulted. This is the subject of this research paper.

Background

A considerable number of studies have investigated the utilization of books that are checked out of a library. An estimate made in 1967 pointed out that over seven hundred research results pertained to this topic.3 However, the situation of books used inside a library has not been given enough attention. One of the reasons for this seeming neglect comes from the belief that the records of library books in circulation provide similar information as those of books used within libraries.4 This misunderstanding was later criticized by other researchers who discovered the differences in use behavior between

Jingfeng Xia (jxia@email.arizona.edu) is a student at the School of Information Resources and Library Science at the University of Arizona, Tucson.
USING GIS TO MEASURE IN-LIBRARY BOOK-USE BEHAVIOR | XIA 185

library readers taking books home and those using books inside libraries.5 Researchers have now recognized that correlations between the two sets of data are not as strong as they seemed to be.

Such recognition, unfortunately, has not resulted in more consequent work to explore the issue of in-library book use. This is probably due to the difficulties of collecting data or the lack of appropriate research methods.6 Also, the majority of relevant surveys were conducted several decades ago and focused primarily on exploring a good method of sampling in-library book use.7 Among these studies, Fussler and Simon preferred to carry out research by distributing questionnaires among library readers; Drott used random-sampling methods to statistically examine the importance of library-book use; and Jain, as well as Salverson, emphasized dividing the survey times into different investigation units when conducting research. Similarly, Morse pointed out the complexity of measuring library-book use at work, advocating an involvement of computerized operations in library-book management.

The sampling strategies and analytical methods implemented in past studies are still applicable to current research. Nonetheless, because many new technologies have come into view since then, it is quite likely that some new ways of obtaining and analyzing the data of in-library book use can now be developed. The new approaches must have the capability of providing not only accurate measurement of the data but also the means for easy manipulation. Their results must be able to enhance the understanding of user behavior in exploring the resources of existing collection inventories.
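As a sketch of the kind of computerized analysis these authors call for, the hypothesis stated earlier reduces to a simple tally: count pickups per shelf layer and compare the mid-range layers (two through six on a seven-layer shelf) against the top and bottom. The observation data below are invented for illustration; the study's real data came from observation at MacKimmie Library:

```python
from collections import Counter

# Invented pickup observations: each value is the layer (1 = bottom,
# 7 = top) from which a discarded "pickup book" was reshelved.
observations = [4, 3, 5, 2, 4, 6, 3, 4, 7, 5, 2, 4, 1, 3, 5, 4, 6, 3]

by_layer = Counter(observations)
mid = sum(by_layer[layer] for layer in range(2, 7))  # layers 2 through 6
extremes = by_layer[1] + by_layer[7]                 # bottom and top layers

print(f"mid-range layers: {mid} pickups, extremes: {extremes} pickups")
```

With real records accumulating continuously, the same counts could be recomputed on the full data set rather than on isolated samples, which is the advantage the article attributes to an analytical tool.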
One of the solutions is an analytical tool. An analytical tool can control data collection and analysis by computerization. If the system is able to accumulate constantly updated records over time, it will remedy the problem of poor sampling that many researchers have encountered, because analysis will then be done on all the data rather than with certain isolated samples. The development of modern technologies makes such data collection and storage possible and easier than ever before. One example of the technologies is the radio frequency identification (RFID) tag system that has been adopted by some public and academic libraries recently.8 This system stores a tag in each library item with the item's bibliographic information, and uses an antenna to keep track of the tag. By automatically communicating with data stored in the tags, the system can collect data on all library collections in a timely manner and export them into predesigned databases for easy management.

Data analysis and presentation comprise another part of the analytical mechanism. Researchers have to carefully evaluate existing technologies in order to select proper products or develop particular programs to integrate with RFID (if used) and the databases. It is fortunate that GIS technology is available with numerous functions for analyzing and demonstrating data, especially spatial data. Data visualization through GIS products has been very good, which gives them advantages over other analytical, statistical, or reporting products. Combining RFID and GIS into one system would seem to be the perfect solution: the former can effectively carry out data collection and the latter can efficiently perform data analysis and presentation.
However, while GIS products have been used in libraries in the United States for more than a decade, most academic libraries are hesitant to invest in RFID because of its high costs. GIS technology alone, however, can still provide sufficient functions to be developed into such an analytical tool.

Up to now, those libraries that have provided GIS services only use the software that assists in the utilization of geospatial data and mapping technologies for users.9 GIS is not exploited enough to aid the management of libraries themselves and the research of library collections. Some commercial GIS software, such as "Library Decision" by CivicTechnologies, has been recently marketed to support the analysis of library-user data for public libraries.10 However, it only works well on data of a conventional geographical nature, that is, the distribution and location of libraries and their users with the mapping of city blocks and streets. It does not apply to a library and its books, and especially not to the distribution of books used inside the library. Such products are also not applicable to academic libraries that do not always concentrate on the analysis of geographical areas of their users.

186 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

Even so, GIS has all the functions that such a proposed analytical tool demands. It is suitable for assisting in the research of in-library book use, where library floor layouts or other facilities can be drawn into maps in multiple-dimensional views. At the same time, bookshelves with individual layers can be treated as an innovative form of map by GIS technology (see figure 1), making visible the relationship of book use to the height of the bookshelf. As soon as the presentation mechanism is linked to databases, any updates on book use will be mirrored visually.
Method

This project is one of a series of projects for developing GIS into a tool to manage and analyze the usage characteristics of library books. The other projects include using GIS to measure book usability for the development of collection inventories; to assist in the management of library physical space and facilities; and to locate library items.11 In order to make GIS workable for the subject of this paper, the focus was placed only on the exploration of correlations between bookshelf heights and book-use frequencies in an academic library environment.

Figure 1. The front view of one bookshelf rack on the fifth floor of the University of Calgary MacKimmie Library. Eight bookshelves make up the range. Here, different shades of color represent the numbers of books used on each individual layer. The display is only for demonstration and not to actual scale.

There are two major steps to conducting this research: collecting data and developing a GIS analytical tool. Since MacKimmie Library did not invest in RFID at the time this research was undertaken, personal observations were made to record book-use data.12 The development of the GIS tool involves creating a small database to store data and facilitate data analysis. It also requires creating several bookshelf and shelf-range maps to present analytical results in visualized forms. ArcView, the most popular GIS product in the world, was utilized for the development. This paper presents only a portion of collection areas at MacKimmie Library. Part of the fifth floor, where some collections of humanities and social sciences are stored, was selected because this floor is among the busiest of the floors used by readers.
It is filled with sixty-eight ranges of bookshelves containing books from call numbers B to DU. The terms used in this paper include bookshelf, referring to one unit of furniture fitted with horizontal shelves to hold books; rack, which includes more than one bookshelf standing together in a line; and range, composed of two racks standing back-to-back. Bookshelves on the fifth floor are arranged to surround a group of facility rooms in the central area. Study corridors are set between bookshelves and the wall. Each bookshelf range consists of two bookshelf racks, each of which in turn has eight individual bookshelves. All of the bookshelves are about eighty-two inches high and have seven layers. The layers, except for the top ones that are open, are equal in height, width, and length.

Data Collection

Personal surveys were taken by the author to note down each call number of books that were not in their original positions on the shelves, but instead were found discarded on the floor, tables, chairs, sofas, or on top or in front of other stocked books. Books on the shelving carts were also accounted for. The surveys were separately conducted three times a day, in the morning, afternoon, and evening, in order to catch as many books used in a day as possible. To avoid recording the same book more than once, no duplicate call numbers were accepted for any single day even though the same book was found in different locations on that day. On the other hand, the same call number could be entered into the records on the second day although it was recorded the day before and remained in the same place without being picked up by library assistants. (This duplicate recording was very rare because of the routine work of book pickup by library assistants.)
A period of two weeks was designated for the survey in the first half of December 2002. The final examination week was planned because it represents a week of heavy book use, although previous research found that readers in this week tended to use library collections less than their own study materials.13 A supplementary survey that also lasted two weeks, including a final examination week, was conducted in the library in late spring 2004.

To simplify the research, some exceptions were established for data collection. Periodicals were excluded because they have a very short loan period (generally one day). Library users may prefer to read articles in journals within the library and thus will have a clear idea as to what materials to read.14 Books belonging to other floors of the library, or books belonging to the fifth floor but found outside the area, were not included in the analysis. Furthermore, due to the nature and time limit of these observations, books pulled out of targeted bookshelves were not distinguished from books taken from bookshelves at random. This information can only become available through interviews with library users, which can be another research project.

Each bookshelf layer was recorded with and signified by two call numbers: the start and end numbers of books. For example, the call numbers "BF1999 .K54" to "BH21 .B35 1965," representing books stored on a particular layer, were recorded to identify that layer. Because book shifting can happen from time to time, such recording of start and end call numbers for individual bookshelf layers only reflects the conditions when this research was undertaken and may need updates whenever changes occur.
Data Manipulation and Visualization

Using a bookshelf layer as the recording unit is essential for the analysis of the relationship between book use and bookshelf height. Each book used can be classified to fit in one unit according to the call number of the book. Therefore, building a database with a table for layers will be an important part in the development of such an analytical tool. The LAYERS table includes a data field as an identifier to stand for the sequence of each layer (1 for the top layer, 2 for the next layer down, and so on) in addition to storing the start and end call numbers of books for each layer. If more than one bookshelf in the library has seven layers, layer identifiers will iterate from bookshelf to bookshelf. Therefore, this table will also need an identifier for each individual bookshelf with which layers are associated.

The database will also contain such information as bookshelf ranges, bookshelf racks, and books, all of which are individual database tables that are joined with each other by relational keys. Among them, the RANGES table is simply characterized by its identifier, and is designed to represent two racks of bookshelves that stand back to back. The BOOKSHELVES table is identified by the call numbers of the start and end books stored across individual bookshelves rather than on individual layers. Furthermore, the BOOKS table is primarily filled with the data of individual book call numbers as well as book pickup times and book discard locations.

GIS has limited ability for organizing database structure. If necessary, other database management systems, such as Microsoft Access, can be incorporated. Query codes are built to get summarized information for specific purposes, and the aggregated data are exported into GIS databases for further spatial analysis or convenient visual presentation.
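The relational layout described above can be sketched as a small set of tables. The following is a minimal illustration only: the table names follow the article, but the exact columns, key names, and DDL are assumptions, and SQLite stands in for the Microsoft Access database the article mentions.

```python
import sqlite3

# Sketch of the database described in the article (column names are
# assumptions based on the text; SQLite stands in for Microsoft Access).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RANGES      (range_id INTEGER PRIMARY KEY);
CREATE TABLE RACKS       (rack_id  INTEGER PRIMARY KEY,
                          range_id INTEGER REFERENCES RANGES(range_id));
CREATE TABLE BOOKSHELVES (shelf_id INTEGER PRIMARY KEY,
                          rack_id  INTEGER REFERENCES RACKS(rack_id),
                          start_no TEXT,                -- first call number across the shelf
                          end_no   TEXT);               -- last call number across the shelf
CREATE TABLE LAYERS      (layer_id INTEGER,             -- 1 = top layer, 2 = next down, ...
                          shelf_id INTEGER REFERENCES BOOKSHELVES(shelf_id),
                          start_no TEXT,                -- first call number on the layer
                          end_no   TEXT,                -- last call number on the layer
                          PRIMARY KEY (layer_id, shelf_id));
CREATE TABLE BOOKS       (call_no TEXT,                 -- individual book call number
                          pickup_time TEXT,             -- when the used book was found
                          discard_loc TEXT);            -- floor, table, chair, sofa, ...
""")
conn.execute("INSERT INTO LAYERS VALUES (1, 1, 'BF1999 .K54', 'BH21 .B35 1965')")
conn.execute("INSERT INTO BOOKS  VALUES ('BG100 .A1', '2002-12-09 09:30', 'table')")

# A used book is assigned to a layer when its call number falls inside the
# layer's start/end range (plain string comparison approximates LC order).
row = conn.execute("""
    SELECT l.layer_id, l.shelf_id
    FROM BOOKS b JOIN LAYERS l
      ON b.call_no > l.start_no AND b.call_no < l.end_no
""").fetchone()
print(row)  # -> (1, 1)
```

Note that comparing Library of Congress call numbers as plain strings is only an approximation of true call-number order; a production version would normalize the call numbers before comparison.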
Data visualization can be shown at different levels: by layer, bookshelf, rack, and range. The first attempt at making a visual demonstration of this research is for the area of individual bookshelves at layer level (see figure 1). The following query will return the necessary summarized information:

SELECT count(b.call_no) AS total_num, l.layer_id, l.shelf_id
FROM (BOOKS b INNER JOIN LAYERS l ON b.some_id = l.some_id)
WHERE b.call_no > l.start_no AND b.call_no < l.end_no
GROUP BY l.layer_id, l.shelf_id
ORDER BY l.shelf_id, l.layer_id;

At the same time, another attempt is made to demonstrate book numbers per layer, at bookshelf level, across multiple bookshelf ranges. This demonstration provides a better visualization in the GIS display so that an overall view of the height distributions of book usage over certain collection areas can be presented (see figures 2 and 3). To achieve such visualization, data must be compared in order to get information about which layer of a bookshelf contains the most frequently used books and which holds those that are rarely visited. This demonstration indicates that any alternative selection of analytical-display units can be easily performed by making modifications on the query that works on aggregating data.

Technically, data visualization can be presented by using any GIS software, although ArcView is used here because it has been available in the systems of many academic libraries. Bookshelf ranges on MacKimmie Library's fifth floor were drawn into map features. In order to show them with a three-dimensional view, each of the seven layers was given a sequential number as its height value, and all bookshelves were treated as having the same height. These height values are treated as the z values in any three-dimensional analysis.
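The layer-level summary described above can be exercised on sample data. This is an illustrative sketch only: the sample rows and simplified two-table schema are assumptions, SQLite stands in for the article's database, and COUNT(b.call_no) is used to yield the number of used books per layer. Changing the GROUP BY columns re-aggregates the same data at shelf, rack, or range level, which is the "alternative selection of analytical-display units" the article describes.

```python
import sqlite3

# Assumed sample data: two layers on one bookshelf, three used books.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LAYERS (layer_id INT, shelf_id INT, start_no TEXT, end_no TEXT)")
conn.execute("CREATE TABLE BOOKS  (call_no TEXT)")
conn.executemany("INSERT INTO LAYERS VALUES (?,?,?,?)",
                 [(1, 1, 'BF1000', 'BH21'), (2, 1, 'BH22', 'BJ99')])
conn.executemany("INSERT INTO BOOKS VALUES (?)",
                 [('BF2000',), ('BG100',), ('BH50',)])

# Books are matched to layers by call-number range; COUNT gives the number
# of used books per layer. Grouping by shelf_id alone (or rack/range ids)
# would re-aggregate the same data at a coarser display unit.
rows = conn.execute("""
    SELECT COUNT(b.call_no) AS total_num, l.layer_id, l.shelf_id
    FROM BOOKS b JOIN LAYERS l
      ON b.call_no > l.start_no AND b.call_no < l.end_no
    GROUP BY l.layer_id, l.shelf_id
    ORDER BY l.shelf_id, l.layer_id
""").fetchall()
print(rows)  # -> [(2, 1, 1), (1, 2, 1)]
```

The aggregated rows are what would be exported to the GIS side and joined to the layer map features for display.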
Then, by associating the numbers of books from the database with the heights of layers on the map, ArcView is able to sketch the height distributions of in-library book use in new perspectives, dramatically improving the understanding of book use.

In order to implement the visualization of all layers across a bookshelf range, layers were drawn as map features (see figure 1). Layer heights and widths are in appropriate proportion. (Individual books on each layer are for demonstration only, and thus are not in the exact shape and number.) Figure 1 shows how a bookshelf rack has been presented as a GIS map, which is a totally new idea in the applications of GIS visualization.

The database and visualization mechanism constitute what is referred to in this paper as the analytical tool. One will find that the development is relatively easy and the tool is incredibly simple. However, it is a dynamic device. If expanded into other parts of the library collections, this tool will become an integrated system that is able to assist in the management of library book use and

Figure 2. A three-dimensional view of bookshelf ranges on the fifth floor at the MacKimmie Library. The height of each bookshelf represents the corresponding height of the layer from which most books were removed. This display is not to actual scale.

Once the appropriate pieces of HTML code had been replaced with corresponding include statements, the changeover was complete. From this point forward, changes in such things as database names, coverage periods, and descriptive material will be made to one .txt file.
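The include mechanism this workflow relies on can be illustrated with a toy processor: before a page is sent, the server replaces each SSI include directive with the contents of the named file. This is a simplified sketch, not how a real server is implemented (Apache's mod_include handles the full SSI directive set), and the file names and HTML here are illustrative, not SUL's actual files.

```python
import re
import tempfile
from pathlib import Path

def process_ssi(html: str, root: Path) -> str:
    """Replace each <!--#include file="..."--> directive with the target
    file's text, the way a parsing-enabled server assembles a page."""
    def expand(match: re.Match) -> str:
        return (root / match.group(1)).read_text()
    return re.sub(r'<!--#include file="([^"]+)"\s*-->', expand, html)

# One shared .txt fragment holds a database link reused by many subject
# pages; correcting this one file updates every page that includes it.
root = Path(tempfile.mkdtemp())
(root / "eb-ase.txt").write_text(
    '<a href="http://purl.example/eb-ase">Academic Search Elite (EBSCO)</a>')

page = '<td><!--#include file="eb-ase.txt" --></td>'
assembled = process_ssi(page, root)
print(assembled)
# -> <td><a href="http://purl.example/eb-ase">Academic Search Elite (EBSCO)</a></td>
```

Because the expansion happens only on the server, an editor previewing the raw file sees just the unexpanded directive, which is why local previews of SSI pages appear incomplete.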
The change will immediately be reflected across all subject pages with no additional work involved for the librarians responsible for those pages. Note that the SUL server has been configured so that it parses all Web pages. This is necessary because most of the library's Web pages have some SSI. This configuration means that the Web page extensions remain .html. If the server is not configured in this manner, then all pages containing SSI must end in a .shtml extension. This is a subject that requires discussion with automation librarians or the department responsible for the library's server.

Advantages

Obviously, the biggest advantage to this method is the time saved for individual librarians. There is now no need for librarians to do any maintenance work for links to information housed in the alphabetical list. Static HTML pages referencing Gale's InfoTrac OneFile database, for instance, would have required updates to approximately forty subject pages; now, one librarian can correct one .txt file and simultaneously update all forty subject pages. Time saved can be used in collecting and editing the list of Web sites that are a part of each subject page; this is a task that has been pushed back in the past, in favor of making more urgent database information changes.

Academic Search Elite (EBSCO) <img src="/images/fulltext.gif" alt="some full text" border="0"> multi-disciplinary database; includes some scholarly articles
Fig. 1. HTML Code for Academic Search Elite Using a PURL Called eb-ase

Fig. 2. Database Names, .txt File Names, and Resultant Include Commands (e.g., Accessible Archives, accessible.txt)

In addition, librarians who are using this simple technique do not need extensive training. The creation of the Excel database of include commands allows for quick additions to an existing page, or the creation of new subject pages. Librarians using the include commands can simply copy and paste them; there is no need for them to understand the syntax or to be able to repeat it. This makes using SSI particularly attractive to staff who do not want the added burden of further training in HTML. The librarian responsible for creating the .txt files and the Excel database of statements demonstrated the copying and pasting of the include statements to all the other librarians who edit HTML pages in a one-time ten-minute training session.

The only additional training issue has involved page structure. Since the library uses a table structure for the subject pages, all table tags are included in the database .txt files. Making sure that librarians understand that they do not need to recreate the table tags has been the only additional training issue for the department.

As librarians begin to use these commands, links to resources across subject pages will look the same and will provide the user with the same information. This increased uniformity results in a more professional appearance for the Web site as a whole.

Disadvantages

This revolution in the maintenance of subject pages has not been without its disadvantages. The primary complaint by librarians using SSI include commands is that they cannot preview their changes in their HTML editors.
SUL's department uses the CoffeeCup HTML Editor, which allows previews, but the previews are not visible for items that are retrieved using SSIs. This is because the page is not fully assembled until the server assembles it. When the librarian views the page in the editor, prior to uploading it to the server, the include commands are without targets. The target .txt files are on the server. When a user requests a page, include commands pull in the missing pieces (the .txt files, or other files); then, the completed page is seamlessly presented to the user via his or her browser. As Mach notes, "Previewing a Web page without crucial elements . . . can be disconcerting, especially to visually oriented designers."20 In SUL's experience with this particular issue, librarians who are uncomfortable loading pages with locally invisible elements can load them into temporary folders on the server, check them for errors there, and then move them to their appropriate directories.

Conclusion

Situational factors have allowed SUL to implement this change with surprising ease and speed. Because the library has its own server, and because there is an automation librarian on staff, communication and change have been easy and efficient. Library staff deduce that it is because the include command of SSI is being used more than other possible commands that the library is not experiencing an increase in loading time on its pages. Of course, the size of SUL's resource list makes this kind of solution feasible; certainly, if the library were working with hundreds of resources, it would be more likely that a database-driven strategy would be adopted.
The simplicity and elegance of the SSI include command process has encouraged adoption, and SUL has seen no ill effects from the user side of operations. Librarian Web authors quickly overcame any slight discomfort with the new process and are now able to devote a portion of editing time to other, less monotonous tasks.

References and Notes

1. Carla Dunsmore, "A Qualitative Study of Web-Mounted Pathfinders Created by Academic Business Libraries," Libri 52, no. 3 (Sept. 2002): 140-41.
2. Charles W. Dean, "The Public Electronic Library: Web-based Subject Guides," Library Hi Tech 16, no. 3-4 (1998): 80-88; Gary Roberts, "Designing a Database-Driven Web Site, or, The Evolution of the Infoiguana," Computers in Libraries 20, no. 9 (Oct. 2000): 26-32; Bryan H. Davidson, "Database-Driven, Dynamic Content Delivery: Providing and Managing Access to Online Resources Using Microsoft Access and Active Server Pages," OCLC Systems and Services 17, no. 1 (2001): 34-42; Marybeth Grimes and Sara E. Morris, "A Comparison of Academic Libraries' Webliographies," Internet Reference Services Quarterly 5, no. 4 (2001): 69-77; Laura Galvan-Estrada, "Moving towards a User-Centered, Database-Driven Web Site at the UCSD Libraries," Internet Reference Services Quarterly 7, no. 1-2 (2002): 49-61.
3. Roberts, "Infoiguana"; Davidson, "Database Driven"; Galvan-Estrada, "User-Centered, Database-Driven Web Site."
4. Davidson, "Database Driven," under "Introduction."
5. Ibid., under "Development Considerations."
6. Roberts, "Infoiguana," 32.
7. Galvan-Estrada, "User-Centered, Database-Driven Web Site," 55-56.
8. Jody Condit Fagan, "Server-Side Includes Made Simple," The Electronic Library 20, no. 5 (2002): 382-83.
9. Michelle Mach, "The Service of Server-Side Includes," Information Technology and Libraries 20, no. 4 (2001): 213.
10. Greg R.
Notess, "Server Side Includes for Site Management," Online 24, no. 4 (July 2000): 78, 80.
11. Ibid.
12. Mach, "Service of Server-Side Includes," 216.
13. Ibid., 214.
14. Fagan, "Server-Side Includes Made Simple," 387.
15. Ibid., 383.
16. Ibid.
17. Ibid.
18. Apache HTTPD Server Project, "Apache HTTP Server Version 1.3: Security Tips for Server Configuration," The Apache Software Foundation. Accessed Oct. 29, 2003, http://httpd.apache.org/docs/misc/security_tips.html.
19. Anthony Baratta, e-mail to theList mailing list, May 16, 2003. Accessed Nov. 4, 2003, http://lists.evolt.org/archive/Week-of-Mon-20030512/140824.html.
20. Mach, "Service of Server-Side Includes," 217.

USING SERVER-SIDE INCLUDE COMMANDS | NORTHRUP, CHERRY, AND DARBY 197
Free Culture: How Big Media Uses Technology and the Law to Lock Down Culture and Control Creativity
Coyle, Karen. Information Technology and Libraries; Dec 2004; 23, 4; ProQuest pg. 198

Book Review

Free Culture: How Big Media Uses Technology and the Law to Lock Down Culture and Control Creativity. By Lawrence Lessig. New York: Penguin, 2004. 240p. $24.95 (ISBN 1-594-20006-8).

This is the third book by Stanford law professor Larry Lessig, and the third in which he furthers his basic theme: that the ancien régime of intellectual property owners is locked in a battle with the capabilities of new technology. Lessig used his first book, Code and Other Laws of Cyberspace (Basic Books, 1999), to explain that the notion of cyberspace as free, open, and anarchic is simply a myth, and a dangerous one at that: the very architecture of our computers and how they communicate determines what one can and cannot do within that environment. If you can get control of that architecture, say by mandating filters on content, you can get substantial control over the culture of that communication space. In his second book, The Future of Ideas: The Fate of the Commons in a Connected World (Random, 2001), Lessig describes how the change from real property to virtual property actually means more opportunity for control, not less. The theme that he takes up in Free Culture is his concern that certain powerful interests in our society (read: Hollywood) are using copyright law to lock down the very stuff of creativity: mainly, past creativity.

Lessig himself admits in his preface that his is not a new or unique argument. He cites Richard Stallman's writings in the mid-1980s that became the basis for the Free Software movement as containing many of the same concepts that Lessig argues in his book.
In this case, it serves as a kind of proof of concept (that new ideas build on past ideas) rather than a criticism of lack of originality. Stallman's work is not, however, a substitute for Lessig's; not only does Lessig address popular culture where Stallman addresses only computer code, but Lessig has one key thing in his favor: he is a master storyteller and a darned good writer, not something one usually expects in an academic and an expert in constitutional law. His book opens with the first flight of the Wright brothers and the death of a farmer's chickens, followed by Buster Keaton's film Steamboat Bill and Disney's famous mouse. The next chapter traces the history of photography and how the law once considered that snapping a picture could require prior permission from the owners of any property caught in the viewfinder. Later he tells how an improvement to a search engine led one college student to owe the Recording Industry Association of America $15 million. Throughout the book Lessig illustrates copyright through the lives of real people and uses history, science, and the arts to make this law come to life for the reader.

Lessig explains that intellectual property differs from real property in the eye of the law. Unlike real property, where the property owner has near total control over its uses, the only control offered to authors originally was the control over who could make copies of the work and distribute them. In addition, that right, the "copy right," lasted only a short time. The original length of copyright in the United States was fourteen years, with the right to renew for another fourteen years. So a total of twenty-eight years stood between an author's rights and the public domain, and those rights were limited to publishing copies. Others could quote from a work, even derive other works from it (such as turning a novel into a play), all within a law that was designed to promote science and the arts.
Fast forward to the present day and we have a very different situation. Not only has there been a change in the length of time that copyright applies to a work; a major change in copyright law in 1976 extended copyright to works that had not previously been covered.

198 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004    Tom Zillner, Editor

In the earliest U.S. copyright regimes of the late 18th century, only works that were registered with the copyright office were afforded the protection of copyright law, and only about five percent of works produced were so registered. The rest were in the public domain. Later, actual registration with the copyright office was unnecessary but the author was required to place a copyright notice on a work (e.g., "© 2004, Karen Coyle") in order to claim copyright in it. Copyright holders had to renew works in order to make use of the full term of protection, and renewal rates were actually quite low. In 1976, all such requirements were removed, and the law was amended to state that any work in a fixed medium automatically receives copyright protection, and for the full term. That is true even if the author does not want that protection. So although many saw the great exchange of ideas and information on the Internet as being a huge commons of knowledge, to be shared and shared alike, all of it has, in fact, always been covered by copyright law: every word out there belongs to someone.

That change, combined with a much earlier change that gave a copyright holder control over derivative works, puts creators into a deadlock. They cannot safely build on the work of others without permission (thus Lessig's argument that we are becoming a "permission culture"). Yet, we have no mechanism (such as registration of works that would result in a database of creators) that would facilitate getting that permission.
If you find a work on the Internet and it has no named author or no contact information for the author, the law forbids you to reuse the work without permission, but there is nothing that would make getting that permission a manageable task. Of course, even if you do know who the rights holder is, permission is not a given. For example, you hear a great song on the radio and want to use parts of that tune in your next rap performance. You would need to approach the major record label that holds the rights and ask permission, which might not be granted. You could go ahead and use the sample and, if challenged, claim "fair use." But being challenged means going to court in a world where a court case could cost you in the six digits, an amount of money that most creators do not have.

Lessig, of course, spends quite a bit of time in his book on the length of copyright, now life of the author plus seventy years. It was exactly this issue that he and Eric Eldred took to the Supreme Court in 2003. Lessig argued before the court that if Congress can seemingly arbitrarily increase the length of copyright, as it has eleven times since 1962, then there is effectively no limit to the copyright term. Yet "for a limited time" was clearly mandated in the U.S. Constitution. Lessig lost his case. You might expect him to spend his efforts explaining how the Supreme Court was wrong and he was right, but that is not what he does. Right or wrong, they are the Supreme Court, and his job was to convince them to decide in favor of his client. Instead, Lessig revises his estimation of what can be accomplished with constitutional arguments and spends a chapter outlining compromises that might, just might, be possible in the future. To the extent that Eldred v.
Ashcroft had an effect on Lessig's thinking, and there is evidence that the effect was profound, it will have an effect on all of us because Lessig is one of the key actors in this arena.

Throughout the book, Lessig points out the difference between copyright law and the actual market for works. There is a great irony in the fact that copyright law now protects works for a century or more while most books are in print for one year or less. It is this vast storehouse of out-of-print and unexploited works that makes a strong argument for some modification of our copyright law. He also recognizes that there are different creative cultures in our society, with different views of the purpose of creation. Here he cites academic movements like the Public Library of Science as solutions for the sector of society that has a low or nonexistent commercial interest but a need to get its works as widely distributed as possible. For these creators, and for "sharers" everywhere, Lessig promotes the Creative Commons solution (at www.creativecommons.org), a simple licensing scheme that allows creators to attach a license to their work that lets others know how they can make use of it. In a sense, Creative Commons is a way to opt out of the default copyright that is applied to all works.

When I first received my copy of Free Culture, I did two things: I looked up libraries in the index, and I looked up the book online to see what other reviewers had said. Online, I found a Web site for the book (http://free-culture.org) that pointed to two very interesting sites: one that lists free, downloadable full-text copies of the book in over a dozen different formats; and one that allows you to listen to the chapters being read aloud by volunteers and admirers. (I did listen to a few chapters and generally they are as listenable as most nonfiction audio books. In the end, though, I read the hard copy of the book.)
Lessig is making a point by offering his work outside the usual confines of copyright law, but in fact the meaning of his gesture is more economic than legal. Although he, and Cory Doctorow before him (Down and Out in the Magic Kingdom, Tor Books, 2003), brokered agreements with their publishers to publish simultaneously in print with free digital copies, few authors and publishers today will choose that option for fear of loss of revenue, not because of their belief in the sanctity of intellectual property. If there were sufficient proof that free online copies of works increased sales of hard copies, this would quickly become the norm, regardless of the state of copyright law.

As for libraries: unfortunately, they do not fare well. He dedicates a short chapter to Brewster Kahle and his Wayback Machine as his example of the need to archive our culture for future access. I admit that I winced when Lessig stated:

But Kahle is not the only librarian. The Internet Archive is not the only archive. But Kahle and the Internet Archive suggest what the future of libraries or archives could be. (114)

Lessig also mentions libraries in his arguments about out-of-print and inaccessible works, but in this case he actually gets it wrong:

After it [a book] is out of print, it can be sold in used book stores without the copyright owner getting anything and stored in libraries, where many get to read the book, also for free. (113)

Since we know that Lessig is very aware that books are sold and lent even while they are still in print, we have to assume that the elegance of the argument was preferred over precision. But he makes this error more than once in the book, leaving libraries to appear to be a home for leftovers and remaindered works. That is too bad. We know that Lessig is aware of libraries; anyone active in the legal profession depends on them. He has spoken at library-related conferences and events.
Yet he does not see libraries as key players in the battle against overly powerful copyright interests. More to the point, libraries have not captured his imagination, or given him a good story to tell. So here is a challenge for myself and my fellow librarians: whether it means chatting up Lessig after one of his many public performances, becoming active in CreativeCommons, or stopping by Palo Alto to take a busy law professor to lunch, we need to make sure that we get on, and stay on, Lessig's radar. We need him; he needs us.

Karen Coyle, Digital Libraries Consultant, http://kcoyle.net
An Evidence-Based Review of Academic Web Search Engines, 2014-2016: Implications for Librarians’ Practice and Research Agenda

Jody Condit Fagan

AN EVIDENCE-BASED REVIEW OF ACADEMIC WEB SEARCH ENGINES, 2014-2016 | FAGAN | https://doi.org/10.6017/ital.v36i2.9718

ABSTRACT

Academic web search engines have become central to scholarly research. While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention. Recent studies have much to tell us about Google Scholar’s coverage of the sciences and its utility for evaluating researcher impact. But other aspects have been understudied, such as coverage of the arts and humanities, books, and non-Western, non-English publications. User research has also tapered off. A small number of articles hint at the opportunity for librarians to become expert advisors concerning scholarly communication made possible or enhanced by these platforms. This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years with a mind to informing practice and setting a research agenda. Selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas.

INTRODUCTION

Recent Pew Internet surveys indicate an overwhelming majority of American adults see themselves as lifelong learners who like to “gather as much information as [they] can” when they encounter something unfamiliar (Horrigan 2016). Although significant barriers to access remain, the open access movement and search engine giants have made full text more available than ever.1 The general public may not begin with an academic search engine, but Google may direct them to Google Scholar or Google Books.
Within academia, students and faculty rely heavily on academic web search engines (especially Google Scholar) for research; among academic researchers in high-income areas, academic search engines recently surpassed abstracts & indexes as a starting place for research (Inger and Gardner 2016, 85, Fig. 4). Given these trends, academic librarians have a professional obligation to understand the role of academic web search engines as part of the research process.

Jody Condit Fagan (faganjc@jmu.edu) is Professor and Director of Technology, James Madison University, Harrisonburg, VA.

1 Khabsa and Giles estimate “almost 1 in 4 of web accessible scholarly documents are freely and publicly available” (2014, 5).

Two recent events also point to the need for a review of research. Legal decisions in 2016 confirmed Google’s right to make copies of books for its index without paying or even obtaining permission from copyright holders, solidifying the company’s opportunity to shape the online experience with respect to books. Meanwhile, Microsoft rebooted their academic web search engine, now called Microsoft Academic. At the same time, information scientists, librarians, and other academics conducted research into the performance and utility of academic web search engines. This article seeks to review the last three years of research concerning academic web search engines, make recommendations related to the practice of librarianship, and propose a research agenda.

METHODOLOGY

A literature review was conducted to find articles, conference presentations, and books about the use or utility of Google Books, Google Scholar, and Microsoft Academic for scholarly use, including comparisons with other search tools. Because of the pace of technological change, the focus was on recent studies (2014 through 2016, inclusive).
A search was conducted on “Google Books” in EBSCO’s Library and Information Science and Technology Abstracts (LISTA) on December 19, 2016, limited to 2014-2016. Of the 46 results found, most were related to legal activity. Only four items related to the tool’s use for research. These four titles were entered into Google Scholar to look for citing references, but no additional relevant citations were found. In the relevant articles found, the literature reviews testified to the general lack of studies of Google Books as a research tool (Abrizah and Thelwall 2014; Weiss 2016), with a few exceptions concerning early reviews of metadata, scanning, and coverage problems (Weiss 2016). A search on “Google Books” in combination with “evaluation OR review OR comparison” was also submitted to JMU’s discovery service,2 limited to 2014-2016. Forty-nine items were found, and from these three relevant citations were added; these were also entered into Google Scholar to look for citing references. However, no additional relevant citations were found. Thus, a total of seven citations from 2014-2016 were found with relevant information concerning Google Books. Earlier citations from the articles’ bibliographies were also reviewed when research was based on previous work, and to inform the development of a fuller research agenda. A search on “Microsoft Academic” in LISTA on February 3, 2017 netted fourteen citations from 2014-2016. Only seven seemed to focus on evaluation of the tool for research purposes. A search on “Microsoft Academic” in combination with terms “evaluation OR review OR comparison” was also submitted to JMU’s discovery service, limited to 2014-2016. Eighteen items were found but no additional citations were added, either because they had already been found or were not relevant.
The seven titles found in LISTA were searched in Google Scholar for citing references; four additional relevant citations were found, plus a paper relevant to Google Scholar not previously discovered (Weideman 2015). Thus, a total of eleven citations were found with relevant information for this review concerning Microsoft Academic. Because of this small number, several articles prior to 2014 were included in this review for historical context.

2 JMU’s version of EBSCO Discovery Service contained 453,754,281 items at the time of writing and is carefully vetted to contain items of curricular relevance to the JMU community (Fagan and Gaines 2016).

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017

An initial search was performed on “Google Scholar” in LISTA on November 19, 2016, limited to 2014-2016. This netted 159 results, of which 24 items were relevant. A search on “Google Scholar” in combination with terms “evaluation OR review OR comparison” was also submitted to JMU’s discovery tool limited to 2014-2016, and eleven relevant citations were added. Items older than 2014 that were repeatedly cited or that formed the basis of recent research were retrieved for historical context. Finally, relevant articles were submitted to Google Scholar, which netted an additional 41 relevant citations. Altogether, 70 citations were found to articles with relevant information for this review concerning Google Scholar in 2014-2016. Readers interested in literature reviews covering Google Scholar studies prior to 2014 are directed to Gray et al. (2012), Erb and Sica (2015), and Harzing and Alakangas (2016b).

FINDINGS

Google Books

Google Books (https://books.google.com) contains about 30 million books, approaching the Library of Congress’s 37 million, but far shy of Google’s estimate of 130 million books in existence (Wu 2015), which Google intends to continue indexing (Jackson 2010).
Content in Google Books includes publisher-supplied, self-published, and author-supplied content (Harper 2016) as well as the results of the famous Google Books Library Project. Started in December 2004 as the “Google Print” project,3 the project involved over 40 libraries digitizing works from their collections, with Google indexing and performing OCR to make them available in Google Books (Weiss 2016; Mays 2015).

3 https://www.google.com/googlebooks/about/history.html

Scholars have noted many errors with Google Books metadata, including misspellings, inaccurate dates, and inaccurate subject classifications (Harper 2016; Weiss 2016). Google does not release information about the database’s coverage, including which books are indexed or which libraries’ collections are included (Abrizah and Thelwall 2014). Researchers have suggested the database covers mostly U.S. and English-language books (Abrizah and Thelwall 2014; Weiss 2016). The conveniences of Google Books include limits by the type of book availability (e.g., free e-books vs. Google e-books), document type, and date. The detail view of a book allows magnification, hyperlinked tables of contents, buying and “Find in a Library” options, “My Library,” and user history (Whitmer 2015). Google Books also offers textbook rental (Harper 2016) and limited print-on-demand services for out-of-print books (Mays 2015; Boumenot 2015). In April 2016, the Supreme Court affirmed Google’s right to make copies for its index without paying or even obtaining permission from copyright holders (Authors Guild 2016; Los Angeles Times 2016). Scanning of library books and “snippet view” was deemed fair use: “The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals” (U.S.
Court of Appeals for the Second Circuit 2015). Literature concerning high-level implications of Google Books suggests the tool is having a profound effect on research and scholarship. The tool has been credited for serving as “a huge laboratory” for indexing, interpretation, working with document image repositories, and other activities (Jones 2010). At the same time, the academic community has expressed concerns about Google Books’s effects on social justice and how its full-text search capability may change the very nature of discovery (Hoffmann 2014; Hoffmann 2016; Szpiech 2014). One study found that books are far more prevalently cited in Wikipedia than are research articles (Kousha and Thelwall 2017). Yet investigations of Google Books’ coverage and utility as a research tool seem to be sorely lacking. As Weiss noted, “no critical studies seem to exist on the effect that Google Books might have on the contemporary reference experience” (Weiss 2016, 293). Furthermore, no information was found concerning how many users are taking advantage of Google Books; the tool was noticeably absent from surveys such as Inger and Gardner’s (2016) and from research centers such as the Pew Internet Research Project. In a largely descriptive review, Harper (2016) bemoaned Google Books’ lack of integration with link resolvers and discovery tools, and judged it lacking in relevant material for the health sciences, because so much of the content is older. She also noted the majority of books scanned are in English, which could skew scholarship. The language skew of Google Books was also lamented by Weiss, who noted an “underrepresentation of Spanish and overestimation of French and German (or even Japanese for that matter),” especially as compared to the number of Spanish speakers in the United States (Weiss 2016, 286-306). Whitmer (2015) and Mays (2015) provided practical information about how Google Books can be used as a reference tool.
Whitmer presented major Google Books features and challenged librarians to teach Google Books during library instruction. Mays conducted a cursory search on the 1871 Chicago Fire and described the primary documents she retrieved as “pure gold,” including records of city council meetings, notes from insurance companies, reports from relief societies, church sermons on the fire, and personal memoirs (Mays 2015, 22). Mays also described Google Books as a godsend to genealogists for finding local records (e.g., police departments, labor unions, public schools). In her experience, the geographic regions surrounding the forty participating Google Books Library Project libraries are “better represented than other areas” (Mays 2015, 25). Mays concludes, “Its poor indexing and search capabilities are overshadowed by the ease of its fulltext search capabilities and the wonderful ephemera that enriches its holdings far beyond mere ‘books’” (Mays 2015, 26). Abrizah and Thelwall (2014) investigated whether Google Books and Google Scholar provided “good impact data for books published in non-Western countries.” They used a comprehensive list of arts, humanities, and social sciences books (n=1,357) from the five main university presses in Malaysia, 1961-2013. They found only 23% of the books were cited in Google Books4 and 37% in Google Scholar (p. 2502). The overlap was small: only 15% were cited in both Google Scholar and Google Books. English-language books were more likely to be cited in Google Books; 40% of English-language books were cited versus 16% of Malay-language books. Examining the top 20 books cited in Google Books, researchers found them to be mostly written in English (95% in Google Books vs 29% in the sample), and published by University of Malaysia Press (60% in Google Books vs 26% in the sample) (2505).
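Abrizah and Thelwall’s overlap figures are simple set arithmetic. The sketch below reproduces them with a toy sample of 100 invented book identifiers, scaled from the reported percentages; it is an illustration, not the authors’ data or method:

```python
# Toy reconstruction of the overlap arithmetic reported by Abrizah and
# Thelwall (2014). Book identifiers are invented; only the percentages
# mirror the reported findings (23% in Google Books, 37% in Google
# Scholar, 15% in both).
sample = set(range(100))        # pretend sample of 100 books
cited_gb = set(range(0, 23))    # 23 books cited in Google Books
cited_gs = set(range(8, 45))    # 37 books cited in Google Scholar

def pct(cited, whole):
    """Share of `whole` appearing in `cited`, as a percentage."""
    return 100 * len(cited & whole) / len(whole)

print(pct(cited_gb, sample))             # 23.0
print(pct(cited_gs, sample))             # 37.0
print(pct(cited_gb & cited_gs, sample))  # cited in both: 15.0
print(pct(cited_gb | cited_gs, sample))  # cited in either: 45.0
```

The union (45%) exceeding either engine alone is the arithmetic behind the authors’ advice to search both tools when tracing citations to academic books.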
The authors concluded that due to the low overlap between Google Scholar and Google Books, searching both engines was required to find the most citations to academic books. Kousha and Thelwall (2015; 2011) compared Google Books with Thomson Reuters Book Citation Index (BKCI) to examine its suitability for scholarly impact assessment and found Google Books to have a clear advantage over BKCI in the total number of citations found within the arts and humanities, but not for the social sciences or sciences. They advised combining results from BKCI with Google Books when performing research impact assessment for the arts and humanities and social sciences, but not using Google Books for the sciences, “because of the lower regard for books among scientists and the lower proportion of Google Books citations compared to BKCI citations for science and medicine” (Kousha and Thelwall 2015, 317).

Microsoft Academic

Microsoft Academic (https://academic.microsoft.com) is an entirely new software product as of 2016. Therefore, the studies cited prior to 2016 refer to entirely different search engines than the one currently available. However, a historical account of the tool and reviewers’ opinions was deemed helpful for informing a fuller picture of academic web search engines and pointing to a research agenda. Microsoft Academic was born as Windows Live Academic in 2006 (Carlson 2006), was renamed Live Search Academic after a first year of struggle (Jacsó 2008), and was scrapped two years later after the company recognized it did not have sufficient development support in the United States (Jacsó 2011). Microsoft Asia Research Group launched a beta tool called Libra in 2009, which redirected to the “Microsoft Academic Search” service by 2011. Early reviews of the 2011 edition of Microsoft Academic Search were promising, although the tool clearly lacked the quantity of data searched by Google Scholar (Jacsó 2011; Hands 2012).
There were a few studies involving Microsoft Academic Search in 2014. Ortega and Aguillo (2014) compared Microsoft Academic Search and Google Scholar Citations for research evaluation and concluded “Microsoft Academic Search is better for disciplinary studies than for analyses at institutional and individual levels. On the other hand, Google Scholar Citations is a good tool for individual assessment because it draws on a wider variety of documents and citations” (1155).

4 Google Books does not support citation searching; the researchers searched for the book title to manually find citations to a book.

As part of a comparative investigation of an automatic method for citation snowballing using Microsoft Academic Search, Choong et al. (2014) manually searched for a sample of 949 citations to journal or conference articles cited from 20 systematic reviews. They found Microsoft Academic Search contained 78% of the cited articles and noted its utility for testing automated methods due to its free API and lack of blocks on automated access. The researchers also tested their method against Google Scholar, but noted “computer-access restrictions prevented a robust comparison” (n.p.). Also in 2014, Orduna-Malea et al. (2014) attempted a longitudinal study of disciplines, journals, and organizations in Microsoft Academic Search only to find the database had not been updated since 2013. Furthermore, they found the indexing to be incomplete and still in process, meaning Microsoft Academic Search’s presentation of information about any particular publication, organization, or author was distorted. Despite this finding, MAS was included in two studies of scholar profiles. Ortega (2015) compared scholar profiles across Google Scholar, Microsoft Academic Search, ResearchGate, Academia.edu, and Mendeley, and found little overlap across the sites.
They also found social and usage indicators did not consistently correlate with bibliometric indicators, except on the ResearchGate platform. Social and usage indicators were “influenced by their own social sites,” while bibliometric indicators seemed more stable across all services (13). Ward et al. (2015) still included Microsoft Academic Search in their discussion of scholarly profiles as part of the social media network, noting Microsoft Academic Search was painfully time-consuming to work with in terms of consolidating data, correcting items, and adding missing items. In September 2016, Hug et al. demonstrated the utility of the new Microsoft Academic API by conducting a comparative evaluation of normalized data from Microsoft Academic and Scopus (Hug, Ochsner, and Braendle 2016). They noted Microsoft Academic has “grown massively from 83 million publication records in 2015 to 140 million in 2016” (10). The Microsoft Academic API offers rich, structured metadata with the exception of document type. They found all attributes containing text were normalized and that identifiers were available for all entities, including references, supporting bibliometricians’ needs for data retrieval, handling, and processing. In addition to the lack of document type, the researchers also found the “fields of study” to be too granular and dynamic, and their hierarchies incoherent. They also desired the ability to use the DOI to build API requests. Nevertheless, the advantages of Microsoft Academic’s metadata and API retrieval suggested to Hug et al. that Microsoft Academic was superior to Google Scholar for calculating research impact indicators and bibliometrics in general. 
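Hug et al. evaluate Microsoft Academic for calculating research impact indicators. As a concrete illustration of what such an indicator involves, here is a minimal sketch of the h-index, one common impact measure, computed from per-paper citation counts; the counts are invented and the code is not drawn from the study:

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers each have >= rank citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations
print(h_index([25, 8, 5, 3, 3]))  # 3
```

Because an indicator like this is computed purely from retrieved citation counts, the coverage and data-quality differences between platforms discussed in this section directly change the resulting scores.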
In October 2016, Harzing and Alakangas compared publication and citation coverage of the new Microsoft Academic with Google Scholar, Scopus, and Web of Science using a sample of 145 academics at the University of Melbourne (Harzing and Alakangas 2016a), including observations from 20-40 faculty each in the humanities, social sciences, engineering, sciences, and life sciences. They discovered Microsoft Academic had improved substantially since their previous study (Harzing 2016b), increasing 9.6% for a comparison sample, versus 1.4%, 2%, and 1.7% growth in Google Scholar, Scopus, and Web of Science (n.p.). The researchers noted a few problems with data quality, “although the Microsoft Academic team have indicated they are working on a resolution” (n.p.). On average, the researchers found that Microsoft Academic found 59% as many citations as Google Scholar, 97% as many citations as Scopus, and 108% as many citations as Web of Science. Google Scholar had the top counts for each disciplinary area, followed by Scopus except in the social sciences and humanities, where Microsoft Academic ranked second. The researchers explained that Microsoft Academic “only includes citation records if it can validate both citing and cited papers as credible,” as established through a machine-learning-based system, and discussed an emerging metric of “estimated citation count” also provided by Microsoft Academic. The researchers concluded that Microsoft Academic promises to be “an excellent alternative for citation analysis” and suggested Microsoft should work to improve coverage of books and grey literature.

Google Scholar

Google Scholar was released in beta form in November 2004, and was expanded to include judicial case law in 2009.
While Google Scholar has received much attention in academia, it seems to be regarded by Google as a niche product: in 2011 Google removed Scholar from the list of top services and the list of “more” services, relegating it to the “even more” list. In 2014, the Scholar team consisted of just nine people (Levy 2014). Describing Google Scholar in an introductory manner is not helped by Google’s vague documentation, which simply says it “includes scholarly articles from a wide variety of sources in all fields of research, all languages, all countries, and over all time periods.”5 The “wide variety of sources” includes “journal papers, conference papers, technical reports, or their drafts, dissertations, pre-prints, post-prints, or abstracts,” as well as court opinions and patents, but not “news or magazine articles, book reviews, and editorials.” Books and dissertations uploaded to Google Book Search are “automatically” included in Scholar. Google says abstracts are key, noting “Sites that show login pages, error pages, or bare bibliographic data without abstracts will not be considered for inclusion and may be removed from Google Scholar.” Studies of Google Scholar can be divided into three major categories of focus: investigating the coverage of Google Scholar; the use and utility of Google Scholar as part of the research process; and Google Scholar’s utility for bibliographic measurement, including evaluating the productivity of individual researchers and the impact of journals. There is some overlap across these categories, because studies of Google Scholar seem to involve three questions: 1) What is being searched? 2) How does the search function? and 3) To what extent can the user usefully accomplish her task?
The Coverage of Google Scholar

Scholars want to know what “scholarship” is covered by Google Scholar, but the documentation merely states that it indexes “papers, not journals”6 and challenges researchers to investigate Google Scholar’s coverage empirically despite Google Scholar’s notoriously challenging technical limitations.

5 https://scholar.google.com/intl/en/scholar/inclusion.html
6 https://www.google.com/intl/en/scholar/help.html#coverage

While some limitations of Google Scholar have been corrected over the years, longstanding logistical hurdles involved with studying Google Scholar’s coverage have been well-documented for over a decade (Shultz 2007; Bonato 2016; Haddaway et al. 2015; Levay et al. 2016), and include:

• Search queries are limited to 256 characters
• Not being able to retrieve more than 1,000 results
• Not being able to display more than 20 results per page
• Not being able to download batches of results (e.g., to load into citation management software)
• Duplicate citations (beyond the multiple article “versions”), requiring manual screening
• Retrieving different results with Advanced and Basic searches
• No designation of the format of items (e.g., conference papers)
• Minimal sort options for results
• Basic Boolean operators only7
• Illogical interpretation of Boolean operators: esophagus OR oesophagus and oesophagus OR esophagus return different numbers of results (Boeker, Vach, and Motschall 2013)
• Non-disclosure of the algorithm by which search results are sorted

Additionally, one study reported experiencing an automated block to the researcher’s IP address after the export of approximately 180 citations or 180 individual searches (Haddaway et al. 2015, 14).
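The duplicate-citation hurdle above typically forces manual screening of exported results. Below is a minimal sketch of the kind of title normalization a researcher might script to flag likely duplicates before manual review; the citation strings are invented for illustration:

```python
import re

# Invented example citations: the first two differ only in case and
# punctuation and should collapse to a single record.
citations = [
    "Searching the grey literature: a review.",
    "Searching the Grey Literature - A Review",
    "Systematic reviews and web search engines",
]

def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())

seen, unique = set(), []
for citation in citations:
    key = normalize(citation)
    if key not in seen:       # keep only the first variant of each title
        seen.add(key)
        unique.append(citation)

print(len(unique))  # 2 distinct records remain
```

Normalization like this catches only trivial variants; the “versions” clustering and near-duplicate records described above still require human judgment.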
Furthermore, the Research Excellence Framework was unable to use Google Scholar to assess the quality of research in UK higher education institutions, because of researchers’ inability to agree with Google on a “suitable process for bulk access to their citation information, due to arrangements that Google Scholar have in place with publishers” (Research Excellence Framework 2013, 1562). Such barriers can limit what can be studied and also cost researchers significant time in terms of downloading (Prins et al. 2016) and cleaning citations (Levay et al. 2016). Despite these hurdles, research activity analyzing the coverage of Google Scholar has continued in the past two years, often building off previous studies. This section will first discuss Google Scholar’s size and ranking, followed by its coverage of articles and citations, then its coverage of books, grey literature, and open access and institutional repositories.

Google Scholar Size and Ranking

In a 2014 study, Khabsa and Giles estimated there were at least 114 million English-language scholarly documents on the Web, of which Google Scholar had “nearly 100 million.” Another study by Orduna-Malea, Ayllón, Martín-Martín, and López-Cózar (2015) estimated that the total number of documents indexed by Google Scholar, without any language restriction, was between 160 and 165 million. By comparison, in 2016 the author’s discovery tool contained about 168 million items in academic journals, conference materials, dissertations, and reviews.8 Google Scholar’s presence in the information marketplace has influenced vendors to increase the discoverability of their content, including pushing for the display of abstracts and/or the first page of articles (Levy 2014). ProQuest and Gale indexes were added to Google Scholar in 2015 (Quint 2016).

7 E.g., no nesting of logical subexpressions deeper than one level (Boeker, Vach, and Motschall 2013) and no truncation operators.
Martín-Martín et al. (2016b) noted that Google Scholar’s agreements with big publishers come at a price: “the impossibility of offering an API,” which would support bibliometricians’ research (54). Google Scholar’s results ranking “aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.”9 Martín-Martín and his colleagues (2017, 159) conducted a large, longitudinal study of null query results in Google Scholar and found a strong correlation between result list ranking and times cited. The influence of citations is so strong that when the researchers performed the same search process four months later, 14.7% of documents were missing in the second sample, causing them to conclude even a change of one or two citations could lead to a document being excluded from or included in the top 1,000 results (157). Using citation counts as a major part of the ranking algorithm has been hypothesized to produce the “Matthew Effect,” where “work that is already influential becomes even more widely known by virtue of being the first hit from a Google Scholar search, whereas possibly meritorious but obscure academic work is buried at the bottom” (Antell et al. 2013, 281). Google Scholar has been shown to heavily bias its ranking toward English-language publications even when there are highly cited non-English publications in the result set, although selection of interface language may influence the ranking. Martín-Martín and his colleagues noted that Google Scholar seems to use the domain of the document’s hosting web site as a proxy for language, meaning that “some documents written in English but with their primary version hosted in non-Anglophone countries’ web domains do appear in lower positions in spite of receiving a large number of citations” (Martín-Martín et al. 2017, 161).
This effect is shown dramatically in Figure 3 of their paper.

Google Scholar Coverage: Articles and Citations

The coverage of articles, journals, and citations by Google Scholar has been commonly examined by using brute-force methods to retrieve a sample of items from Google Scholar and possibly one or more of its competitors. (Studies discussed in this section are listed in Table 1.) The goal is usually to determine how well Google Scholar’s database compares to traditional research databases, usually in a specific field. Core methodology involves importing citations into software such as Publish or Perish (Harzing 2016a), cleaning the data, then performing statistical tests, expert review, or both.

8 The discovery tool does not contain all available metadata but has been carefully vetted (Fagan and Gaines 2016).
9 https://www.google.com/intl/en/scholar/about.html

Haddaway (2015) and Moed et al. (2016) have written articles specifically discussing methodological aspects. Recent studies repeatedly find that Google Scholar’s coverage meets or exceeds that of other search tools, no matter what is identified by target samples, including journals, articles, and citations (Karlsson 2014; Harzing 2014; Harzing 2016b; Harzing and Alakangas 2016b; Moed, Bar-Ilan, and Halevi 2016; Prins et al. 2016; Wildgaard 2015; Ciccone and Vickery 2015). In only three studies did Google Scholar find fewer items, and the meaningful difference was minimal.10 Science disciplines were the most studied in Google Scholar, including agriculture, astronomy, chemistry, computer science, ecology, environmental science, fisheries, geosciences, mathematics, medicine, molecular biology, oceanography, physics, and public health. Social sciences studied include education (Prins et al. 2016), economics (Harzing 2014), geography (Ştirbu et al.
2015, 322-329), information science (Winter, Zadpoor, and Dodou 2014; Harzing 2016b), and psychology (Pitol and De Groote 2014). Studies related to the arts or humanities 2014-2016 included an analysis of open access journals in music (Testa 2016) and a comparison between Google Scholar and Web of Science for research evaluation within education, pedagogical sciences, and anthropology11 (Prins et al. 2016). Wildgaard (2015) and Bornmann et al. (2016) included samples of humanities scholars as part of bibliometric studies, but did not discuss disciplinary aspects related to coverage. Prior to 2014, the only study found related to the arts and humanities compared Google Scholar with Historical Abstracts (Kirkwood Jr. and Kirkwood 2011). Google Scholar’s coverage has been growing over time (Meier and Conkling 2008; Harzing 2014; Winter, Zadpoor, and Dodou 2014; Bartol and Mackiewicz-Talarczyk 2015, 531; Orduña-Malea and Delgado López-Cózar 2014) with recent increases in older articles (Winter, Zadpoor, and Dodou 2014; Harzing and Alakangas 2016b), leading some to question whether this supports the documented trend of increased citation of older literature (Martín-Martín et al. 2016c; Varshney 2012). Winter et al. noted that in 2005 Web of Science yielded more citations than Google Scholar for about two-thirds of their sample, but for the same sample in 2013, Google Scholar found more citations than Web of Science, with only 6.8% of citations not retrieved by Google Scholar (Winter, Zadpoor, and Dodou 2014, 1560). The unique citations of Web of Science were “typically documents before the digital age and conference proceedings not available online” (Winter, Zadpoor, and Dodou 2014, 1560). Harzing and Alakangas’s (2016b) large-scale longitudinal comparison of Google Scholar, Scopus, and Web of Science suggested that Google Scholar’s retroactive expansion has stabilized and now all three databases are growing at similar rates. 
10 For example, Bramer, Giustini, and Kramer (2016a) found slightly more of their 4,795 references from systematic reviews in Embase (97.5%) than in Google Scholar (97.2%). In Testa (2016), the music database RILM indexed two more of the 84 OA journals than Google Scholar (which indexed at least one article from 93% of the journals). Finally, in a study using citations to the most-cited article of all time as a sample, Web of Science found more citations than did Google Scholar (Winter, Zadpoor, and Dodou 2014).
11 Prins et al. classified anthropology as part of the humanities.

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017

Google Scholar also seems to cover both the oldest and the most recent publications. Unlike traditional abstracts and indexes, Google Scholar is not limited by starting year, so as publishers post the tables of contents of their earliest journals online, Google Scholar discovers those sources (Antell et al. 2013, 281). Trapp (2016) reported the number of citations to a highly cited physics paper after the first 11 days of publication to be 67 in Web of Science, 72 in Scopus, and 462 in Google Scholar (Trapp 2016, 4). In a study of 800 citations to Nobelists in multiple fields, Harzing found that “Google Scholar could effectively be 9–12 months ahead of Web of Science in terms of publication and citation coverage” (2013, 1073). An increasing proportion of journal articles in Google Scholar are freely available in full text. A large-scale, longitudinal study of highly cited articles from 1950-2013 found that 40% of article citations in the sample were freely available in full text (Martín-Martín et al. 2014). Another large-sample study found that 61% of articles in their sample from 2004-2014 could be freely accessed (Jamali and Nabavi 2015). In both studies, nih.gov and ResearchGate were the top two full-text providers.
Google Scholar’s coverage of major publisher content varies; having some coverage of a publisher does not imply that all articles or journals from that publisher are covered. In a sample of 222 citations compared across Google Scholar, Scopus, and Web of Science, Google Scholar contained all of the Springer titles, as many Elsevier titles as Scopus, and the most articles by Wolters Kluwer and John Wiley. However, among the three databases, Google Scholar contained the fewest articles by BMJ and Nature (Rothfus et al. 2016).

Study: (Bartol and Mackiewicz-Talarczyk 2015)
Sample: Documents retrieved in response to searches on crops and fibers in article titles, 1994-2013 (samples varied by crop)
Results: Google Scholar returned more documents for each crop. For example, “hemp” retrieved 644 results in Google Scholar, 493 in Scopus, and 318 in Web of Science; Google Scholar demonstrated higher yearly growth of records over time.

Study: (Bramer, Giustini, and Kramer 2016b)
Sample: References from a pool of systematic reviewer searches in medicine (n=4,795)
Results: Google Scholar found 97.2%, Embase 97.5%, and MEDLINE 92.3% of all references. When using search strategies, Embase retrieved 81.6%, MEDLINE 72.6%, and Google Scholar 72.8%.

Study: (Ciccone and Vickery 2015)
Sample: 183 user searches randomly selected from NCSU Libraries’ 2013 Summon search logs (n=137)
Results: No significant difference between the performance of Google Scholar, Summon, and EDS for known-item searches; “Google Scholar outperformed both discovery services for topical searches.”

Study: (Harzing 2014)
Sample: Publications and citation metrics for 20 Nobelists in chemistry, economics, medicine, and physics, 2012-2013 (samples varied)
Results: Google Scholar coverage is now “increasing at a stable rate” and provides “comprehensive coverage across a wide set of disciplines for articles published in the last four decades” (575).
Study: (Harzing 2016b)
Sample: Citations from one researcher (n=126)
Results: Microsoft Academic found all books and journal articles covered by Google Scholar; Google Scholar found 35 additional publications, including book chapters, white papers, and conference papers.

Study: (Harzing and Alakangas 2016a)
Sample: Samples from (Harzing and Alakangas 2016b, 802) (samples varied by faculty)
Results: Google Scholar provided higher “true” citation counts than Microsoft Academic, but Microsoft Academic’s “estimated” citation counts were 12% higher than Google Scholar’s for the life sciences and equivalent for the sciences.

Study: (Harzing and Alakangas 2016b)
Sample: Citations of the works of 145 faculty among 37 scholarly disciplines at the University of Melbourne (samples varied by faculty)
Results: For the top faculty member, Google Scholar had 519 total papers (compared with 309 in both Web of Science and Scopus); Google Scholar had 16,507 citations (compared with 11,287 in Web of Science and 11,740 in Scopus).

Study: (Hilbert et al. 2015)
Sample: Documents published by 76 information scientists in German-speaking countries (n=1,017)
Results: Google Scholar covered 63%; Scopus, 31%; BibSonomy, 24%; Mendeley, 19%; Web of Science, 15%; CiteULike, 8%.

Study: (Jamali and Nabavi 2015)
Sample: Items published between 2004 and 2014 (n=8,310)
Results: 61% of articles were freely available; of these, 81% were publisher versions and 14% were preprints. ResearchGate was the top full-text source, netting 10.5% of full-text sources, followed by ncbi.nlm.nih.gov (6.5%).

Study: (Karlsson 2014)
Sample: Journals from ten different fields (n=30)
Results: Google Scholar retrieved documents from all of the selected journals; Summon retrieved documents from only 14 of the 30 journals.

Study: (Lee et al. 2015)
Sample: Journal articles housed in Florida State University’s institutional repository (n=170)
Results: Metadata were found in Google for 46% of items and in Google Scholar for 75% of items; Google Scholar found 78% of available full text, and found full text for six items with no full text in the IR.
Study: (Martín-Martín et al. 2014)
Sample: Items highly cited by Google Scholar (n=64,000)
Results: 40% could be freely accessed using Google Scholar; nih.gov and ResearchGate were the top two full-text providers.

Study: (Moed, Bar-Ilan, and Halevi 2016)
Sample: Citations to 36 highly cited articles in 12 scientific-scholarly English-language journals (n=about 7,000)
Results: 47% of sources were in both Google Scholar and Scopus; 47% were in Google Scholar only; 6% were in Scopus only. The unique Google Scholar citations most often came from Google Books, Springer, SSRN, ResearchGate, ACM Digital Library, Arxiv, and ACLweb.org.

Study: (Prins et al. 2016)
Sample: Article citations in the field of education and pedagogies, and citations to 328 articles in anthropology (n=774)
Results: Google Scholar found 22,887 citations in Education & Pedagogical Science compared to Web of Science’s 8,870, and 8,092 in Anthropology compared with Web of Science’s 1,097.

Study: (Ştirbu et al. 2015)
Sample: Citations resulting from two geographical topic searches (samples varied)
Results: Google Scholar found 2,732 geographical references, whereas Web of Science found only 275; GeoRef, 97; and FRANCIS, 45. For sedimentation, Google Scholar found 1,855 geographical references compared to Web of Science’s 606, GeoRef’s 1,265, and FRANCIS’s 33. Google Scholar overlapped Web of Science by 67% and 82% for the two searches, and GeoRef by 57% and 62%.

Study: (Testa 2016)
Sample: Open access journals in music (n=84)
Results: Google Scholar indexed at least one article from 93% of the OA journals; RILM indexed two additional journals.

Study: (Wildgaard 2015)
Sample: Publications from researchers in astronomy, environmental science, philosophy, and public health (n=512)
Results: Publication counts from Web of Science were 2-4 times lower than Google Scholar’s for all disciplines; citation counts were up to 13 times lower in Web of Science than in Google Scholar.
Study: (Winter, Zadpoor, and Dodou 2014)
Sample: Growth of citations to 2 classic articles (1995-2013) and 56 science and social science articles in Google Scholar, 2005-2013 (samples varied)
Results: Total citation counts were 21% higher in Web of Science than Google Scholar for Lowry (1951), but Google Scholar was 17% higher than Web of Science for Garfield (1955) and 102% higher for the 56 research articles; Google Scholar showed significant retroactive expansion for all articles, compared to negligible retroactive growth in Web of Science.

Table 1. Studies investigating Google Scholar’s coverage of journal articles and citations, 2014-2016.

Google Scholar Coverage: Books

Many studies mentioned that books, including Google Books, are sometimes included in Google Scholar results. Jamali and Nabavi (2015) found that 13% of their sample of 8,310 citations from Google Scholar were books, while Martín-Martín et al. (2014) had found that 18% of their sample of 64,000 citations from Google Scholar were books. Within the field of anthropology, Prins et al. (2016) found books to generate the most citation impact in Google Scholar (41% of books in their sample were cited in Google Scholar) compared to articles (21% of articles were cited in Google Scholar). In education, 31% of articles and 25% of books were cited by Google Scholar (3). Abrizah and Thelwall found that only 37% of their sample of 1,357 arts, humanities, and social sciences books from the five main university presses in Malaysia had been cited in Google Scholar (23% of the books had been cited in Google Books) (Abrizah and Thelwall 2014, 2502). The overlap was small: 15% had impact in both Google Scholar and Google Books. The authors concluded that due to the low overlap between Google Scholar and Google Books, searching both engines is required to find the most citations to academic books. English books were significantly more likely to be cited in Google Scholar (48% vs. 32%), as were edited books (53% vs. 36%). They surmised that edited books’ citation advantage was due to the use of book chapters in the social sciences. They found arts and humanities books more likely to be cited in Google Scholar than social sciences books (40% vs. 34%) (Abrizah and Thelwall 2014, 2503).

Google Scholar Coverage: Grey Literature

Grey literature refers to documents not published commercially, including theses, reports, conference papers, government information, and poster sessions. Haddaway et al. (2015) was the only empirical study found that focused on grey literature. They discovered that between 8% and 39% of full-text search results from Google Scholar were grey literature, with the greatest concentration of citations from grey literature on page 80 of results for full-text searches and page 35 for title searches. They concluded that “the high proportion of grey literature that is missed by Google Scholar means it is not a viable alternative to hand searching for grey literature as a stand-alone tool” (2015, 14). For one of the systematic reviews in their sample, none of the 84 grey literature articles cited were found within the exported Google Scholar search results. The only other investigation of grey literature found was Bonato (2016), who, after conducting a very limited number of searches on one specific topic and a search for a known item, concluded Google Scholar to be “deficient.” In conclusion, despite much offhand praise for Google Scholar’s grey literature coverage (Erb and Sica 2015; Antell et al. 2013), the topic has been little studied, and when it has, grey literature results have not been prominent.

Google Scholar Coverage: Open Access and Institutional Repository Content

Erb and Sica touted Google Scholar’s access to “free content that might not be available through a library’s subscription services,” including open access journals and institutional repository coverage (2015, 48). Recent research has dug deeper into both these content areas.
In general, OA articles have been shown to net more citations than non-OA articles, as Koler-Povh, Južnic, and Turk (2014) showed within the field of civil engineering. Across their sample of 2,026 scholarly articles in 14 journals, all indexed in Web of Science, Scopus, and Google Scholar, OA articles received an average of 43 citations while non-OA articles were cited 29 times (1039). Google Scholar did a better job of discovering those citations: in Google Scholar, the median number of citations of OA articles was always higher than that for non-OA articles, whereas this was true in Web of Science for only 10 of the 14 journals and in Scopus for 11 of the 14 journals (1040). Similarly, Chen (2014) found Google Scholar to index far more OA journals than Scopus and Web of Science, especially “gold OA.”12 Google Scholar’s advantage should not be assumed across all disciplines, however; Testa (2016) found both Google Scholar and RILM to provide good coverage of OA journals in music, with Google Scholar indexing at least one article from 93% of the 84 OA journals in the sample, but the bibliographic database RILM indexed two more OA journals than Google Scholar. Google Scholar indexing of repositories may be critical for their success, but results vary by IR platform and by whether the IR metadata has been structured according to Google’s guidelines.
In a random sample from Shodhganga, India’s central ETD database, Weideman (2015) found not one article had been indexed in full text by Google Scholar, although in many cases the metadata was indexed, leading the author to identify needed changes to the way Shodhganga stores ETDs.13 Likewise, Chen (2014) found that neither Google Scholar nor Google appears to index Baidu Wenku, a major full-text archive and social networking site in China similar to ResearchGate, and Orduña-Malea and López-Cózar (2015) found that Latin American repositories are not very visible in Google or Google Scholar due to limitations of the description schemas chosen as well as search engine reliability. In Yang’s (2016) study of Texas Tech’s DSpace IR, Google was the only search engine that indexed, discovered, or linked to PDF files supplemented with metadata; Google Scholar did not discover or provide links to the IR’s PDF files, and was less successful at discovering metadata. When Google Scholar is able to index IR content, it may be responsible for significant traffic. In a study of four major U.S. universities’ institutional repositories (three DSpace, one CONTENTdm) involving a dataset of 57,087 unique URLs and 413,786 records, researchers found that 48%–66% of referrals came from Google Scholar (Obrien et al. 2016, 870). The importance of Google Scholar in contrast to Google was noted by Lee et al. (2015), who conducted title searches on 170 journal articles housed in Florida State University’s institutional repository (using bePress’s Digital Commons platform), 100 of which existed in full text in the IR. Links to the IR were found in Google results for 45.9% of the 170 items, and in Google Scholar for 74.7% of the 170 items. 
Furthermore, Google Scholar linked to the full text for 78% of the 100 cases where full text was available, and even provided links to freely available full text for six items that did not have full text in the IR. However, the researchers also noted that “relying on either Google or Google Scholar individually cannot ensure full access to scholarly works housed in OA IRs.” In their study, among the 104 fully open access items there was an overlap in results of only 57.5%; Google provided links to 20 items not found with Google Scholar, and Google Scholar provided links to 25 items not found with Google (Lee et al. 2015, 15).

12 OA articles on publisher web sites, whether the journal itself is OA or not (Chen 2014).
13 Most notably, the need to store thesis documents as one PDF file instead of divided into multiple, separate files, to create HTML landing pages as per Google’s recommendations, and to submit the addresses of these pages to Google Scholar.

Google Scholar results note the number of “versions” available for each item. In a study of 982 science article citations (including both OA and non-OA) in IRs, Pitol and De Groote found that 56% of citations had between four and nine Google Scholar versions (2014, 603). Almost 90% of the citations shown were the publisher version, but of these, only 14.3% were freely available in full text on the publisher web site. Meanwhile, 70% of the items had at least one free full-text version available through a “hidden” Google Scholar version. The author’s experience in retrieving full text for this review indicates this issue still exists, but research would be needed to formulate reliable recommendations for users.

Use and Utility of Google Scholar as Part of the Research Process

Studies were found concerning Google Scholar’s popularity with users and their reasons for preferring it (or not) over other tools.
Another group of studies examined issues related to the utility of Google Scholar for research processes, including issues related to messy metadata. Finally, a cluster of articles focused specifically on using Google Scholar for systematic reviews.

Popularity and User Preferences

Several studies have shown Google Scholar to be well known to scholarly communities. A survey of 3,500 scholars from 95 countries found that over 60% of scientists and engineers and over 70% of respondents in the social sciences, arts, and humanities were aware of Google Scholar and used it regularly (Van Noorden 2014). In a large-scale journal-reader survey, Inger and Gardner (2016) found that among academic researchers in high-income areas, academic search engines had surpassed abstracts and indexes as a starting place for research (2016, 85, Figure 4). In low-income areas, Google use exceeded Google Scholar use for academic research. Major library link resolver software offers reports of full-text requests broken down by referrer. Inger and Gardner (2016) showed a large variance across subjects in whether people prefer Google or Google Scholar: “People in the social sciences, education, law, and business use Google Scholar more to find journal articles. However, people working in the humanities and religion and theology prefer to use Google” (88). Humanities scholars’ use of Google over Google Scholar was also found by Kemman et al. (2013); Google, Google Images, Google Scholar, and YouTube were used more than JSTOR or other library databases, even though humanities scholars’ trust in Google and Google Scholar was lower. User research since 2014 concerning Google Scholar has focused on graduate students. Results suggest Google Scholar is used regularly but is only partially sufficient.
In their study of 20 engineering master’s students’ use of abstracts and indexes, Johnson and Simonsen (2015) found that half their sample (n=20) had used Google Scholar the last time they located an article using specific search terms or criteria. Google was the second most-used source at 20%, followed by abstracting and indexing services (15%). Graduate students describe Google Scholar with nuance and refer to it as a specific part of their process. In Bøyum and Aabø’s (2015) interviews with eight PhD business students and Wu and Chen’s (2014, 381) interviews with 32 graduate students drawn from multiple academic disciplines, the majority described using library databases and Google Scholar for different purposes depending on the context. Graduate students in both studies were well aware of Google Scholar’s use for citation searching. Bøyum and Aabø’s (2015) subjects described library resources as more “academically robust” than Google or Google Scholar. Wu and Chen’s (2014) interviewees praised Google Scholar for its wider coverage and convenience, but lamented the uncertain quality, sometimes inaccessible full text, too many results, lack of sorting functions (by document type or date), retrieval of documents from different disciplines, and duplicate citations. Google Scholar was seen by their subjects as useful during the early stages of information seeking. In contrast to general assumptions, more than half the students interviewed reported browsing more than three pages’ worth of Google Scholar results (Wu and Chen 2014, 381). About half of the interviewees reported looking at cited documents to find more; however, students had mixed opinions about whether the citing documents turned out to be relevant. Google Scholar’s “My Library” feature, introduced in 2013, now competes with other bibliographic citation management software.
In a survey of 344 (mostly graduate) students, Conrad, Leonard, and Somerville found Google Scholar was the most used (47%), followed by EndNote (37%) and Zotero (19%) (2015, 572). Follow-up interviews with 13 of the students revealed that a few students used multiple tools; for example, one participant used EndNote for sharing data with lab partners and others “across the community,” Mendeley for her own personal thesis work, where she needs to “build a whole body of literature,” and Google Scholar Citations for “quick reference lists that I may not need for a second or third time.”

Messy Metadata

Many studies have suggested Google Scholar’s metadata is “messy.” Although none in the period of study examined this phenomenon in conjunction with relative user performance, the issues found could affect scholarship. A 2016 study itemized the most common mistakes in Google Scholar resulting from its extraction process: 1) incorrect title identification; 2) missing or incorrectly assigned authors; 3) book reviews indexed as books; 4) failure to group versions of the same document, which inflates citation counts; 5) grouping of different editions of books, which deflates citation counts; 6) attributing citations to documents that did not cite them, or missing citations that did; and 7) duplicate author profiles (Martín-Martín et al. 2016b). The authors concluded that “in an academic big data environment, these errors (which we deem affect less than 10% of the records in the database) are of no great consequence, and do not affect the core system performance significantly” (54). Two of these issues have been studied specifically: duplicate citations and missing publication dates. The rate of duplicate citations in Google Scholar has ranged upwards of 2.93% (Haddaway et al. 2015) and 5% (Winter, Zadpoor, and Dodou 2014, 1562), which can be compared to a 0.05% duplicate citation rate in Web of Science (Haddaway et al. 2015, 13). Haddaway found the main reasons for duplication include “typographical errors, including punctuation and formatting differences; capitalization differences (Google Scholar only), incomplete titles, and the fact that Google Scholar scans citations within reference lists and may include those as well as the citing article” (2015, 13). The prevalence of missing publication dates varies greatly across samples. Dates were found to be missing 9% of the time in Winter et al.’s study, although this varied by publication type: 4% of journal articles, 15% of theses, and 41% of the unknown document types (Winter, Zadpoor, and Dodou 2014, 1562). However, Martín-Martín et al. studied a sample of 32,680 highly cited documents and found that Web of Science and Google Scholar agreed on publication dates 96.7% of the time, with an idiosyncratically large proportion of the mismatches in 2012 and 2013 (2017, 159).

Utility for Research Processes

Prior to 2014, studies such as Asher, Duke, and Wilson’s (2012) evaluated Google Scholar’s utility as a general research tool, often in comparison with discovery tools. Since 2014, the only such study found was Namei and Young’s comparison of Summon, Google Scholar, and Google using 299 known-item queries. They found Google Scholar and Summon returned relevant results 74% of the time, while Google returned relevant results 91% of the time. For “scholarly formats,” they found Summon returned relevant results 76% of the time; Google Scholar, 79%; and Google, 91% (2015, 526-527). The remainder of the studies in this category focused specifically on systematic reviews, perhaps because such reviews are so time-consuming. Authors carefully develop search strategies, execute them in multiple databases, and document their search methods and results.
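The duplicate-citation rates quantified above, and the deduplication step reviewers must perform when merging exports from several databases, can be illustrated with a small sketch. This is not any study’s actual procedure; it simply shows how an aggressive normalization key exposes duplicates caused by the punctuation, formatting, and capitalization differences Haddaway describes.

```python
import re
from collections import defaultdict

def dedup_key(title: str) -> str:
    """Aggressive normalization: casefold and drop everything that is
    not a letter or digit, so 'A Review!' and 'a review' collide."""
    return re.sub(r"[^a-z0-9]+", "", title.casefold())

def duplicate_rate(titles: list) -> float:
    """Percentage of records that are surplus copies of another record."""
    groups = defaultdict(int)
    for t in titles:
        groups[dedup_key(t)] += 1
    surplus = sum(n - 1 for n in groups.values())
    return 100 * surplus / len(titles) if titles else 0.0
```

A key this aggressive will occasionally merge genuinely distinct items, which is why published studies pair automated cleaning with manual checks.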
Some prestigious journals are beginning to require similar rigor for any original research article, not just systematic reviews (Cals and Kotz 2016). Information provided by professional organizations about the use of Google Scholar for systematic reviews seems inconsistent: the Cochrane Handbook for Systematic Reviews of Interventions lists Google Scholar among sources for searching, but none of the five “highlighted reviews” on the Cochrane web site at the time of this article’s writing used Google Scholar in their methodologies. The manual of the UK’s National Institute for Health and Care Excellence (NICE) mentions Google Scholar only in an appendix of search sources under “Conference Abstracts.” A study by Gehanno et al. (2013) found Google Scholar contained 100% of the references from 29 systematic reviews, and suggested Google Scholar could be the first choice for systematic reviews or meta-analyses. This finding prompted a slew of follow-up studies in the next three years. An immediate response by Giustini and Boulos (2013) pointed out that systematic reviews are performed not by searching for article titles, as in Gehanno et al.’s method, but through search strategies. When they tried to replicate a systematic review’s topical search strategy in Google Scholar, the citations were not easily discovered. In addition, the authors were not able to find all the papers from a given systematic review even by title searching. Haddaway et al. also found imperfect coverage: for one of the seven reviews examined, 31.5% of citations could not be found (2015, 11). Haddaway also noted that special characters and fonts (as with chemical symbols) can cause poor matching when such characters are part of article titles.
Recent literature concurs that it is still necessary to search multiple databases, including abstracts and indexes, when conducting a systematic review, no matter how good Google Scholar’s coverage seems to be. No single database’s coverage is complete, including Google Scholar’s (Thielen et al. 2016), and the practical recall of Google Scholar is exceptionally low due to its 1,000-result limit; at the same time, Google Scholar’s lack of precision is costly in terms of researchers’ time (Bramer, Giustini, and Kramer 2016b; Haddaway et al. 2015). The challenges limiting study of Google Scholar’s coverage also bedevil those wishing to use it for reviews, especially the 1,000-result retrieval limit, the lack of batch export, and the lack of exported abstracts (Levay et al. 2016). Additionally, Google Scholar’s changing content, unknown algorithm and updating practices, search inconsistencies, limited Boolean functions, and 256-character query limit prevent the tool from accommodating the detailed, reproducible search methodologies required by systematic reviews (Bonato 2016; Haddaway et al. 2015; Giustini and Boulos 2013). Bonato noted that Google Scholar retrieved different results with Advanced and Basic searches, could not determine the format of items (e.g., conference papers), and returned other inconsistent results.14 Bonato also lamented the lack of any kind of document-type limit. Despite the limitations and logistical challenges, practitioners and scholars are finding solid reasons for including academic web search engines in most systematic review methodologies (Cals and Kotz 2016). Stansfield et al. noted that “relevant literature for low- and middle-income countries, such as working and policy papers, is often not included in databases,” and that Google Scholar finds additional journal articles and grey literature not indexed in databases (2016, 191).
For eight systematic reviews by the EPPI-Center, “over a quarter of relevant citations were found from websites and internet search engines” (Stansfield, Dickson, and Bangpan 2016, 2). Specific tools and practices have been recommended for using search engines within the context of systematic reviews. Software is available to record search strategies and results (Harzing and Alakangas 2016b; Haddaway 2015). Haddaway suggests the use of snapshot tools (Haddaway 2015) to record the first 1,000 Google Scholar records rather than the typical assessment of the first 50 search results as had been done in the past: “This change in practice could significantly improve both the transparency and coverage of systematic reviews, especially with respect to their grey literature components” (Haddaway et al. 2015, 15). Both Haddaway (2015) and Cochrane recommend that review authors print or save electronic copies of the full text or relevant details locally rather than bookmarking web sites, “in case the record of the trial is removed or altered at a later stage” (Higgins and Green 2011). New methods for searching, downloading, and integrating academic search engine results into review procedures using free software to increase transparency, repeatability, and efficiency have been proposed by Haddaway and his colleagues (2015).

14 Bonato (2016) found zero hits for conference papers when limiting by year 2015-2016, but found two papers presented at a 2015 meeting.

Google Scholar Citations and Metrics

Google Scholar Citations and Metrics are not academic search engines, but this article includes them because these products are interwoven into the fabric of the Google Scholar database. Google Scholar Citations, launched in late 2011 (Martín-Martín et al. 2016b, 12), groups citations by author, while Google Scholar Metrics (launch date uncertain) provides similar data for articles and journals.
Readers interested in an in-depth literature review of Google Scholar Citations for earlier years (2005-2012) are directed to Thelwall and Kousha (2015b). In his comprehensive review of more recent literature about using Google Scholar Citations for citation analysis, Waltman (2016) described several themes. Google Scholar’s coverage of many fields is significantly broader than that of Web of Science and Scopus, and it seems to be continuing to improve over time. However, studies regularly report Google Scholar’s inaccuracies, content gaps, phantom data, easily manipulated citation counts, lack of transparency, and limitations for empirical bibliometric studies. As discussed in the coverage section, Google Scholar’s citation database is competitive with other major databases such as Web of Science and has been growing dramatically in the last few years (Winter, Zadpoor, and Dodou 2014; Harzing and Alakangas 2016b; Harzing 2014), but has recently stabilized (Harzing and Alakangas 2016b). More and more studies are concluding that Google Scholar reports more comprehensive information about citation impact than Web of Science or Scopus. Across a sample of articles from many years of one science journal, Trapp (2016) found the proportion of articles with zero citations was 37% for Web of Science, 29% for Scopus, and 19% for Google Scholar. Some of Google Scholar’s superiority for citation analysis in the social sciences and humanities is due to its inclusion of book content, software, and additional journals (Prins et al. 2016; Bornmann et al. 2016). Bornmann et al. (2016) noted that citations to all ten of a research institute’s books published in 2009 were found in Google Scholar, whereas Web of Science found citations for only two books. Furthermore, they found data in Google Scholar for 55 of the institute’s 71 book chapters.
For the four conference proceedings they could identify in Google Scholar, there were 100 citations, of which 65 could be found in Google Scholar. The comparative success of Google Scholar for citation impact varies by discipline, however: Levay et al. (2016) found Web of Science to be more reliable than Google Scholar, quicker for downloading results, and better for retrieving 100% of the most important publications in public health.

AN EVIDENCE-BASED REVIEW OF ACADEMIC WEB SEARCH ENGINES, 2014-2016 | FAGAN | https://doi.org/10.6017/ital.v36i2.9718 28

Despite Google Scholar’s growth, using all three major tools (Scopus, Web of Science, and Google Scholar) still seems to be necessary for evaluating researcher productivity. Rothfus (2016) compared Web of Science, Scopus, and Google Scholar citation counts for evaluating the impact of the Canadian Network of Observational Drug Effect Studies (CNODES), as represented by a sample of 222 citations from five articles. Attempting to determine citation metrics for the CNODES research team yielded different results for every article when using the three tools. They found that “using three tools (Web of Science, Scopus, Google Scholar) to determine citation metrics as indicators of research performance and impact provided varying results, with poor overall agreement among the three” (237). Major academic libraries’ web sites often explain how to find one’s h-index in all three (Suiter and Moulaison 2015). Researchers have also noted the disadvantages of Google Scholar for citation impact studies. Google Scholar is costly in terms of researcher time. Levay et al. (2016) estimated the cost of “administering results” from Web of Science to be 4 hours versus 75 hours for Google Scholar. Administering results includes using the search tool to search, download, and add records to bibliographic citation software, and removing duplicate citations. Duplicate citations are often mentioned as a problem (Prins et al.
2016), although Moed (2016) suggested the double counting by Google Scholar would occur only if the level of analysis is on target sources, not if it is on target articles.15 Downloaded citation samples can still suffer from double counts, however: Harzing and Alakangas (2016b) described how cleaning “a fairly extreme case” in their study reduced the number of papers from 244 to 106. Google Scholar also does not identify self-citations, which can dramatically influence the meaning of results (Prins et al. 2016). Furthermore, researchers have shown it is possible to corrupt Google Scholar Citations by uploading obviously false documents (Delgado López-Cózar, Robinson-García, and Torres-Salinas 2014). While the researchers noted traditional citation indexes can also be defrauded, Google’s products are less transparent and abuses may not be easily detected. Google did not respond to the research team when contacted and simply deleted the false documents to which it had been alerted without reporting the situation to the affected authors; the researchers concluded: “This lack of transparency is the main obstacle when considering Google Scholar and its by-products for research evaluation purposes” (453). Because these disadvantages do not outweigh Google Scholar’s seemingly broader coverage, many articles investigate workarounds for using Google Scholar more effectively when evaluating research impact.

15 “If a document is, for instance, first published in ArXiv, and a next version later in a journal J, citations to the two versions are aggregated. In Google Scholar Metrics, in which ArXiv is included as a source, this document (assuming that its citation count exceed the h5 value of ArXiv and journal J) is listed both under ArXiv and under journal J, with the same, aggregate citation count” (Moed 2016, 29).
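Cleaning duplicates from a downloaded citation sample, as Harzing and Alakangas describe, is in practice a normalize-then-deduplicate pass over the records. A minimal sketch, in which the record fields and the normalization rule are illustrative assumptions rather than their published procedure:

```python
import re

def normalize_title(title):
    # Lowercase, strip punctuation, and collapse whitespace so near-identical
    # records (case or punctuation variants of the same paper) compare equal.
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def deduplicate(records):
    # Keep the first record seen for each normalized title.
    seen, unique = set(), []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Real-world cleaning is messier (title variants, translated titles, preprint versus journal versions), which is why hand-checking "fairly extreme" cases remains necessary.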
Harzing and Alakangas (2016b) recommend the hIa index,16 which is corrected for career length and co-authorship patterns, as the citation metric of choice for a fair comparison of Google Scholar with other tools. Bornmann et al. (2016) investigated a method to normalize data and reduce errors when using Google Scholar data to evaluate citations in the social sciences and humanities. Researcher profiles can also be used to find other scholars by topic. In a 2014 survey of researchers (n=8,554), Dagienė and Krapavickaitė found that 22% used a third-party service such as Google Scholar or Microsoft Academic to produce lists of their scholarly activities and 63% reported their scholarly record was freely available on the web (2016, 158, 161). Google Scholar ranked second only to Microsoft Word as the most frequently used software for maintaining academic activity records (160). Martín-Martín et al. (2016b) examined 814 authors in the field of bibliometrics using Google Scholar Citations, ResearcherID, ResearchGate, Mendeley, and Twitter. Google Scholar was the most used social research sharing platform, followed by ResearchGate, with ResearcherID gaining wider acceptance among authors deemed “core” to the field. Only about one-third of the authors had created a Twitter profile, and many Mendeley and ResearcherID profiles were found empty. The study found the distinctive advantages of Google Scholar academic profiles to be automatic updates and a high growth rate, with the disadvantages of scarce quality control, metadata mistakes inherited from Google Scholar, and manipulability. Overall, Martín-Martín and colleagues concluded that Google Scholar “should be the preferred source for relational and comparative analyses in which the emphasis is put on author clusters” (57). Google Scholar Metrics provides citation information for articles and journals.
In a sample of 1,000 journals, Orduña-Malea and Delgado López-Cózar found that “despite all the technical and methodological problems,” Google Scholar Metrics provides sound and reliable journal rankings (2014, 2365). Google Scholar Metrics appears to be an annual publication; the 2016 edition contains 5,734 publications and 12 language rankings. Russian, Korean, Polish, Ukrainian, and Indonesian were added this year, while Italian and Dutch were removed for unknown reasons (Martín-Martín et al. 2016a). Researchers also found that many discussion papers and working papers were removed in 2016. English-language publications are broken into subject areas and disciplines. Google Scholar Metrics often, but not always, creates separate entries for each language in which a journal is published. Bibliometricians call for Google Scholar Metrics to display the total number of documents published in the publications indexed and the total number of citations received: “These are the two essential parameters that make it possible to assess the reliability and accuracy of any bibliometric indicator” (13). Adding country and language of publication and self-citation rates are among the other improvements listed by Delgado López-Cózar and colleagues.

16 Harzing and Alakangas (2016b) define the hIa as the hI,norm divided by academic age. Academic age refers to the number of years elapsed since first publication. To calculate hI,norm, one divides the number of citations for each paper by the number of authors of that paper, and then calculates the h-index of the normalized citation counts.

Informing Practice

The glaring lack of research related to the coverage of arts and humanities scholarship, limited research on book coverage, and the relaunch of Microsoft Academic make it impossible to form a general recommendation regarding the use of academic web search engines for serious research.
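The hIa calculation defined in footnote 16 can be worked through in a few lines. A minimal sketch, assuming per-paper citation counts, per-paper author counts, and academic age are already in hand:

```python
def h_index(citations):
    # h = the largest rank r such that the r-th most-cited paper
    # has at least r citations.
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def hia(citations, authors_per_paper, academic_age):
    # hI,norm: divide each paper's citations by its author count,
    # then take the h-index of the normalized counts.
    normalized = [c / a for c, a in zip(citations, authors_per_paper)]
    hi_norm = h_index(normalized)
    # hIa = hI,norm divided by years since first publication.
    return hi_norm / academic_age
```

For example (hypothetical values), four papers with citations [9, 6, 4, 2] and author counts [3, 2, 1, 2] normalize to [3, 3, 4, 1], giving hI,norm of 3; over a five-year academic age, hIa is 0.6.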
Until the ambiguity of arts and humanities coverage is clarified, and until academic web search engines are transparent and stable, traditional bibliographic databases still seem essential for systematic reviews, citation analysis, and other rigorous literature search purposes. Discipline-specific databases also have features such as controlled vocabulary, industry classification codes, and peer review indicators that make scholars more efficient and effective. Nevertheless, the increasing relevance of academic search engines and their solid coverage of the sciences and social sciences make it essential for librarians to become expert with Google Scholar, Google Books, and Microsoft Academic. For some scholarly tasks, academic search engines may be superior: for example, when looking up doi numbers for this paper’s bibliography, the most efficient process seemed to be a Google search on the article title plus the term “doi,” and the site most likely to display in the results was ResearchGate.17 Librarians and scholars should champion these tools as an important part of an efficient, effective scholarly research process (Walsh 2015), while also acknowledging their gaps in coverage, biases, metadata issues, and the absence of features available in other databases. Academic web search engines could form the centerpiece of instruction sessions on the scholarly network, as shown by “cited by” features, author profiles, and full-text sources. Traditional abstracts and indexes could then be presented on the basis of their strengths. At some point, explaining how to access full text will likely no longer focus on the link resolver but on the many possible document versions a user might encounter (e.g., pre-prints or editions of books) and how to make an informed choice.
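The doi-lookup workflow just described ends with pulling a doi string out of a result page. A small helper for that last step, using the widely cited Crossref-recommended matching pattern (a sketch; matching every doi in the wild is notoriously imperfect):

```python
import re

# Crossref's recommended pattern matches the vast majority of modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:a-zA-Z0-9]+")

def extract_doi(text):
    # Return the first doi-like string, trimming trailing punctuation
    # that often clings to DOIs quoted in running text.
    match = DOI_PATTERN.search(text)
    return match.group(0).rstrip(".,;") if match else None
```

A doi found this way should still be verified against the publisher's landing page, for the accuracy reasons discussed below.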
In the meantime, even though web search engines and repositories may retrieve copious full text outside library subscriptions, college students should still be made aware of the library’s collections and services such as interlibrary loan. When considering Google Scholar’s weaknesses, it is important to keep in mind Chen’s observation that we may not have a tool available that does any better (Antell et al. 2013). While Google Scholar may be biased toward English-language publications, so are many bibliographic databases. Overall, Google Scholar seems to have increased the visibility of international research (Bartol and Mackiewicz-Talarczyk 2015). While Google Scholar’s coverage of grey literature has been shown to be somewhat uneven (Bonato 2016; Haddaway et al. 2015), it seems to include more diversity among relevant document types than many abstracts and indexes (Ştirbu et al. 2015; Bartol and Mackiewicz-Talarczyk 2015). Although the rigors of systematic reviews may contraindicate the tool’s use as a single source, it adds value to search results from other databases (Bramer, Giustini, and Kramer 2016a). User preferences and priorities should also be taken into account; Google Scholar results have been said to contain “clutter,” but many researchers have found the noise in Google Scholar tolerable given its other benefits (Ştirbu et al. 2015). Google Books purportedly contains about 30 million items, focused on U.S.-published and English-language books. But its coverage is hit-or-miss, surprising Mays (2015) with an unexpected wealth of primary sources but disappointing Harper (2016) with limited coverage of academic health sciences books.

17 Because the authority of ResearchGate is ambiguous, in such cases I then looked up the doi using Google to find the publisher’s version. In some cases, the doi was not displayed on the publisher’s result page (e.g., https://muse.jhu.edu/article/197091).
Recent court decisions have enabled Google to continue progressing toward its goal of full-text indexing and making snippet views available for the Google-estimated universe of 130 million books, which suggests its utility may increase. Google Books is not integrated with link resolvers or discovery tools but has been found useful for providing information about scholarly research impact, especially for the arts, humanities, and social sciences. As relaunched in 2016, Microsoft Academic shows real potential to compete with Google Scholar in coverage and utility for finding journal articles. As of February 2017 its index contains 120 million citations. In contrast to the mystery of Google Scholar’s black-box algorithms and restrictive limitations, Microsoft Academic uses an open-system approach and offers an API. Microsoft Academic appears to have less coverage of books and grey literature than Google Scholar. Research is badly needed on the coverage and utility of both Google Books and Microsoft Academic. Google Scholar continues to evolve, launching in 2016 a new algorithm for known-item searching18 that appears to work very well. Google Scholar does not reveal how many items it searches, but studies have suggested 160 million documents have been indexed. Studies have shown the Google Scholar relevance algorithm to be heavily influenced by citation counts and language of publication. Google Scholar has been so heavily researched and is such a “black box” that more attention would seem to have diminishing returns, except in the area of coverage of and utility for arts and humanities research. Librarians may find these takeaways useful for working with or teaching Google Scholar:

• Little is known about Google Scholar’s coverage of the arts and humanities.
• Recent studies repeatedly find that in the sciences and social sciences Google Scholar covers as much if not more than library databases, has more recent coverage, and frequently provides access to full text without the need for library subscriptions.
• Although the number of studies is limited, Google Scholar seems excellent at retrieving known scholarly items compared with discovery tools.
• Using proper accent marks in the title when searching for non-English-language items appears to be important.
• Finding full text for non-English journal articles may require searching Google Scholar in the original language.
• While Google Scholar may include results from Google Books, it appears both tools should be used rather than assuming Google Books content will appear in Google Scholar.
• While Google Scholar does include grey literature, these results do not usually rank highly.
• Google Scholar and Google must both be used to search effectively across institutional repository content.
• Free full text may be buried underneath the “All X versions” links because the publisher’s web site is usually the dominant version presented to the user. The right-hand column links may help ameliorate this situation, but not reliably.
• Google Scholar is well known in most academic communities and used regularly; however, it is seldom the only tool used, with scholars continuing to use other web search tools, library abstracts and indexes, and published web sites as well.

18 Google Scholar’s blog notes that in January 2016, a change was made so “Scholar now automatically identifies queries that are likely to be looking for a specific paper.” Technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” (https://scholar.googleblog.com/).
• Experts in writing systematic reviews recommend Google Scholar be included as a search tool along with traditional abstracts and indexes, using software to record the search process and results.
• For evaluating research impact, Google Scholar may be superior to Web of Science or Scopus, but using all three tools still seems necessary.
• As with any database, citation metadata should be verified against the publisher’s data; with Google Scholar, publication dates should receive deliberate attention.
• When Google Scholar covers some of a major publisher’s content, that does not imply it covers all of that publisher’s content.
• Google Scholar Metrics appears to provide reliable journal rankings.

Research Agenda

This review of the literature also provides direction for future research concerning academic web search engines. Because this review focused on 2014-2016, researchers may need to review studies from earlier periods for methodological ideas and previous findings, noting that dramatic changes in search engine coverage and behavior can occur within only a few years.19 Across the studies, some general best practices were observed. When comparing the coverage of academic web search engines, assessing their utility for establishing research impact, or conducting other bibliometric studies, researchers should strongly consider using software such as Publish or Perish and should design their research approach with previous methodologies in mind. Information scientists have charted a set of clear disciplinary methods; there is no need to start from scratch.

19 For example, Ştirbu et al. found that Google Scholar overlapped GeoRef by 57% and 62% (Ştirbu et al. 2015, 328), compared with a finding by Neuhaus in 2006 that Scholar overlapped with GeoRef by 26% (2006, 133).

Even when
performing a large-scale quantitative assessment such as that of Kousha and Thelwall (2015), manually examining and discussing a subset of the sample seems helpful for checking assumptions and for enhancing the meaning of the findings to the reader. Some researchers examined the “top 20” or “top 10” results qualitatively (Kousha and Thelwall 2015), while others took a random sample from within their large study sample (Kousha, Thelwall, and Rezaie 2011).

Academic search engines for arts and humanities research

Research into the use of academic web search engines within arts and humanities fields is sorely needed. Surveys show humanities scholars use both Google and Google Scholar (Inger and Gardner 2016; Kemman, Kleppe, and Scagliola 2013; Van Noorden 2014). During interviews of 20 historians by Martin and Quan-Haase (2016) concerning serendipity, five mentioned Google Books and Google Scholar as important for recreating the serendipity of the physical library online. Almost all arts and humanities scholars search the Internet for researchers and their activities, and they commonly expressed the belief that having a complete list of research activities online improves public awareness (Dagienė and Krapavickaitė 2016). Mays’s (2015) practical advice and the few recent studies on the citation impact of Google Books for these disciplines point to the enormous potential for this tool’s use. Articles describing opportunities for new online searching habits of humanities scholars have not always included Google Scholar (Huistra and Mellink 2016). Wu and Chen’s interviews with humanities graduate students suggested their behavior and preferences differed from those of science and technology students: they did more known-item searching and struggled with “semantically ambiguous keywords” that retrieved irrelevant results (2014, 381).
Platform preferences seem to have a disciplinary aspect: Hammarfelt’s (2014) investigation of altmetrics in the humanities suggests Mendeley and Twitter should be included along with Google Scholar when examining the citation impact of humanities research, while a 2014 Nature survey suggests ResearchGate is much less popular in the social sciences and humanities than in the sciences (Van Noorden 2014). In summary, arts and humanities scholars are active users of academic web search engines and related tools, but their preferences and behavior, and the relative success of Google Scholar as a research tool, cannot be inferred from the vast literature focused on the sciences. Advice from librarians and scholars about the strengths and limitations of academic web search engines in these fields would be incredibly useful. Specific examples of needed research, with related studies to reference for methodological ideas:

• Similar to the studies that have been done in the sciences, how well do academic search engines cover the arts and humanities? An emphasis on formats important to the discipline would be important (Prins et al. 2016).
• How does the quality of search results compare between academic search engines and traditional library databases for arts and humanities topics? To what extent can the user usefully accomplish her task (Ruppel 2009)?
• To what extent do academic search engines support the research process for scholarship distinctive to arts and humanities disciplines (e.g., historiographies, review essays)?
• In academic search engines, how visible is the arts and humanities literature found in institutional repositories (Pitol and De Groote 2014)?

Specific aspects of academic search engine coverage

This review suggests that broad studies of academic search engine coverage may have reached a saturation point.
However, specific aspects of coverage need additional investigation:

• Grey literature: Although Google Scholar’s inclusion of grey literature is frequently mentioned as valuable, empirical studies evaluating its coverage are scarce. Additional research following the methodology of Haddaway (2015) could investigate the bibliographies of literature other than systematic reviews, investigate various disciplines, or use a sample of valuable known items (similar to Kousha, Thelwall, and Rezaie’s (2011) methodology for books).
• Non-Western, non-English-language literature: For further investigation of the repeated finding of non-Western, non-English-language bias (Abrizah and Thelwall 2014; Cavacini 2015), comparisons to library abstracts and indexes would be helpful for providing context. To what extent is this bias present in traditional research tools? Hilbert et al. found the coverage of their sample increased for English-language material in both Web of Science and Scopus, and “to a lesser extent” in Google Scholar (2015, 260).
• Books: Any investigations of book coverage in Microsoft Academic and Google Scholar would be welcome. Very few 2014-2016 studies focused on books in Google Scholar, and even looking in earlier years turned up little research. Georgas (2015) compared Google with a federated search tool for finding books, so her study may be a useful reference. Kousha et al. (2011) found three times as many citations in Google Scholar as in Scopus to a sample of 1,000 academic books. The authors concluded “there are substantial numbers of citations to academic books from Google Books and Google Scholar, and it therefore may be possible to use these potential sources to help evaluate research in book-oriented disciplines” (Kousha, Thelwall, and Rezaie 2011, 2157).
• Institutional repositories: Yang (2016) recommended that “librarians of digital resources conduct research on their local digital repositories, as the indexing effects and discovery rates on metadata or associated text files may be different case by case,” and the studies found for 2014-2016 show that IR platform and metadata schema dramatically affect discovery, with some IRs nearly invisible (Weideman 2015; Chen 2014; Orduña-Malea and López-Cózar 2015; Yang 2016) and others somewhat findable by Google Scholar (Lee et al. 2015; Obrien et al. 2016). Askey and Arlitsch (2015) have explained how Google Scholar’s decisions regarding metadata schema can dramatically affect results.20 Libraries that would like their institutional repositories to serve as social sharing platforms for research should consider conducting a study similar to that of Martín-Martín et al. (2016b). Finally, a study of IR journal article visibility in academic web search engines could be extremely informative.
• Full-text retrieval: The indexing coverage of academic search engines relates to the retrieval of full text, which is another area ripe for more research, especially in light of the impressive quantity of full text that can be retrieved without user authentication. Johnson and Simonsen (2015) found that more of the engineering students they surveyed obtained scholarly articles via a free download or a PDF from a colleague at another institution than through the library’s subscriptions. Meanwhile, libraries continue to pay for costly subscription resources. Monitoring this situation is essential for strategic decision-making. Quint (2016) and Karlsson (2014) have suggested strategies for libraries and vendors to support broader access to subscription full text through creative licensing and per-item fee approaches.

20 For example, Google’s rejection of Dublin Core.
Institutional repositories have had mixed results in changing scholars’ habits (both contributors’ and searchers’) but are demonstrably contributing to the presence of full text in the academic search engine experience. When will academic users find a good-enough selection of full-text articles that they no longer need the expanded full text paid for by their institutions?

Google Books

Like Microsoft Academic, Google Books as a search tool needs dedicated research from librarians and information scientists about its coverage, utility, and adoption. A purposeful comparison with other large digital repositories such as HathiTrust (https://www.hathitrust.org) would be a boon to practitioners and the public. While HathiTrust is transparent about its coverage (https://www.hathitrust.org/statistics_visualizations), specific areas of Google Books’ coverage have been called into question. Weiss (2016) suggested a gap in Google Books exists from about 1915-1965 “because many publishers either have let it fall out of print, or the book is orphaned and no one wants to go through the trouble of tracking down the copyright owners” and found that copies in Google Books “will likely be locked down and thus unreadable, or visible only as a snippet, at best” (303). Has this situation changed since the court rulings concerning the legality of snippet view? Longitudinal studies of the growth of Google Books similar to Harzing (2014) could illuminate this and other questions about Google Books’ ability to deliver content. Uneven coverage of content types, geography, and language should be investigated. Mays noted a possible geographical imbalance within the United States (Mays 2015, 26). Others noted significant language and international imbalances, and large disciplinary differences (Weiss 2016; Abrizah and Thelwall 2014; Kousha and Thelwall 2015).
Weiss and others suggest that Google Books’ coverage imbalance has enormous social implications: “Google and other [massive digital libraries] have essentially canonized the books they have scanned and contribute to the marginalization of those left unscanned” (301). Therefore more holistic quantitative investigations of the types of information in Google Books and possible skewness would be welcome. Finally, Chen’s study (2012) comparing the coverage of Google Books and WorldCat could be repeated to provide longitudinal information. The utility of Google Books for research purposes also needs further investigation. Books are far more prevalently cited in Wikipedia than are research articles (Thelwall and Kousha 2015a). Examining samples of Wikipedia articles’ citation lists for the prevalence of Google Books could reveal how dominant a force Google Books has become in that space. On a more philosophical level, investigating the ways Google Books might transform scholarly processes would be useful. Szpiech (2014) considered how the Google Books version of a medieval manuscript transformed his relationship with texts, causing a rupture “produced by my new power to extract words and information from a text without being subject to its order, scale, or authority” (78). He hypothesized that readers approach Google Books texts as consumers, rather than learners, whereby “the critical sense of the gestalt” is at risk of being forgotten (84). Have other researchers experienced what he describes?
Microsoft Academic

Given the stated openness of Microsoft’s new academic web search engine,21 the closed nature of Google Scholar, and the promising findings of bibliometricians (Harzing 2016b; Harzing and Alakangas 2016a), librarians and information scientists should embark on a thorough review of Microsoft Academic with enthusiasm similar to that with which they approached Google Scholar. The search engine’s coverage, utility for research, and suitability for bibliometric analysis22 all need to be examined. Microsoft Academic’s abilities for supporting scholarly social networking would also be of interest, perhaps using Ward et al. (2015) as a theoretical groundwork. The tool’s coverage and utility for various disciplines and research purposes is a wide-open field for highly useful research.

Professional and Instructional Approaches Based on User Research

To inform instructional approaches, more study of user behavior is needed, perhaps repeating Herrera’s (2011) study with Google Scholar and Microsoft Academic. In light of the recent focus on graduate students, research concerning the use of academic web search engines by undergraduates, community college students, high school students, and other groups would be welcome. Using an interview or focus group generates exploratory findings that could be tested through surveys with a larger, more representative sample of the population of interest. Studying searching behaviors has been common; can librarians design creative studies to investigate reading, engagement, and reflection when web search engines are used as part of the process? Is there a way to study whether the “Matthew Effect” (Antell et al. 2013, 281), the aging citation phenomenon (Verstak et al. 2014; Martín-Martín et al. 2016a; Davis and Cochran 2015), or other epistemological hypotheses are influencing scholarship patterns? A bold study could be performed to examine differences in quality outcomes between samples of students using primarily academic search engines versus traditional library search tools. Exploratory studies in this area could begin by surveying students about their use of search tools for research methods courses or asking them to record their research process in a journal, and correlating the findings with their grades on the final research product. Three specific areas of needed user research are the use of scholarly social network platforms, researcher profiles, and the influence of these on scholarly collaboration and research (Ward, Bejarano, and Dudás 2015, 178); the performance of Google’s relatively new known-item search23 (compared with Microsoft Academic’s known-item search abilities); and searching in non-English languages. Regarding the latter, Albarillo’s (2016) method, which he applied to library databases, could be repeated with Google Scholar, Microsoft Academic, and Google Books. Finally, to continue their strong track record as experts in navigating the landscape of digital scholarship, librarians need to research assumptions regarding best practices for scholarly logistics. For example, searching Google for article titles plus the term “doi,” then scanning the results list for ResearchGate, was found by this study’s author to most efficiently provide doi numbers; but is this a reliable approach? Does ResearchGate have sufficient accuracy to be recommended as the optimal tool for this task?

21 Microsoft’s FAQ says the company is “adopting an open approach in developing the service, and we invite community participation. We like to think what we have developed is a community property. As such, we are opening up our academic knowledge as a downloadable dataset” and offers the Academic Knowledge API (https://www.microsoft.com/cognitive-services/en-us/academic-knowledge-api).

22 See Jacsó (2011) for methodology.
What is the most efficient way for a scholar to locate full text for a citation? Are academic search engines’ bibliographic citation management export tools competitive with third-party commercial tools such as RefWorks? Another area needing investigation is the visibility of links to free full text in Google Scholar. Pitol and De Groote found that 70 percent of the items in their study had at least one free full-text version available through a “hidden” Google Scholar version (2014, 603), and this author’s work on this review article indicates this problem still exists; but to what extent? Also, when free full text exists in multiple repositories (e.g., ResearchGate, Digital Commons, Academia.edu), which are the most trustworthy and practically useful for scholars? Librarians should discuss the answers to these questions and be ready to provide expert advice to users.

CONCLUSION

With so many users opting to use academic web search engines for research, librarians need to investigate the performance of Microsoft Academic, Google Books, and Google Scholar for the arts and humanities, and to rethink library services and collections in light of these tools’ strengths and limitations. The evolution of web indexing and increasing free access to full text should be monitored in conjunction with library collection development.

23 Google Scholar’s blog notes that in January 2016, a change was made so “Scholar now automatically identifies queries that are likely to be looking for a specific paper.” Technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” (https://scholar.googleblog.com/).

To remain relevant to
modern researchers, librarians should continue to strengthen their knowledge of and expertise with public academic web search engines, full-text repositories, and scholarly networks.

AN EVIDENCE-BASED REVIEW OF ACADEMIC WEB SEARCH ENGINES, 2014-2016 | FAGAN | https://doi.org/10.6017/ital.v36i2.9718

BIBLIOGRAPHY

Abrizah, A., and Mike Thelwall. 2014. "Can the Impact of Non-Western Academic Books be Measured? An Investigation of Google Books and Google Scholar for Malaysia." Journal of the Association for Information Science & Technology 65 (12): 2498-2508. https://doi.org/10.1002/asi.23145.

Albarillo, Frans. 2016. "Evaluating Language Functionality in Library Databases." International Information & Library Review 48 (1): 1-10. https://doi.org/10.1080/10572317.2016.1146036.

Antell, Karen, Molly Strothmann, Xiaotian Chen, and Kevin O’Kelly. 2013. "Cross-Examining Google Scholar." Reference & User Services Quarterly 52 (4): 279-282. https://doi.org/10.5860/rusq.52n4.279.

Asher, Andrew D., Lynda M. Duke, and Suzanne Wilson. 2012. "Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources." College & Research Libraries 74 (5): 464-488. https://doi.org/10.5860/crl-374.

Askey, Dale, and Kenning Arlitsch. 2015. "Heeding the Signals: Applying Web Best Practices When Google Recommends." Journal of Library Administration 55 (1): 49-59. https://doi.org/10.1080/01930826.2014.978685.

Authors Guild. "Authors Guild v. Google." Accessed January 1, 2016. https://www.authorsguild.org/where-we-stand/authors-guild-v-google/.

Bartol, Tomaž, and Maria Mackiewicz-Talarczyk. 2015. "Bibliometric Analysis of Publishing Trends in Fiber Crops in Google Scholar, Scopus, and Web of Science." Journal of Natural Fibers 12 (6): 531. https://doi.org/10.1080/15440478.2014.972000.

Boeker, Martin, Werner Vach, and Edith Motschall. 2013.
"Google Scholar as Replacement for Systematic Literature Searches: Good Relative Recall and Precision Are Not Enough." BMC Medical Research Methodology 13 (1): 1.

Bonato, Sarah. 2016. "Google Scholar and Scopus for Finding Gray Literature Publications." Journal of the Medical Library Association 104 (3): 252-254. https://doi.org/10.3163/1536-5050.104.3.021.

Bornmann, Lutz, Andreas Thor, Werner Marx, and Hermann Schier. 2016. "The Application of Bibliometrics to Research Evaluation in the Humanities and Social Sciences: An Exploratory Study using Normalized Google Scholar Data for the Publications of a Research Institute." Journal of the Association for Information Science & Technology 67 (11): 2778-2789. https://doi.org/10.1002/asi.23627.

Boumenot, Diane. "Printing a Book from Google Books." One Rhode Island Family. Last modified December 3, 2015, accessed January 1, 2017. https://onerhodeislandfamily.com/2015/12/03/printing-a-book-from-google-books/.

Bøyum, Idunn, and Svanhild Aabø. 2015. "The Information Practices of Business PhD Students." New Library World 116 (3): 187-200. https://doi.org/10.1108/NLW-06-2014-0073.

Bramer, Wichor M., Dean Giustini, and Bianca M. R. Kramer. 2016. "Comparing the Coverage, Recall, and Precision of Searches for 120 Systematic Reviews in Embase, MEDLINE, and Google Scholar: A Prospective Study." Systematic Reviews 5 (39): 1-7. https://doi.org/10.1186/s13643-016-0215-7.

Cals, J. W., and D. Kotz. 2016. "Literature Review in Biomedical Research: Useful Search Engines Beyond PubMed." Journal of Clinical Epidemiology 71: 115-117. https://doi.org/10.1016/j.jclinepi.2015.10.012.

Carlson, Scott. 2006. "Challenging Google, Microsoft Unveils a Search Tool for Scholarly Articles." Chronicle of Higher Education 52 (33).

Cavacini, Antonio. 2015. "What is the Best Database for Computer Science Journal Articles?" Scientometrics 102 (3): 2059-2071. https://doi.org/10.1007/s11192-014-1506-1.
Chen, Xiaotian. 2012. "Google Books and WorldCat: A Comparison of their Content." Online Information Review 36 (4): 507-516. https://doi.org/10.1108/14684521211254031.

———. 2014. "Open Access in 2013: Reaching the 50% Milestone." Serials Review 40 (1): 21-27. https://doi.org/10.1080/00987913.2014.895556.

Choong, Miew Keen, Filippo Galgani, Adam G. Dunn, and Guy Tsafnat. 2014. "Automatic Evidence Retrieval for Systematic Reviews." Journal of Medical Internet Research 16 (10): 1-1. https://doi.org/10.2196/jmir.3369.

Ciccone, Karen, and John Vickery. 2015. "Summon, EBSCO Discovery Service, and Google Scholar: A Comparison of Search Performance using User Queries." Evidence Based Library & Information Practice 10 (1): 34-49. https://ejournals.library.ualberta.ca/index.php/EBLIP/article/view/23845.

Conrad, Lettie Y., Elisabeth Leonard, and Mary M. Somerville. 2015. "New Pathways in Scholarly Discovery: Understanding the Next Generation of Researcher Tools." Paper presented at the Association of College and Research Libraries annual conference, March 25-27, Portland, OR. https://pdfs.semanticscholar.org/3cb1/315476ccf9b443c01eb9b1d175ae3b0a5b4e.pdf.

Dagienė, Eleonora, and Danutė Krapavickaitė. 2016. "How Researchers Manage their Academic Activities." Learned Publishing 29 (3): 155-163. https://doi.org/10.1002/leap.1030.

Davis, Philip M., and Angela Cochran. 2015. "Cited Half-Life of the Journal Literature." arXiv Preprint arXiv:1504.07479. https://arxiv.org/abs/1504.07479.

Delgado López-Cózar, Emilio, Nicolás Robinson-García, and Daniel Torres-Salinas. 2014. "The Google Scholar Experiment: How to Index False Papers and Manipulate Bibliometric Indicators." Journal of the Association for Information Science & Technology 65 (3): 446-454. https://doi.org/10.1002/asi.23056.

Erb, Brian, and Rob Sica. 2015.
"Flagship Database for Literature Searching or Helpful Auxiliary?" Charleston Advisor 17 (2): 47-50. https://doi.org/10.5260/chara.17.2.47.

Fagan, Jody Condit, and David Gaines. 2016. "Take Charge of EDS: Vet Your Content." Presentation to the EBSCO Users' Group, Boston, MA, May 10-11.

Gehanno, Jean-François, Laetitia Rollin, and Stefan Darmoni. 2013. "Is the Coverage of Google Scholar Enough to be Used Alone for Systematic Reviews." BMC Medical Informatics and Decision Making 13 (1): 1. https://doi.org/10.1186/1472-6947-13-7.

Georgas, Helen. 2015. "Google vs. the Library (Part III): Assessing the Quality of Sources found by Undergraduates." portal: Libraries and the Academy 15 (1): 133-161. https://doi.org/10.1353/pla.2015.0012.

Giustini, Dean, and Maged N. Kamel Boulos. 2013. "Google Scholar is Not Enough to be Used Alone for Systematic Reviews." Online Journal of Public Health Informatics 5 (2). https://doi.org/10.5210/ojphi.v5i2.4623.

Gray, Jerry E., Michelle C. Hamilton, Alexandra Hauser, Margaret M. Janz, Justin P. Peters, and Fiona Taggart. 2012. "Scholarish: Google Scholar and its Value to the Sciences." Issues in Science and Technology Librarianship 70 (Summer). https://doi.org/10.1002/asi.21372/full.

Haddaway, Neal R. 2015. "The Use of Web-Scraping Software in Searching for Grey Literature." Grey Journal 11 (3): 186-190.

Haddaway, Neal Robert, Alexandra Mary Collins, Deborah Coughlin, and Stuart Kirk. 2015. "The Role of Google Scholar in Evidence Reviews and its Applicability to Grey Literature Searching." PloS One 10 (9): e0138237. https://doi.org/10.1371/journal.pone.0138237.

Hammarfelt, Björn. 2014. "Using Altmetrics for Assessing Research Impact in the Humanities." Scientometrics 101 (2): 1419-1430. https://doi.org/10.1007/s11192-014-1261-3.

Hands, Africa. 2012. "Microsoft Academic Search – http://academic.research.microsoft.com." Technical Services Quarterly 29 (3): 251-252. https://doi.org/10.1080/07317131.2012.682026.
Harper, Sarah Fletcher. 2016. "Google Books Review." Journal of Electronic Resources in Medical Libraries 13 (1): 2-7. https://doi.org/10.1080/15424065.2016.1142835.

Harzing, Anne-Wil. 2013. "A Preliminary Test of Google Scholar as a Source for Citation Data: A Longitudinal Study of Nobel Prize Winners." Scientometrics 94 (3): 1057-1075. https://doi.org/10.1007/s11192-012-0777-7.

———. 2014. "A Longitudinal Study of Google Scholar Coverage between 2012 and 2013." Scientometrics 98 (1): 565-575. https://doi.org/10.1007/s11192-013-0975-y.

———. 2016a. Publish or Perish. Vol. 5. http://www.harzing.com/resources/publish-or-perish.

———. 2016b. "Microsoft Academic (Search): A Phoenix Arisen from the Ashes?" Scientometrics 108 (3): 1637-1647. https://doi.org/10.1007/s11192-016-2026-y.

Harzing, Anne-Wil, and Satu Alakangas. 2016a. "Microsoft Academic: Is the Phoenix Getting Wings?" Scientometrics: 1-13.

Harzing, Anne-Wil, and Satu Alakangas. 2016b. "Google Scholar, Scopus and the Web of Science: A Longitudinal and Cross-Disciplinary Comparison." Scientometrics 106 (2): 787-804. https://doi.org/10.1007/s11192-015-1798-9.

Herrera, Gail. 2011. "Google Scholar Users and User Behaviors: An Exploratory Study." College & Research Libraries 72 (4): 316-331. https://doi.org/10.5860/crl-125rl.

Higgins, Julian, and S. Green, eds. 2011. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 ed. The Cochrane Collaboration. http://handbook.cochrane.org/.

Hilbert, Fee, Julia Barth, Julia Gremm, Daniel Gros, Jessica Haiter, Maria Henkel, Wilhelm Reinhardt, and Wolfgang G. Stock. 2015. "Coverage of Academic Citation Databases Compared with Coverage of Scientific Social Media." Online Information Review 39 (2): 255-264. https://doi.org/10.1108/OIR-07-2014-0159.

Hoffmann, Anna Lauren. 2014. "Google Books as Infrastructure of in/Justice: Towards a Sociotechnical Account of Rawlsian Justice, Information, and Technology."
Theses and Dissertations. Paper 530. http://dc.uwm.edu/etd/530/.

———. 2016. "Google Books, Libraries, and Self-Respect: Information Justice Beyond Distributions." The Library 86 (1). https://doi.org/10.1086/684141.

Horrigan, John B. "Lifelong Learning and Technology." Pew Research Center, last modified March 22, 2016, accessed February 7, 2017. http://www.pewinternet.org/2016/03/22/lifelong-learning-and-technology/.

Hug, Sven E., Michael Ochsner, and Martin P. Braendle. 2016. "Citation Analysis with Microsoft Academic." arXiv Preprint arXiv:1609.05354. https://arxiv.org/abs/1609.05354.

Huistra, Hieke, and Bram Mellink. 2016. "Phrasing History: Selecting Sources in Digital Repositories." Historical Methods: A Journal of Quantitative and Interdisciplinary History 49 (4): 220-229. https://doi.org/10.1093/llc/fqw002.

Inger, Simon, and Tracy Gardner. 2016. "How Readers Discover Content in Scholarly Publications." Information Services & Use 36 (1): 81-97. https://doi.org/10.3233/ISU-160800.

Jackson, Joab. 2010. "Google: 129 Million Different Books have been Published." PC World, August 6, 2010. http://www.pcworld.com/article/202803/google_129_million_different_books_have_been_published.html.

Jacsó, P. 2008. "Live Search Academic." Peter’s Digital Reference Shelf, April.

Jacsó, Péter. 2011. "The Pros and Cons of Microsoft Academic Search from a Bibliometric Perspective." Online Information Review 35 (6): 983-997. https://doi.org/10.1108/14684521111210788.

Jamali, Hamid R., and Majid Nabavi. 2015. "Open Access and Sources of Full-Text Articles in Google Scholar in Different Subject Fields." Scientometrics 105 (3): 1635-1651. https://doi.org/10.1007/s11192-015-1642-2.

Johnson, Paula C., and Jennifer E. Simonsen. 2015. "Do Engineering Master's Students Know What They Don't Know?" Library Review 64 (1): 36-57. https://doi.org/10.1108/LR-05-2014-0052.
Jones, Edgar. 2010. "Google Books as a General Research Collection." Library Resources & Technical Services 54 (2): 77-89. https://doi.org/10.5860/lrts.54n2.77.

Karlsson, Niklas. 2014. "The Crossroads of Academic Electronic Availability: How Well does Google Scholar Measure Up Against a University-Based Metadata System in 2014?" Current Science 107 (10): 1661-1665. http://www.currentscience.ac.in/Volumes/107/10/1661.pdf.

Kemman, Max, Martijn Kleppe, and Stef Scagliola. 2013. "Just Google It: Digital Research Practices of Humanities Scholars." arXiv Preprint arXiv:1309.2434. https://arxiv.org/abs/1309.2434.

Khabsa, Madian, and C. Lee Giles. 2014. "The Number of Scholarly Documents on the Public Web." PloS One 9 (5). https://doi.org/10.1371/journal.pone.0093949.

Kirkwood Jr., Hal, and Monica C. Kirkwood. 2011. "Historical Research." Online 35 (4): 28-32.

Koler-Povh, Teja, Primož Južnic, and Goran Turk. 2014. "Impact of Open Access on Citation of Scholarly Publications in the Field of Civil Engineering." Scientometrics 98 (2): 1033-1045. https://doi.org/10.1007/s11192-013-1101-x.

Kousha, Kayvan, Mike Thelwall, and Somayeh Rezaie. 2011. "Assessing the Citation Impact of Books: The Role of Google Books, Google Scholar, and Scopus." Journal of the American Society for Information Science and Technology 62 (11): 2147-2164. https://doi.org/10.1002/asi.21608.

Kousha, Kayvan, and Mike Thelwall. 2017. "Are Wikipedia Citations Important Evidence of the Impact of Scholarly Articles and Books?" Journal of the Association for Information Science and Technology 68 (3): 762-779. https://doi.org/10.1002/asi.23694.

Kousha, Kayvan, and Mike Thelwall. 2015. "An Automatic Method for Extracting Citations from Google Books." Journal of the Association for Information Science & Technology 66 (2): 309-320. https://doi.org/10.1002/asi.23170.

Lee, Jongwook, Gary Burnett, Micah Vandegrift, Hoon Baeg Jung, and Richard Morris. 2015.
"Availability and Accessibility in an Open Access Institutional Repository: A Case Study." Information Research 20 (1): 334-349.

Levay, Paul, Nicola Ainsworth, Rachel Kettle, and Antony Morgan. 2016. "Identifying Evidence for Public Health Guidance: A Comparison of Citation Searching with Web of Science and Google Scholar." Research Synthesis Methods 7 (1): 34-45. https://doi.org/10.1002/jrsm.1158.

Levy, Steven. "Making the World’s Problem Solvers 10% More Efficient." Backchannel. Last modified October 17, 2014, accessed January 14, 2016. https://medium.com/backchannel/the-gentleman-who-made-scholar-d71289d9a82d.

Los Angeles Times. 2016. "Google, Books and 'Fair Use'." Los Angeles Times, April 19, 2016. http://www.latimes.com/opinion/editorials/la-ed-google-book-search-20160419-story.html.

Martin, Kim, and Anabel Quan-Haase. 2016. "The Role of Agency in Historians’ Experiences of Serendipity in Physical and Digital Information Environments." Journal of Documentation 72 (6): 1008-1026. https://doi.org/10.1108/JD-11-2015-0144.

Martín-Martín, Alberto, Juan Manuel Ayllón, Enrique Orduña-Malea, and Emilio Delgado López-Cózar. 2016a. "2016 Google Scholar Metrics Released: A Matter of Languages... and Something Else." arXiv Preprint arXiv:1607.06260. https://arxiv.org/abs/1607.06260.

Martín-Martín, Alberto, Enrique Orduña-Malea, Juan M. Ayllón, and Emilio Delgado López-Cózar. 2016b. "The Counting House: Measuring those Who Count. Presence of Bibliometrics, Scientometrics, Informetrics, Webometrics and Altmetrics in the Google Scholar Citations, ResearcherID, ResearchGate, Mendeley & Twitter." arXiv Preprint arXiv:1602.02412. https://arxiv.org/abs/1602.02412.

Martín-Martín, Alberto, Enrique Orduña-Malea, Juan Manuel Ayllón, and Emilio Delgado López-Cózar. 2014. "Does Google Scholar Contain All Highly Cited Documents (1950-2013)?" arXiv Preprint arXiv:1410.8464. https://arxiv.org/abs/1410.8464.
Martín-Martín, Alberto, Enrique Orduña-Malea, Juan Ayllón, and Emilio Delgado López-Cózar. 2016c. "Back to the Past: On the Shoulders of an Academic Search Engine Giant." Scientometrics 107 (3): 1477-1487. https://doi.org/10.1007/s11192-016-1917-2.

Martín-Martín, Alberto, Enrique Orduña-Malea, Anne-Wil Harzing, and Emilio Delgado López-Cózar. 2017. "Can we Use Google Scholar to Identify Highly-Cited Documents?" Journal of Informetrics 11 (1): 152-163. https://doi.org/10.1016/j.joi.2016.11.008.

Mays, Dorothy A. 2015. "Google Books: Far More Than Just Books." Public Libraries 54 (5): 23-26. http://publiclibrariesonline.org/2015/10/far-more-than-just-books/.

Meier, John J., and Thomas W. Conkling. 2008. "Google Scholar’s Coverage of the Engineering Literature: An Empirical Study." The Journal of Academic Librarianship 34 (3): 196-201. https://doi.org/10.1016/j.acalib.2008.03.002.

Moed, Henk F., Judit Bar-Ilan, and Gali Halevi. 2016. "A New Methodology for Comparing Google Scholar and Scopus." arXiv Preprint arXiv:1512.05741. https://arxiv.org/abs/1512.05741.

Namei, Elizabeth, and Christal A. Young. 2015. "Measuring our Relevancy: Comparing Results in a Web-Scale Discovery Tool, Google & Google Scholar." Paper presented at the Association of College and Research Libraries annual conference, March 25-27, Portland, OR. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2015/Namei_Young.pdf.

National Institute for Health and Care Excellence (NICE). "Developing NICE Guidelines: The Manual." Last modified April 2016, accessed November 27, 2016. https://www.nice.org.uk/process/pmg20.

Neuhaus, Chris, Ellen Neuhaus, Alan Asher, and Clint Wrede. 2006. "The Depth and Breadth of Google Scholar: An Empirical Study." portal: Libraries and the Academy 6 (2): 127-141. https://doi.org/10.1353/pla.2006.0026.
Obrien, Patrick, Kenning Arlitsch, Leila Sterman, Jeff Mixter, Jonathan Wheeler, and Susan Borda. 2016. "Undercounting File Downloads from Institutional Repositories." Journal of Library Administration 56 (7): 854-874. https://doi.org/10.1080/01930826.2016.1216224.

Orduña-Malea, Enrique, and Emilio Delgado López-Cózar. 2014. "Google Scholar Metrics Evolution: An Analysis According to Languages." Scientometrics 98 (3): 2353-2367. https://doi.org/10.1007/s11192-013-1164-8.

Orduña-Malea, Enrique, and Emilio Delgado López-Cózar. 2015. "The Dark Side of Open Access in Google and Google Scholar: The Case of Latin-American Repositories." Scientometrics 102 (1): 829-846. https://doi.org/10.1007/s11192-014-1369-5.

Orduña-Malea, Enrique, Alberto Martín-Martín, Juan M. Ayllon, and Emilio Delgado López-Cózar. 2014. "The Silent Fading of an Academic Search Engine: The Case of Microsoft Academic Search." Online Information Review 38 (7): 936-953. https://doi.org/10.1108/OIR-07-2014-0169.

Ortega, José Luis. 2015. "Relationship between Altmetric and Bibliometric Indicators Across Academic Social Sites: The Case of CSIC's Members." Journal of Informetrics 9 (1): 39-49. https://doi.org/10.1016/j.joi.2014.11.004.

Ortega, José Luis, and Isidro F. Aguillo. 2014. "Microsoft Academic Search and Google Scholar Citations: Comparative Analysis of Author Profiles." Journal of the Association for Information Science & Technology 65 (6): 1149-1156. https://doi.org/10.1002/asi.23036.

Pitol, Scott P., and Sandra L. De Groote. 2014. "Google Scholar Versions: Do More Versions of an Article Mean Greater Impact?" Library Hi Tech 32 (4): 594-611. https://doi.org/10.1108/LHT-05-2014-0039.

Prins, Ad A. M., Rodrigo Costas, Thed N. van Leeuwen, and Paul F. Wouters. 2016. "Using Google Scholar in Research Evaluation of Humanities and Social Science Programs: A Comparison with Web of Science Data." Research Evaluation 25 (3): 264-270.
https://doi.org/10.1093/reseval/rvv049.

Quint, Barbara. 2016. "Find and Fetch: Completing the Course." Information Today 33 (3): 17.

Rothfus, Melissa, Ingrid S. Sketris, Robyn Traynor, Melissa Helwig, and Samuel A. Stewart. 2016. "Measuring Knowledge Translation Uptake using Citation Metrics: A Case Study of a Pan-Canadian Network of Pharmacoepidemiology Researchers." Science & Technology Libraries 35 (3): 228-240. https://doi.org/10.1080/0194262X.2016.1192008.

Ruppel, Margie. 2009. "Google Scholar, Social Work Abstracts (EBSCO), and PsycINFO (EBSCO)." Charleston Advisor 10 (3): 5-11.

Shultz, M. 2007. "Comparing Test Searches in PubMed and Google Scholar." Journal of the Medical Library Association: JMLA 95 (4): 442-445. https://doi.org/10.3163/1536-5050.95.4.442.

Stansfield, Claire, Kelly Dickson, and Mukdarut Bangpan. 2016. "Exploring Issues in the Conduct of Website Searching and Other Online Sources for Systematic Reviews: How Can We be Systematic?" Systematic Reviews 5 (1): 191. https://doi.org/10.1186/s13643-016-0371-9.

Ştirbu, Simona, Paul Thirion, Serge Schmitz, Gentiane Haesbroeck, and Ninfa Greco. 2015. "The Utility of Google Scholar when Searching Geographical Literature: Comparison with Three Commercial Bibliographic Databases." The Journal of Academic Librarianship 41 (3): 322-329. https://doi.org/10.1016/j.acalib.2015.02.013.

Suiter, Amy M., and Heather Lea Moulaison. 2015. "Supporting Scholars: An Analysis of Academic Library Websites' Documentation on Metrics and Impact." The Journal of Academic Librarianship 41 (6): 814-820. https://doi.org/10.1016/j.acalib.2015.09.004.

Szpiech, Ryan. 2014. "Cracking the Code: Reflections on Manuscripts in the Age of Digital Books." Digital Philology: A Journal of Medieval Cultures 3 (1): 75-100. https://doi.org/10.1353/dph.2014.0010.

Testa, Matthew. 2016.
"Availability and Discoverability of Open-Access Journals in Music." Music Reference Services Quarterly 19 (1): 1-17. https://doi.org/10.1080/10588167.2016.1130386.

Thelwall, Mike, and Kayvan Kousha. 2015b. "Web Indicators for Research Evaluation. Part 1: Citations and Links to Academic Articles from the Web." El Profesional De La Información 24 (5): 587-606. https://doi.org/10.3145/epi.2015.sep.08.

Thielen, Frederick W., Ghislaine van Mastrigt, L. T. Burgers, Wichor M. Bramer, Marian H. J. M. Majoie, Sylvia M. A. A. Evers, and Jos Kleijnen. 2016. "How to Prepare a Systematic Review of Economic Evaluations for Clinical Practice Guidelines: Database Selection and Search Strategy Development (Part 2/3)." Expert Review of Pharmacoeconomics & Outcomes Research: 1-17. https://doi.org/10.1080/14737167.2016.1246962.

Trapp, Jamie. 2016. "Web of Science, Scopus, and Google Scholar Citation Rates: A Case Study of Medical Physics and Biomedical Engineering: What Gets Cited and What Doesn't?" Australasian Physical & Engineering Sciences in Medicine 39 (4): 817-823. https://doi.org/10.1007/s13246-016-0478-2.

Van Noorden, R. 2014. "Online Collaboration: Scientists and the Social Network." Nature 512 (7513): 126-129. https://doi.org/10.1038/512126a.

Varshney, Lav R. 2012. "The Google Effect in Doctoral Theses." Scientometrics 92 (3): 785-793. https://doi.org/10.1007/s11192-012-0654-4.

Verstak, Alex, Anurag Acharya, Helder Suzuki, Sean Henderson, Mikhail Iakhiaev, Cliff Chiung Yu Lin, and Namit Shetty. 2014. "On the Shoulders of Giants: The Growing Impact of Older Articles." arXiv Preprint arXiv:1411.0275. https://arxiv.org/abs/1411.0275.

Walsh, Andrew. 2015. "Beyond "Good" and "Bad": Google as a Crucial Component of Information Literacy." In The Complete Guide to Using Google in Libraries, edited by Carol Smallwood, 3-12. New York: Rowman & Littlefield.

Waltman, Ludo. 2016. "A Review of the Literature on Citation Impact Indicators." Journal of Informetrics 10 (2): 365-391.
https://doi.org/10.1016/j.joi.2016.02.007.

Ward, Judit, William Bejarano, and Anikó Dudás. 2015. "Scholarly Social Media Profiles and Libraries: A Review." Liber Quarterly 24 (4): 174–204. https://doi.org/10.18352/lq.9958.

Weideman, Melius. 2015. "ETD Visibility: A Study on the Exposure of Indian ETDs to the Google Scholar Crawler." Paper presented at ETD 2015: 18th International Symposium on Electronic Theses and Dissertations, New Delhi, India, November 4-6. http://www.web-visibility.co.za/0168-conference-paper-2015-weideman-etd-theses-dissertation-india-google-scholar-crawler.pdf.

Weiss, Andrew. 2016. "Examining Massive Digital Libraries (MDLs) and their Impact on Reference Services." Reference Librarian 57 (4): 286-306. https://doi.org/10.1080/02763877.2016.1145614.

Whitmer, Susan. 2015. "Google Books: Shamed by Snobs, a Resource for the Rest of Us." In The Complete Guide to Using Google in Libraries, edited by Carol Smallwood, 241-250. New York: Rowman & Littlefield.

Wildgaard, Lorna. 2015. "A Comparison of 17 Author-Level Bibliometric Indicators for Researchers in Astronomy, Environmental Science, Philosophy and Public Health in Web of Science and Google Scholar." Scientometrics 104 (3): 873-906. https://doi.org/10.1007/s11192-015-1608-4.

Winter, Joost, Amir Zadpoor, and Dimitra Dodou. 2014. "The Expansion of Google Scholar Versus Web of Science: A Longitudinal Study." Scientometrics 98 (2): 1547-1565. https://doi.org/10.1007/s11192-013-1089-2.

Wu, Tim. 2015. "Whatever Happened to Google Books?" The New Yorker, September 11, 2015.

Wu, Ming-der, and Shih-chuan Chen. 2014. "Graduate Students Appreciate Google Scholar, but Still Find use for Libraries." Electronic Library 32 (3): 375-389. https://doi.org/10.1108/EL-08-2012-0102.

Yang, Le. 2016. "Making Search Engines Notice: An Exploratory Study on Discoverability of DSpace Metadata and PDF Files." Journal of Web Librarianship 10 (3): 147-160.
https://doi.org/10.1080/19322909.2016.1172539.
TV White Spaces in Public Libraries: A Primer

Kristen Radsliff Rebmann, Emmanuel Edward Te, and Donald Means

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017 36

ABSTRACT

TV White Space (TVWS) represents one new wireless communication technology that has the potential to improve internet access and inclusion. This primer describes TVWS technology as a viable, long-term access solution for the benefit of public libraries and their communities, especially for underserved populations. Discussion focuses first on providing a brief overview of the digital divide and the emerging role of public libraries as internet access providers. Next, a basic description of TVWS and its features is provided, focusing on key aspects of the technology relevant to libraries as community anchor institutions. Several TVWS implementations are described, with discussion of TVWS implementations in several public libraries. Finally, consideration is given to first steps that library organizations must take when contemplating new TVWS implementations supportive of Wi-Fi applications and crisis response planning.

INTRODUCTION

Tens of millions of people rely wholly or in part on libraries to provide access to the internet. Many lack access to the Federal Communications Commission (FCC) recommended standard of 25 Mbps (megabits per second) download speed and 3 Mbps upload speed.1 Though the FCC reclassified high-speed internet in 2015 as a public utility under Title II of the Telecommunications Act to ensure that broadband networks are “fast, fair, and open,”2 the “digital divide” still remains. One in four community members does not have access to the internet at home.
Accounting for age and education level, households with the lowest median incomes have service adoption rates of around 50%, compared with rates of 80 to 90% for higher-income households.3 A recent Pew Research Center survey on home broadband adoption found that 43% of those surveyed reported cost as their main reason for non-adoption.4 Individuals with low-quality or no access are more likely to be digitally disadvantaged, tend to use library computers more frequently, and are less equipped to interact and compete economically as more services and application processes move online.5

Kristen Radsliff Rebmann (Kristen.rebmann@sjsu.edu) is Associate Professor, San Jose State University School of Information, San Jose, CA. Emmanuel Edward Te (emmanueledward.te@sjsu.edu) is a graduate student, San Jose State University School of Information, San Jose, CA. Donald Means (don@digitalvillage.com) is co-founder and principal of Digital Village Associates, Sausalito, CA.

TV WHITE SPACES IN PUBLIC LIBRARIES: A PRIMER | REBMANN, TE, AND MEANS | https://doi.org/10.6017/ital.v36i1.9720 37

This article highlights TV White Space (TVWS), a new wireless communication technology with the potential to assist libraries in addressing digital access and inclusion issues. This primer first provides a brief overview of the digital divide and the emerging role of public libraries as internet access providers, highlighting the need for cost-efficient technological solutions. We then provide a basic description of TVWS and its features, focusing on key aspects of the technology relevant to libraries as community anchor institutions. Several TVWS implementations are described, including how TVWS was set up in several public libraries. Finally, we consider first steps library organizations must take when contemplating new implementations, including everyday applications and crisis response planning.
Digital Access and Inclusion

The term “digital divide” describes the gap between people who can easily access and use technology and the internet, and those who cannot.6 As Kinney observes, “there has not been one single digital divide, but rather a series of divides that attend each new technology.”7 Digital divides are exacerbated by various factors, including socioeconomic status, education, geography, age, ability, language, and especially service availability and quality.8 In recent years, the language describing this issue has changed, but the inequalities remain and widen along different dimensions with each emerging technology. The most recent public policy term, “digital inclusion,” promotes digital literacy efforts for unserved and underserved populations.9 The progression from “digital divide” to “digital inclusion” represents a shift in focus from issues of access exclusively toward contexts and quality of participation and usage. Along these lines, the language of digital inclusion reframes the issue by making visible that a focus on internet access alone can obscure the divides of quality and effectiveness that remain.10 In response to the digital divide, public libraries have become the “unofficial” providers of internet access, stemming from libraries’ access to broadband infrastructure, maintenance of publicly available computers, and services providing assistance and training.11 A Pew Research Center survey on perceptions of libraries found that most respondents view public libraries as important parts of their communities, providing resources and assisting in decisions about what information to trust.12 However, many public libraries face an “infrastructure plateau” of internet access: too few computer workstations and broadband connections too slow to support a growing number of users,13 on top of insufficient funding, physical space, and staffing.14 Previous surveys show that
although public libraries are connected to the internet and provide public access workstations and wireless access, nearly 50% of public libraries only offer wireless access that shares the same bandwidth as their workstations.15 This increased usage strains existing network connections and infrastructure, resulting in slower connections for everyone on the public library’s network. Many public libraries cannot accommodate more workstations, support the power requirements of both workstations and patrons’ laptops, or afford the workstation upgrades and bandwidth increases needed to move past insufficient connectivity speeds. Libraries often lack the IT skills, time, and funds to upgrade their infrastructure.16 Typical wireless access via Wi-Fi is limited to distances within library buildings, may extend to exterior spaces, and is available only during operating hours. Despite these challenges, public libraries continually provide access and “at-the-point-of-need” training and support for their patrons, especially those without easy access to the internet and computers.17 Subsidized by federal funding, libraries represent key access providers and technology trainers for the public without internet access.18 The FCC classifies libraries as “community anchor institutions” (CAIs), organizations that “facilitate greater use of broadband by vulnerable populations, including low-income, the unemployed, and the aged.”19 Recent surveys show that users have a positive view of libraries as places providing opportunities to spend time in a safe space, pursue learning, and promote a sense of community.
Librarians offer internet skills training programs more often than other community organizations, though (around 75% of the time) the training occurs informally.20 In particular, 29% of respondents to a library use survey reported going to libraries to use computers, the internet, or the Wi-Fi network; 7% also reported using libraries’ Wi-Fi signals outside when libraries are closed.21 The majority of these users are more likely to be young, black, female, and lower income, utilizing library technology resources for school or work (61%), checking email or sending texts (53%), finding health information (38%), and taking online courses or completing certifications (26%).22 Public libraries are already exploring creative approaches to providing internet access for these underserved communities; the mobile hotspot lending programs in the New York City and Kansas City public library systems are just two examples.23 Yet libraries must do more, supporting innovation and providing leadership by partnering with other community organizations and their stakeholders to enhance resilience in addressing access and inclusion. The emergence of TVWS wireless technology presents an opportunity for libraries to explore expanding the reach of their wireless signals beyond library buildings and to extend 24/7 library Wi-Fi availability to community spaces such as subsidized housing, schools, clinics, parks, senior centers, and museums.
TVWS Basics

TV white space (TVWS) refers to the unoccupied portions of spectrum in the VHF/UHF terrestrial television frequency bands.24 Television broadcast frequency allocations traditionally assumed that TV station transmissions operating at high power needed wide spectrum separation to prevent interference between broadcasting channels, which led to the specific spectrum allocation of these frequency “guard bands.”25 Research discovered that low-power devices can operate within these spaces, which led the Federal Communications Commission (FCC) to field test TVWS applications to wireless communications and (ultimately) promote TVWS neutrality.26 In 2015, the FCC made a portion of these very valuable TVWS bands of spectrum available for open, shared public use, like Wi-Fi.31 Yet, unlike Wi-Fi, whose reach is measured in tens of meters, the range of TVWS is measured in hundreds or even thousands of meters. TVWS has good propagation characteristics, which makes it an extremely valuable license-exempt radio spectrum.27 It is a relatively stable frequency resource that does not change over time, allowing spectrum availability estimates to remain reliable and valid, which in turn promotes its various applications.28 Radio spectrum is considered a “common heritage of humanity,”29 as radio waves “do not respect national borders.”30 TVWS availability and application are contextual and dependent on many key factors.

TV WHITE SPACES IN PUBLIC LIBRARIES: A PRIMER | REBMANN, TE, AND MEANS | https://doi.org/10.6017/ital.v36i1.9720
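The range comparison above can be made concrete with the standard free-space path-loss formula, FSPL(dB) = 20·log10(d_km) + 20·log10(f_MHz) + 32.44, under which range at equal loss scales inversely with frequency. The sketch below uses the free-space model only, and the 50-meter Wi-Fi reference range is an illustrative assumption; the real-world TVWS advantage is larger still, since lower frequencies also penetrate walls and foliage better.

```python
import math

def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB: 20*log10(d_km) + 20*log10(f_MHz) + 32.44."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

def equal_loss_range_km(ref_range_km: float, ref_freq_mhz: float,
                        new_freq_mhz: float) -> float:
    """Distance at which new_freq_mhz suffers the same free-space loss that
    ref_freq_mhz suffers at ref_range_km (range scales as f_ref / f_new)."""
    return ref_range_km * (ref_freq_mhz / new_freq_mhz)

# A 2.4 GHz Wi-Fi link reaching ~50 m has the same free-space loss as a
# 600 MHz TVWS link at ~200 m -- hundreds rather than tens of meters.
tvws_range_km = equal_loss_range_km(0.05, 2400.0, 600.0)
```

In practice, the transmit power and antenna height permitted by regulation matter as much as frequency, which is how fixed TVWS deployments reach the kilometer ranges described below.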
Availability is influenced by frequency (the idle channels purposely planned in TV bands, varying across regions), deployment (the height and location of the TVWS transmit antenna and its installation sites in relation to nearby TV broadcasting reception), space and distance (geographical areas outside the current planned TV coverage, with no present broadcasting signals), and time (off-air availability of licensed broadcasting transmitters during specific periods of time, subject to change by the broadcaster).32 Because TVWS exists as fragmented “safety margins” between broadcast services, TVWS is typically more abundant, and available in larger contiguous blocks, in rural areas with less broadcast coverage than in highly dense urban areas.33 Assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or non-exclusive sharing can alleviate pressure on these resources.34 This “spectrum crunch” caused by the inefficient use of scarce spectrum resources can be alleviated with dynamic spectrum access (DSA) and spectrum sharing. TVWS availability is small where digital television has been deployed, with the potential for aggregate interference (from TVWS users in relation to primary TV service) and self-interference (within the TVWS network), which may lead to a “mismatch situation” in which demand for bandwidth is high but TVWS bandwidth supply is very low.35 As most spectrum frequencies have been organized through some form of exclusive access in which only the licensee can use the specific spectrum, technologies such as cognitive radios can enable new modes of spectrum access, supporting autonomous, self-configuring, self-planning networks that rely on up-to-date TVWS availability databases.
The limited distribution (in many areas) of basic broadband infrastructure and the relatively high cost of access often prevent individuals with lower incomes from participating in the digital revolution of information access and its opportunities.36 Despite these challenges to broadband availability, TVWS excels in areas with low broadband coverage: rural regions possess greater frequency availability due to the lower density of spectrum licensing. In comparison to frequencies operating higher up the spectrum band, TVWS does not require direct line-of-sight between devices for operation and has lower deployment costs; equipment costs are comparable to Wi-Fi equipment currently on the market.37 Importantly, TVWS can address access and inclusion through relatively low start-up costs and no ongoing service fees. As a public resource, it can work with existing services to create new, potentially mobile, connections to the internet that ensure the continuation of vital services in the event of service interruptions.38 In urban areas with fewer channels available, new efficient spectrum-sharing policies will be necessary.
Assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or non-exclusive sharing or “recycling” of bands for more effective spectrum use by multiple parties with changing spectrum needs can alleviate pressure on these resources.39

TVWS for Public Libraries

TVWS is a viable medium for applications ranging from internet access, content distribution within a given location, tracking (people, animals, and assets), and task automation to public safety and security,40 as well as remote patient monitoring and other telemedicine applications.41 TVWS complements existing networks that use other parts of the spectrum for access points, mobile communications, and home media devices.42 Analyses of a recent digital inclusion survey suggest that technology upgrades can have significant impact on the ability of libraries to expand programs and services.43 As community anchor institutions, public libraries can use TVWS systems to expand and improve access to their services, especially for underserved populations. Library-led collaborations to deploy TVWS networks in other CAIs and public spaces have numerous benefits. In conjunction with building-centered Wi-Fi, TVWS can redistribute network users from congested library spaces to other community sites, thereby distributing network usage across the community. From an existing broadband connection, libraries can extend their networks of internet access strategically across their communities. Unlike networks that rely solely on limited-range Wi-Fi, far-reaching TVWS can improve the coverage and inclusion of patrons in accessing library programs, services, and the broader internet.44 The portability of the access points allows libraries to extend their reach by providing wireless connections in the short term, for cultural or civic events like fairs, markets, or concerts, and in the long term, for use at popular public areas.
Recent TVWS pilot installations in Kansas, Colorado, Mississippi, and Delaware have proven to be very stable. The Manhattan Public Library (Kansas) TVWS project began in fall 2013; though there were a few delays in the installation and testing process, the TVWS equipment was successfully implemented and welcomed by the community in early 2014. IT staff report that their remote locations have shown that this library service fills a community need, especially for underserved populations.45 Delta County Libraries (Colorado) are conducting trials with two public hotspots to support “Guest” access and potentially provide library patrons with more bandwidth.46 TVWS implementations in the Pascagoula School District (Mississippi)47 and Delaware Public Libraries48 show successful initial pilot usage in providing wireless internet service directly to community-distributed access points. Though there are contextual differences across these sites, the strength of public libraries as CAIs providing internet access via TVWS systems is evident and promising.

First Steps

Any library can take the initiative in setting up a TVWS network on its own. The first step is to assess the availability of spectrum in the library’s geographic location. Access to TVWS frequencies is free and requires no subscription fees beyond the initial equipment investment. Public databases of TVWS availability are easily accessible and have been tested by the FCC since 2011;49 Google has also posted its own spectrum database.50 From this setup, the library gains access to public TVWS frequencies by which it can broadcast and receive internet connections from paired TVWS-enabled remote hotspots.
Once it is determined that there are available spectrum channels in the desired area, libraries can then explore how their current broadband and wireless connections might be expanded to include several community spaces where internet access is needed. Next, the library works with a TVWS equipment supplier to design and install a TVWS network consisting of a base station integrated with the library’s wired connection to the internet. Finally, the library places TVWS-enabled remote hotspots in (previously identified) community-based spaces where Wi-Fi access is needed by underserved populations. Given a high-quality backhaul (i.e., a high-speed fiber optic connection), TVWS can spread that signal from the library with a signal that propagates through and penetrates multiple barriers and geographical features, up to ten times stronger than current Wi-Fi. Depending on the context (geographical features, TVWS availability, etc.), hotspots can be installed up to six miles (10 km) away and do not require line-of-sight between the base station and hotspots. This ability is superior to current Wi-Fi networks, which cover only patrons in the immediate vicinity of the library. These TVWS remote hotspots can also be easily (and strategically) moved to support occasional community needs (such as neighborhood-wide or city events) or in response to crisis situations.
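As a concrete sketch of the spectrum-check step, the public white-space databases mentioned above have historically been queried via the IETF Protocol to Access White-Space databases (PAWS, RFC 7545), a JSON-RPC exchange in which a device reports its location and identity and receives the channels available there. The sketch below only assembles such a request body; the serial number and FCC ID are placeholder values, and a real deployment would rely on its certified equipment vendor's database service rather than hand-built requests.

```python
import json

def build_avail_spectrum_request(lat: float, lon: float,
                                 serial: str, fcc_id: str) -> dict:
    """Assemble a PAWS-style AVAIL_SPECTRUM_REQ body (cf. RFC 7545).
    The identifier values passed in are illustrative placeholders."""
    return {
        "jsonrpc": "2.0",
        "method": "spectrum.paws.getSpectrum",
        "params": {
            "type": "AVAIL_SPECTRUM_REQ",
            "version": "1.0",
            "deviceDesc": {                      # who is asking
                "serialNumber": serial,
                "fccId": fcc_id,
                "fccTvbdDeviceType": "MODE_2",   # geolocated master device
            },
            "location": {                        # where the base station sits
                "point": {"center": {"latitude": lat, "longitude": lon}},
            },
        },
        "id": "1",
    }

# Build (but do not send) a query for a library at an illustrative location.
request_body = json.dumps(build_avail_spectrum_request(
    39.19, -96.57, "LIB-0001", "TEST-FCC-ID"))
```

The database's response would list available channels and maximum permitted power at that location, which is the input to the site-planning conversation with the equipment supplier.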
TVWS, Libraries, and Emergency Response

Public libraries provide leadership as “ready access point, first choice, first refuge, and last resort” for community services in everyday matters and in emergencies.51 They have assisted residents in relief efforts during Hurricanes Katrina and Rita, and other natural and man-made disasters:52

“…the provision of access to computers and the internet was a wholly unique and immeasurably important role for public libraries… The infrastructure can be a tremendous asset in times of emergencies, and should be incorporated into community plans.”53

They have likewise provided immediate and long-term assistance to communities and aid workers, providing physical space for recovery operations for emergency agencies, communication technologies, and emotional support for the community. In previous library internet usage surveys, nearly one-third of libraries reported that their computers and internet services would be used by the public in emergencies to access relief services and benefits.54 Such activities include finding and communicating with family and friends, completing online FEMA forms and insurance claims, and checking news sites for information about their affected homes.55 Yet, despite the admirable and successful efforts of many public libraries, their infrastructures are not always built to meet the increased demand of user needs and e-government services in emergency contexts.56 Jaeger, Shneiderman, Fleischmann, Preece, Qu, and Wu propose the concept of community response grids (CRGs), which utilize the internet and mobile communications devices so that emergency responders and residents in a disaster area can communicate and coordinate accurate, appropriate responses.57 This concept relies on social networks, both in person and online, to enable residents and emergency responders to work together in a multi-directional communication scheme.
CRGs provide residents tailored, localized information and a means to report pertinent disaster-related information to emergency responders, who in turn can synthesize and analyze submitted information and act accordingly.58 Due to their existing role as community anchor institutions, public libraries are uniquely positioned for CRG involvement. Libraries can assist in facilitating internet access with portable TVWS network connection points; by virtue of their portability, TVWS hotspots can provide essential digital access in times of crisis by moving along with the affected populations. Emergency operations and communications in a crisis occur throughout networks comprised of various technologies, and information management before, during, and after a disaster affects how well a crisis is managed.59 Broadband internet can be one access route in the event that phone and radio transmissions are affected, and vice versa, as part of a “mixed media approach” to get messages to those who need them in an emergency.60 Yet one must remember that internet communications are double-edged: the internet provides relevant material on demand and near-instant sharing and collaborating, but these very features can compound a crisis with misinformation.61 Despite these concerns, the integration of wireless devices and other technologies into a multi-technology, collaborative response system has the potential to solve the problem of existing communication structures that lack coordination and quality control.62 The proliferation of smartphones, laptops, and other portable wireless devices makes such technology ideal for emergency communications, especially in how users’ familiarity with their own devices will help them navigate CRG communications while under stress.63

CONCLUSION

Supporting internet access and inclusion in public libraries and having equal, affordable, and available access to information is a necessary component of bridging the digital divide.
Technology has become “an irreducible component of modern life, and its presence and use has significant impact on an individual’s ability to fully engage in society.”64 As Cohron argues, this principle represents more than providing people with internet access: it is about “leveling the playing field in regards to information diffusion. The internet is such a prominent utility in peoples’ lives that we, as a society, cannot afford for citizens to go without.”65 Broadband access is the first step; digital literacy training is also a necessity. Access alone is not enough to ensure quality and effective use, however, as the digital divide is representative of broader social inequalities that computer and internet access cannot fully remedy.66 This is a complex problem that requires a multi-faceted solution. As Kinney states, “the digital divide is a moving target, and new divides open up with new technologies. Libraries help bridge some inequities more than others, and substantial disparities exist among library systems.”67 Internet access also becomes a necessity when the internet is to play a role in emergency communications.68 It is problematic to suggest that public libraries can be promoted as the solution to digital divide issues while simultaneously facing cuts to funding. Policy makers, community advocates, and community members themselves are stakeholders in the success of their communities, and must also take responsibility for access and inclusion via public libraries.69 As public agencies automate to increase equality and save money, they exacerbate digital divides by excluding those without access. Suggesting that community members simply visit the library to ensure access to public services places additional pressure on libraries, yet these efforts may go unsupported and unacknowledged.
Public libraries are already valuable community access points to resources, especially in emergencies, though many suffer from a lack of concerted disaster planning. Along similar lines, many libraries are ill-equipped to accommodate the bandwidth needs of growing and oftentimes sparsely connected populations. As communications and government services move increasingly online, it becomes imperative to build strong, cost-effective information infrastructures. TVWS connections can arguably help in breaking down the barriers that challenge ubiquitous access and inclusion. TVWS-enabled remote access points in daily use around communities are ideally situated to provide everyday Wi-Fi and for rapid redeployment to damaged areas (as pop-up hotspots) to provide essential communication and information resources in times of crisis. In short, TVWS can augment the technological infrastructure of public libraries toward further developing their roles as CAIs and leaders serving their communities well into the future.

REFERENCES

1. Wireline Competition Bureau, “2016 Broadband Progress Report,” Federal Communications Commission, January 29, 2016, https://www.fcc.gov/reports-research/reports/broadband-progress-reports/2016-broadband-progress-report.

2. Office of Chairman Wheeler, “FCC Adopts Strong, Sustainable Rules to Protect the Open Internet,” Federal Communications Commission, February 26, 2015, https://apps.fcc.gov/edocs_public/attachmatch/DOC-332260A1.pdf.

3. “Here's What the Digital Divide Looks Like in the United States,” The White House, July 15, 2015, https://www.whitehouse.gov/share/heres-what-digital-divide-looks-united-states.

4. John B. Horrigan and Maeve Duggan, “Home Broadband 2015,” Pew Research Center, December 21, 2015, http://www.pewInternet.org/files/2015/12/Broadband-adoption-full.pdf.
This 43% is further divided between 33% reporting the monthly subscription cost as their main reason, while the other 10% report the expensive cost of a computer as their reason for non-adoption.

5. Bo Kinney, “The Internet, Public Libraries, and the Digital Divide,” Public Library Quarterly 29, no. 2 (2010): 104-161, https://doi.org/10.1080/01616841003779718.

6. Madalyn Cohron, “The Continuing Digital Divide in the United States,” The Serials Librarian 69, no. 1 (2015): 77-86, https://doi.org/10.1080/0361526X.2015.1036195.

7. Kinney, “The Internet, Public Libraries, and the Digital Divide.”

8. Paul T. Jaeger, John Carlo Bertot, Kim M. Thompson, Sarah M. Katz, and Elizabeth J. DeCoster, “The Intersection of Public Policy and Public Access: Digital Divides, Digital Literacy, Digital Inclusion, and Public Libraries,” Public Library Quarterly 31, no. 1 (2012): 1-20, https://doi.org/10.1080/01616846.2012.654728.

9. Brian Real, John Carlo Bertot, and Paul T. Jaeger, “Rural Public Libraries and Digital Inclusion: Issues and Challenges,” Information Technology and Libraries 33, no. 1 (2014): 6-24, https://doi.org/10.6017/ital.v33i1.5141.

10. Jaeger et al., “The Intersection of Public Policy and Public Access.”

11. John Carlo Bertot, Paul T. Jaeger, Lesley A. Langa, and Charles R. McClure, “Public access computing and Internet access in public libraries: The role of public libraries in e-government and emergency situations,” First Monday 11, no. 9 (2006), https://doi.org/10.5210/fm.v11i9.1392.

12. John B. Horrigan, “Libraries 2016,” Pew Research Center, September 9, 2016, http://www.pewinternet.org/2016/09/09/libraries-2016/.

13. Real et al., “Rural Public Libraries and Digital Inclusion.”

14. John Carlo Bertot, Charles R. McClure, and Paul T. Jaeger, “The Impacts of Free Public Internet Access on Public Library Patrons and Communities,” Library Quarterly 78, no. 3 (2008): 285-301, https://doi.org/10.1086/588445.

15.
Charles R. McClure, Paul T. Jaeger, and John Carlo Bertot, “The Looming Infrastructure Plateau? Space, Funding, Connection Speed, and the Ability of Public Libraries to Meet the Demand for Free Internet Access,” First Monday 12, no. 12 (2007), https://doi.org/10.5210/fm.v12i12.2017.

16. Ibid.

17. Bertot et al., “Public access computing and Internet access in public libraries.”

18. Ibid.; Jaeger et al., “The Intersection of Public Policy and Public Access.”

19. Wireline Competition Bureau, “WCB Cost Model Virtual Workshop 2012 - Community Anchor Institutions,” Federal Communications Commission, June 1, 2012, https://www.fcc.gov/news-events/blog/2012/06/01/wcb-cost-model-virtual-workshop-2012-community-anchor-institutions.

20. Jennifer Koerber, "ALA and iPAC Analyze Digital Inclusion Survey," Library Journal 141, no. 1 (2016): 24-26.

21. Horrigan, “Libraries 2016.”

22. Ibid.

23. Timothy Inklebarger, “Bridging the tech gap,” American Libraries, September 11, 2015, https://americanlibrariesmagazine.org/2015/09/11/bridging-tech-gap-wi-fi-lending.

24. Andrew Stirling, “White spaces – the new Wi-Fi?,” International Journal of Digital Television 1, no. 1 (2010): 69-83, https://doi.org/10.1386/jdtv.1.1.69/1; Cristian Gomez, “TV White Spaces: Managing Spaces or Better Managing Inefficiencies?,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 67-77.

25. Steve Song, “Spectrum and Development,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 35-40.

26. Robert Horvitz, “Geo-Database Management of White Space vs. Open Spectrum,” in TV White Spaces: A Pragmatic Approach, eds.
Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 7-17.

27. Julie Knapp, “FCC Announces Public Testing of First Television White Spaces Database,” Federal Communications Commission, September 14, 2011, https://www.fcc.gov/news-events/blog/2011/09/14/fcc-announces-public-testing-first-television-white-spaces-database.

28. Horvitz, “Geo-Database Management of White Space vs. Open Spectrum.”

29. Ryszard Strużak and Dariusz Więcek, “Regulatory Issues for TV White Spaces,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 19-34.

30. Horvitz, “Geo-Database Management of White Space vs. Open Spectrum,” 8.

31. Engineering & Technology Bureau, “FCC Adopts Rules For Unlicensed Services In TV And 600 MHz Bands,” Federal Communications Commission, August 11, 2015, https://apps.fcc.gov/edocs_public/attachmatch/FCC-15-99A1_Rcd.pdf.

32. Gomez, “TV White Spaces: Managing Spaces or Better Managing Inefficiencies?,” 68.

33. Stirling, “White spaces – the new Wi-Fi?.”

34. Linda E. Doyle, “Cognitive Radio and Africa,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 109-119.

35. Gomez, “TV White Spaces: Managing Spaces or Better Managing Inefficiencies?,” 72.

36. Mike Jensen, “The role of TV White Spaces and Dynamic Spectrum in helping to improve Internet access in Africa and other Developing Regions,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 83-89.

37. Song, “Spectrum and Development.”

38. Ibid.

39. Doyle, “Cognitive Radio and Africa,” 113.

40.
Stirling, “White spaces – the new Wi-Fi?.”

41. Afton Chavez, Ryan Littman-Quinn, Kagiso Ndlovu, and Carrie L. Kovarik, “Using TV white space spectrum to practice telemedicine: A promising technology to enhance broadband Internet connectivity within healthcare facilities in rural regions of developing countries,” Journal of Telemedicine and Telecare 22, no. 4 (2015): 260-263, https://doi.org/10.1177/1357633X15595324.

42. Stirling, “White spaces – the new Wi-Fi?.”

43. Koerber, "ALA and iPAC Analyze Digital Inclusion Survey."

44. Chavez et al., “Using TV white space spectrum to practice telemedicine.”

45. Kerry Ingersoll, June 22, 2015, Google+ comment to the Gigabit Libraries Network, https://plus.google.com/107631107756352079114/posts/L4Y8ci8sG5Y.

46. Delta County Libraries, “Super Wi-Fi Pilot,” accessed November 1, 2016, http://www.deltalibraries.org/super-wi-fi-pilot/.

47. Pascagoula TV White Spaces Facebook group, accessed November 1, 2016, https://www.facebook.com/PSDTVWS/.

48. “Delaware Libraries White Space Pilot Update, January 2015,” accessed November 1, 2016, http://lib.de.us/files/2015/01/Delaware-Libraries-White-Space-Pilot-Update-Jan-2015.pdf.

49. Knapp, “FCC Announces Public Testing of First Television White Spaces Database.”

50. See https://www.google.com/get/spectrumdatabase/.

51. Bertot et al., “Public access computing and Internet access in public libraries.”

52. Bertot et al., “The Impacts of Free Public Internet Access.” See also Horrigan, “Libraries 2016.”

53. Paul T. Jaeger, Lesley A. Langa, Charles R. McClure, and John Carlo Bertot, “The 2004 and 2005 Gulf Coast Hurricanes: Evolving Roles and Lessons Learned for Public Libraries in Disaster Preparedness and Community Services,” Public Library Quarterly 25, no. 3/4 (2007): 199-214.

54. Ibid.

55. Bertot et al., “Public access computing and Internet access in public libraries.”

56. Ibid.
57. Paul T. Jaeger, Ben Shneiderman, Kenneth R. Fleischmann, Jennifer Preece, Yan Qu, and Philip Fei Wu, “Community response grids: E-government, social networks, and effective emergency management,” Telecommunications Policy 31 (2007): 592-604, https://doi.org/10.1016/j.telpol.2007.07.008.

58. Ibid., 595.

59. Laurie Putnam, “By choice or by chance: How the Internet is used to prepare for, manage, and share information about emergencies,” First Monday 7, no. 11 (2002), https://doi.org/10.5210/fm.v7i11.1007.

60. Ibid.

61. Ibid.

62. Jaeger et al., “Community response grids,” 598. Jaeger et al. describe how the Internet combines the best of one-to-one, one-to-many, many-to-one, and many-to-many in terms of the flow and quality of information: one-to-one communication is slow; many-to-one benefits only the central network, while outsiders reporting emergencies do not learn what others are reporting; one-to-many is inefficient, limited, and assumes the broadcaster has the appropriate information and can get it to those who need it most; many-to-many can create “information overload” of questionable content.

63. Ibid., 599.

64. Jaeger et al., “The Intersection of Public Policy and Public Access,” 3.

65. Cohron, “The Continuing Digital Divide in the United States,” 84.

66. Kinney, “The Internet, Public Libraries, and the Digital Divide,” 120.

67. Ibid., 148.

68. Jaeger et al., “Community response grids,” 599.

69. Bertot et al., “The Impacts of Free Public Internet Access,” 299.
Editorial Board Thoughts: Arts into Science, Technology, Engineering, and Mathematics – STEAM, Creative Abrasion, and the Opportunity in Libraries Today

Tod Colegrove

INFORMATION TECHNOLOGIES AND LIBRARIES | MARCH 2017

Over the millennia, man’s attempt to understand the universe has been an evolution from the broad to the sharply focused. A wide range of distinctly separate disciplines evolved from the overarching natural philosophy, the study of nature, of Greco-Roman antiquity: anatomy and astronomy through botany, mathematics, and zoology, among many others. Similarly, the Arts, Humanities, and Engineering developed from broad overarching interest into tightly focused disciplines that today are distinctly separate. As these legitimate divisions formed, grew, and developed into ever-deepening specialty, they enabled correspondingly deeper study and discovery;1 in response, the supporting collections of the library divided and grew to reflect that increasing complexity.

Libraries have long been about the organization of, and access to, information resources. Subject classification systems in use today, such as the Dewey Decimal system, are designed to group like items with like, albeit under broad overarching topics. A perhaps inevitable result for print collections housed under such a classification system is the physical isolation of items - and, by extension, the individuals researching those topics - from one another. Under the Library of Congress system, for example, items categorized as “geography” are physically removed from those in “science,” further still from “technology.” End-users benefit from the possibility of serendipitous discovery while browsing shelves nearby, even as they are effectively shielded from exposure to distracting topics outside of their immediate focus.
Recent years have witnessed a rediscovery of, and renewed interest in, the fundamental role the library can have in the creation of knowledge, learning, and innovation among its members. As collections shift from print to electronic, libraries are increasingly less bound to the physical constraints imposed by their print collections. Rather than a continued focus on hyper-specialization and separation, we have the opportunity to rethink the library: exploring novel configurations and services that might better support its community, and embracing emerging roles of trans-disciplinary collaboration and innovation.

Tod Colegrove (pcolegrove@unr.edu), a member of the ITAL Editorial Board, is Head of DeLaMare Science & Engineering Library, University of Nevada, Reno.

EDITORIAL BOARD THOUGHTS | COLEGROVE https://doi.org/10.6017/ital.v36i1.9733

The Library as Intersection

Libraries reflect the institutional and organizational structures of their communities, even as the physical organization of the structures built to house print collections mirrors the classification system in use. Academic libraries are perhaps most entrenched in the structural division: rather than intrinsically promoting collaboration and discovery across disciplines, the organization of print collections, and typically the spaces around them, is designed to foster increased focus and specialization. This division can reach a pinnacle in the branch libraries of a college or university, specialized almost to the exclusion of other areas of study altogether; libraries and collections devoted exclusively to topics of engineering, science, music, and others exist on campuses across the country.
Amplified by the separation and clustering of faculty and researchers, typically by department and discipline, it becomes entirely possible for individuals to “spend a lifetime working in a particular narrow field and never come into contact with the wider context of his or her study.”2 The library is also one of the few places in any community where individuals from a variety of backgrounds and specialties can naturally cross paths with one another. At a college or university, students and faculty from one discipline might otherwise rarely encounter those from other disciplines. Whether public, school, or academic library, outside of the library individuals and groups are typically isolated from one another physically, with little opportunity to interact organically. Without active intervention and deliberate effort on the part of the library, opportunities for creative abrasion3 and trans-disciplinary collaboration become virtually non-existent; its potential to “unleash the creative potential that is latent in a collection of unlike-minded individuals,”4 untapped. Leveraged properly, however, the intersection of interests and expertise that occurs naturally within the neutral spaces of the library can become a powerful tool that supports not only research, but creativity and innovation - a place where ideas and viewpoints can collide, building on one another:

“For most of us, the best chance to innovate lies at the Intersection. Not only do we have a greater chance of finding remarkable idea combinations there, we will also find many more of them.... The explosion of remarkable ideas is what happened in Florence during the Renaissance, and it suggests something very important. If we can just reach an intersection of disciplines or cultures, we will have a greater chance of innovating, simply because there are so many unusual ideas to go around.”5

Difficult and Scary

The problem?
“Stimulating creative abrasion is difficult and scary because we are far more comfortable being with folks like us.”6 And yet a quick review of the literature reveals that knowledge creation, innovation, and success are inextricably linked,7 with the fundamental understanding of their connection having undergone a dramatic shift: “knowledge is in fact essential to innovate, and while this might sound obvious today, putting knowledge and innovation and not physical assets at the centre of competitive advantage was a tremendous change.”8

INFORMATION TECHNOLOGIES AND LIBRARIES | MARCH 2017

As our libraries move toward embracing an even more active role within our communities, our organizational priorities are undergoing similarly dramatic shifts: support for knowledge creation and innovation becomes more central, even as physical assets shift toward a supporting, even peripheral, role. Libraries, as fundamentally neutral hubs of diverse communities, are uniquely positioned to be able to cultivate creative abrasion within and among their communities, fostering not only knowledge creation, but innovation and success. Indeed, the combination of physical, electronic, and staff assets can be the raw stuff by which trans-disciplinary engagement is encouraged. The active cultivation and support of creative abrasion, with direct linkage to desired outcomes, becomes arguably one of the most vital services the library can provide its community. Rather than deepening the cycle of hyper-specialization, the emergence of makerspace in our libraries is one example of a trend toward enabling libraries to broaden and embrace that support. Building on the intellectual diversity within the spaces of the library, staff members, volunteers, and fellow community members can serve as catalyst, triggering groups to “do something with that variety”9 by engaging across traditional boundaries.
Indeed, “by deliberately creating diverse organizations and explicitly helping team members appreciate thinking-styles different than their own, creative abrasion can result in successful innovation.”10 Strategic placement and staff support of makerspace activity can dramatically increase the opportunity for creative abrasion - and, by extension, the resulting knowledge creation, creativity, and innovation.

Arts Bring a Fundamental Literacy and Resource to STEM

In recent years, greater emphasis on students acquiring STEM (Science, Technology, Engineering, and Math) skills has made the topic one of the most central issues in education. Considered a key solution to improving the competitiveness of American students on the global stage, STEM education shares the common goal of breaking down the artificial barriers that exist even within the separate disciplines of science, technology, engineering, and math - in short, increasing the diversity of the learning environment. Proponents of STEAM go further by suggesting that adding Art into the mix can bring new energy and language to the table, “sparking curiosity, experimentation, and the desire to discover the unknown in students.”11 Federal agencies such as the U.S. Department of Education and the National Science Foundation have funded and underwritten a number of grants, conferences, and workshops in the field, including the seminal forum hosted by the Rhode Island School of Design (RISD), “Bridging STEM to STEAM: Developing New Frameworks for Art-Science-Design Pedagogy.”12 John Maeda, the president of RISD, identifies a direct connection between the approach and the creativity and success of late Apple co-founder Steve Jobs, with STEAM support “a pathway to enhance U.S.
economic competitiveness.”13 Proponents go further, arguing the Arts bring both a fundamental literacy and resource to the STEM disciplines, providing “innovations through analogies, models, skills, structures, techniques, methods, and knowledge.”14 Consider the findings of a study of Nobel Prize winners in the sciences, members of the Royal Society, and the U.S. National Academy of Sciences; Nobel laureates were:

- twenty-five times as likely as an average scientist to sing, dance, or act;
- seventeen times as likely to be an artist;
- twelve times more likely to write poetry and literature;
- eight times more likely to do woodworking or some other craft;
- four times as likely to be a musician; and
- twice as likely to be a photographer.15

From the standpoint of creative abrasion, welcoming the “A” of Art into the library support of STEM disciplines increases the diversity of the library, and by default the opportunity for creative abrasion. From Aristotle and Pythagoras through Galileo Galilei and Leonardo da Vinci to Benjamin Franklin, Richard Feynman, and Noam Chomsky, a long list of individuals of wide-ranging genius hints at a potential left largely untapped by our traditional approach. Connections between STEM disciplines, Art, and the innovation arising directly out of their creative abrasion surround us: the electronic screens used on a wide range of technology, including computers, televisions, and cell phones, are the result of a collaboration between a series of painter-scientists and post-impressionist artists such as Seurat - a combination of red, green, and blue dots generates full-spectrum images in a way not unlike that of the artistic technique of pointillism.
The electricity to drive that technology is understood, in part, due to early work by Franklin - even as he laid the foundations of the free public library with the opening of America’s first lending library, and pursued a broad range of parallel interests. The stitches used in medical surgery are the result of Nobel laureate Alexis Carrel taking his knowledge of lace making from a traditional arena into the operating room. Prominent American inventors “Samuel Morse (telegraph) and Robert Fulton (steam ship) were among the most prominent American artists before they turned to inventing.”16 In short, “increasing success in science is accompanied by developed ability in other fields such as the fine arts.”17 Rather than isolated in monastic study, “almost all Nobel laureates in the sciences are actively engaged in arts as adults.”18 Perhaps surprisingly, rather than being rewarded by an ever-increasing focus and hyper-specialization, genius in the sciences seems tied to individuals’ activity in the arts and crafts. The study’s authors cite three different Nobel Prize winners, including J. H. Van’t Hoff, whose 1878 speculation held that scientific imagination is correlated with creative activities outside of science,19 going on to detail similar findings from general studies dating back over a century. Of even more seminal interest, the authors point to a similar connection for adolescents and young adults, where Milgram and colleagues20 found “having at least one persistent and intellectually stimulating hobby is a better predictor of career success in any discipline than IQ, standardized test scores, or grades.”21

Discussion

The connection between individuals holding a multiplicity of interests, trans-disciplinary activity, and success is clear; what is less clear is to what extent we are fostering that connection in our libraries today. The potential is nevertheless tantalizing: a random group of people, thrown together, is not likely to be very creative.
By going beyond specialization and wading into the deeper waters of supporting and cultivating creative abrasion and avocation among the membership of our libraries, we are fostering success and innovation beyond what might otherwise occur. The decision to catalyze and foster the cross-curricular collaboration that is STEAM22 is squarely in the hands of the library: in the design of its spaces, and in the interactions of the staff of the library with the communities served. We can choose to actively connect and catalyze across traditional boundaries. As the head of a science and engineering library, one of the early adopters of makerspace and actively exploring the possibilities of STEAM engagement for several years, I have time and again witnessed the leaps of insight and creativity brought about by creative abrasion. From across disciplines, members are engaging with the resources of the library - and, with our encouragement, one another - in an ever-increasing cycle of knowledge creation, innovation, and success. The impact is particularly dramatic among individuals from strongly differing backgrounds and disciplines: for example, when an engineering student who considers themselves to be expert with a particular technology witnesses and interacts with an art student using that same technology and accomplishing something truly unexpected, even seemingly magical. Or when a science student approaching a problem from one perspective realizes a practitioner from a different discipline sees the problem from an entirely different, and yet equally valid, point of view. In each case, it’s as if the worldview of each suddenly melts: shifting and expanding, never to return to its original shape. Transformative experiences become the order of the day, even as the informal environment offers a wealth of opportunity to engage with and connect end-users to the more traditional resources of the library.
By actively seeking out opportunities to bring art into traditionally STEM-focused activity, and vice-versa, we are deliberately increasing the diversity of the environment. Makerspace services and activities, to the extent they are open and visibly accessible to all, are a natural fit for the spontaneous development of trans-disciplinary collaboration. Within the spaces of the library, opportunities to connect individuals around shared avocational interest might range from music and spontaneous performance areas to spaces salted with LEGO bricks and jigsaw puzzles; the potential connections between our resources and the members of our communities are as diverse as their interests. Indeed, when a practitioner from one discipline can interact and engage with others from across the STEAM spectrum, the world becomes a richer place – and maybe, just maybe, we can fan the flames of curiosity along the way.

REFERENCES

1. Bohm, D., and F. D. Peat. 1987. Science, Order, and Creativity: A Dramatic New Look at the Creative Roots of Science and Life. London: Bantam.
2. Ibid., 18-19.
3. Hirshberg, Jerry. 1998. The Creative Priority: Driving Innovative Business in the Real World. London: Penguin.
4. Leonard-Barton, Dorothy, and Walter C. Swap. 1999. When Sparks Fly: Harnessing the Power of Group Creativity. Boston, Massachusetts: Harvard Business School Press.
5. Johansson, Frans. 2004. The Medici Effect: Breakthrough Insights at the Intersection of Ideas, Concepts, and Cultures. Boston, Massachusetts: Harvard Business School Press, 20.
6. Leonard-Barton, Dorothy, and Walter C. Swap. 1999. When Sparks Fly: Harnessing the Power of Group Creativity. Boston, Massachusetts: Harvard Business School Press, 25.
7. Nonaka, Ikujiro. 1994. “A Dynamic Theory of Organizational Knowledge Creation.” Organization Science 5 (1): 14–37.
8. Correia de Sousa, Milton. 2006.
“The Sustainable Innovation Engine.” Vine 36 (4): 398–405, accessed February 14, 2017. https://doi.org/10.1108/03055720610716656.
9. Leonard-Barton, Dorothy, and Walter C. Swap. 1999. When Sparks Fly: Harnessing the Power of Group Creativity. Boston, Massachusetts: Harvard Business School Press, 20.
10. Adams, Karlyn. 2005. The Sources of Innovation and Creativity. Education, September 2005, 33. https://doi.org/10.1007/978-3-8349-9320-5.
11. Jolly, Anne. 2014. “STEM vs. STEAM: Do the Arts Belong?” Education Week Teacher. http://www.edweek.org/tm/articles/2014/11/18/ctq-jolly-stem-vs-steam.html?qs=stem+vs.+steam.
12. Rose, Christopher, and Brian K. Smith. 2011. “Bridging STEM to STEAM: Developing New Frameworks for Art-Science-Design Pedagogy.” Rhode Island School of Design press release.
13. Robelen, Erik W. 2011. “STEAM: Experts Make Case for Adding Arts to STEM.” Education Week. http://www.bmfenterprises.com/aep-arts/wp-content/uploads/2012/02/Ed-Week-STEM-to-STEAM.pdf.
14. Root-Bernstein, Robert. 2011. “The Art of Scientific and Technological Innovations – Art of Science Learning.” http://scienceblogs.com/art_of_science_learning/2011/04/11/the-art-of-scientific-and-tech-1/.
15. Ibid.
16. Ibid.
17. Root-Bernstein, Robert, Lindsay Allen, Leighanna Beach, Ragini Bhadula, Justin Fast, Chelsea Hosey, Benjamin Kremkow, et al. 2008. “Arts Foster Scientific Success: Avocations of Nobel, National Academy, Royal Society, and Sigma Xi Members.” Journal of Psychology of Science and Technology. https://doi.org/10.1891/1939-7054.1.2.51.
18. Ibid.
19. Van’t Hoff, Jacobus Henricus. 1967. “Imagination in Science.” Molecular Biology, Biochemistry and Biophysics, translated by G. F. Springer, 1. Springer-Verlag, pp. 1-18.
20. Milgram, Roberta M., and Eunsook Hong. 1997. “Out-of-school activities in gifted adolescents as a predictor of vocational choice and work.” Journal of Secondary Gifted Education 8, no. 3: 111.
Education Research Complete, EBSCOhost (accessed February 26, 2017).
21. Root-Bernstein, Robert, Lindsay Allen, Leighanna Beach, Ragini Bhadula, Justin Fast, Chelsea Hosey, Benjamin Kremkow, et al. 2008. “Arts Foster Scientific Success: Avocations of Nobel, National Academy, Royal Society, and Sigma Xi Members.” Journal of Psychology of Science and Technology. https://doi.org/10.1891/1939-7054.1.2.51.
22. Land, Michelle H. 2013. “Full STEAM Ahead: The Benefits of Integrating the Arts into STEM.” Procedia Computer Science 20. Elsevier Masson SAS: 547–52. https://doi.org/10.1016/j.procs.2013.09.317.
A Technology-Dependent Information Literacy Model within the Confines of a Limited Resources Environment

Ibrahim Abunadi

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2018

Ibrahim Abunadi (i.abunadi@gmail.com) is an Assistant Professor, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia.

ABSTRACT

The purpose of this paper is to investigate information literacy as an increasingly evolving trend in computer education. A quantitative research design was implemented, and a longitudinal case study methodology was conducted to measure tendencies in information literacy skill development and to develop a practical information literacy model. It was found that both students and educators believe that the combination of information literacy instruction with a learning management system is more effective in increasing information literacy and research skills where information resources are limited. Based on the quantitative study, a practical, technology-dependent information literacy model was developed and tested in a case study, resulting in fostering the information literacy skills of students who majored in information systems. These results are especially important in smaller universities with libraries having limited technology capabilities, located in developing countries.

INTRODUCTION

Many different challenges arise during a graduate’s career. Moreover, professional life can involve numerous situations and problems that university students are not prepared for during their college studies.1 The use of internet sources to find solutions to real problems depends on students’ and graduates’ information literacy skills.2 A strong aid to students’ learning is the ability to search, analyze, and apply knowledge from different sources, including literature, databases, and the internet.3 One of the issues students face concerning technology is its continuous evolution.
Although students learn survival skills in their professional lives, they also require special coping skills. A skill that should be considered for all technology-related courses is information literacy. Lin defines information literacy as a “set of abilities, skills, competencies, or fluencies, which enable people to access and utilize information resources.”4 These are part of the lifelong learning skills of students, which put the power of continuous education in their hands. Another issue is the exclusive allocation of the responsibility for information literacy skill development in smaller educational institutes to librarians or to instructors who majored in library science.5 This paper has taken another approach to information literacy skill development, whereby specialized educators, such as capable information systems faculty members, facilitate this skill development. A learning management system (LMS) is a widely used form of technology for course delivery and the organization of subject material. Blackboard, Desire2Learn, Sakai, Moodle, and ANGEL, as common LMS platforms, provide an integrated guidance system to deliver and analyze learning.

INFORMATION LITERACY MODEL | ABUNADI https://doi.org/10.6017/ital.v37i4.9750

These systems can be used to support information literacy instruction. Standard features include assignments and quizzes, while other systems offer tools that allow students to view and comment on other students’ portfolios or work, depending on the LMS’s features.6 Before the 1990s, face-to-face learning was common within the educational domain. However, the LMS emerged in the twenty-first century as the internet became a suitable alternative to traditional learning. Moodle, an open-source LMS, is an acronym that stands for “Modular Object-Oriented Dynamic Learning Environment.” This online education system is intended to make learning available with the necessary guidance for educators.
Web services available through Moodle are based on a well-organized structural outline, and they are widely used to perform educational tasks and to analyze statistics helpful to instructors.7 Peter et al. (2015) presented an approach to information literacy instruction in universities and colleges that combines traditional classroom instruction and online learning; this is known as “blended learning.”8 This involves only one seminar in the classroom; thus, it can replace traditional sessions at universities and colleges with education involving information literacy instruction. It has been recommended that a time-efficient method should be adopted by augmenting classroom seminars and literacy instruction through the addition of online materials. However, the findings of this study showed that students who only use online materials do not show greater progress in their learning than those who follow the blended approach. Another study, by Jackson, examined how to more effectively integrate educational services into learning management systems and library resources.9 Jackson suggested that better implementation was required, and recommended using the Blackboard LMS to include information literacy and scaffolding activities in subject-specific courses. This study intends to determine the most effective method of information literacy education. It evaluates instructors’ and students’ perceptions of the effectiveness of traditional teaching in comparison to electronic teaching in information literacy. In this study, a quantitative research investigation was conducted with participants. A research model and questionnaire were developed for this purpose with three underlying latent variables. The participants were asked to describe their understanding of learning systems and their preferences in information literacy education.
Their requirements varied with their continuing education levels and past educational activities, based on which software or website appeared to be more supportive and compatible with them.10 This study considered the research results, developed an information literacy intervention model, and applied it to a case study.

LITERATURE REVIEW

Previously, educational institutions were limited to face-to-face teaching techniques or classroom-based teaching. Face-to-face teaching is the traditional method still used in most educational institutions. In classrooms, the subject is explained, and books or other paper-based materials are read out of class to enhance understanding.11 Face-to-face learning or teaching is limited by the number of physical resources available. Therefore, it becomes difficult to accommodate the widespread interest in information literacy through face-to-face learning.12 Gathering information using only physical resources can lead to information deficiencies.13 Education has evolved to benefit from advances in technologies by using LMSs and online sources. The effective usage of an LMS and online sources requires the development of information literacy.

Information Literacy

Information literacy includes technological literacy, information ethics, online library skills, and critical literacy.14 Technological literacy is defined as the ability to use common software, hardware, and internet tools to reach a specific goal. This ability is an important component of information literacy that enables a graduate to seek answers by using the internet and digital resources.15 Hauptman defines information ethics as “the production, dissemination, storage, retrieval, security, and application of information within an ethical context.”16 This skill is essential to preserve the original rights of researchers cited in a study, based on the ethical standards of the graduate conducting the study.
Another important component of information literacy comprises online library skills, which can be defined as the ability to use online digital sources, including digital libraries, to effectively seek different knowledge resources by using search engines, correctly locating required information, and using online support when needed.17 Critical literacy is a thorough evaluation of online material that allows for the appropriate conclusion to be reached on the suitability of the material for the required investigation.18 Seeking answers from appropriate sources is important to allow graduates to find and report on accurate and valid data. These components of information literacy enable information extraction from topics related to the desired course or field of research. Students, professors, instructors, employees, learners, and educational policy administrators are the major knowledge seekers who use information literacy skills.19 With improved online resources available for learning, many learning requirements are moving toward services that are exclusively online.20 Gray and Montgomery studied an online information literacy course.21 They found that teaching with the aid of information literacy is helpful for students in obtaining improved instruction. The authors also compared an online information literacy course and face-to-face instruction, focusing primarily on the behaviors and attitudes of teachers and college students toward the online course. The students agreed that the application of information literacy techniques would be particularly helpful to them in clarifying their understanding of complicated instructions. The teachers also indicated that an information literacy course would result in better regulation of academic processes than face-to-face learning. Dimopoulos et al.
(2013) measured student performance within an online learning environment, finding that the online learning environment has direct relevance for the completion of challenging tasks within academic settings.22 The findings further indicated that an LMS could improve teaching activities. As an LMS, Moodle was also helpful for students to ensure their development of collaborative problem-solving skills. They concluded that Moodle includes different useful modules, such as digital resource repositories, interactive wikis, and external add-in tools, that have been related to student learning when incorporated into the LMS environment, resulting in better performance. Hernández-García and Conde-González focused on learning analytics tools within engineering education, noting that such tools make engineering students more likely to understand complicated concepts. Therefore, the application of the information literacy model resulted in better performance.23 Further, educating students about information sources was found to be helpful for instructors in enhancing students’ learning by improving their online information retrieval skills. This study indicated that students can develop their learning traits more effectively through online learning than through face-to-face learning. Many researchers in this area have developed models that are only theoretical.24 However, this paper develops a practical information literacy model that can be tested for improvement in information literacy skills. This is especially relevant for computer and information systems courses, which can sometimes fall outside the purview of library-related training or education in universities with limited resources. The inclusion of information literacy training within computer and information systems courses is not regularly done in the information literacy field.25
Additionally, although some information literacy instruction has been implemented practically in research, no other study has developed a practical information literacy model based on educators’ and students’ information literacy dispositions as well as both information literacy theory and practice.26

Moodle as an LMS

Moodle is a useful and accommodating open-source platform with a stable structure of website services that allows instructors and learners to implement a range of helpful plugins. It can be used as a lively online education community and an enhancement to the face-to-face learning process.27 Moodle is used in around 190 countries and offers its services in over seventy languages. It acts as a mediator for instruction and is widely adopted in many institutions. Moodle provides services such as assignments, wikis, messaging, blogs, quizzes, and databases.28 It can provide a more flexible teaching platform than traditional teaching. Health science educational service providers facilitate self-assurance in their learners. Several educational campuses operate by using face-to-face learning strategies, whereby learners obtain their training at on-campus locations. The objective of Moodle is to enable the education of learners through internet access.29 Xing focused on the broad application of the Moodle LMS for developing educational technology within academic settings, suggesting that academic organizations should promote technology as a solution for common problems with students’ learning processes.30 Such suggestions have been supported by Costa et al. (2012), who found that Moodle is significantly helpful for developing an e-learning platform for students. They emphasized that engineering universities must use the Moodle LMS to provide students with extensive technical knowledge.31 Costello et al.
(2012) stated that Moodle, if used, will significantly help students improve their skills and knowledge effectively.32

METHODOLOGY

In information literacy skill development, there are studies that support using only face-to-face education and others that support using only an LMS. For example, Churkovich and Oughtred found that face-to-face learning leads to better results in information literacy tutorials than online learning.33 At the same time, Anderson and May concluded that the use of an LMS is viewed by students as a better method than face-to-face instruction in information literacy.34 To test which educational pedagogy (traditional or technology-based) is better regarding information literacy, the following two hypotheses were posited:

H1: Face-to-face learning has a significantly positive influence on information literacy disposition.

H2: Moodle learning has a significantly positive influence on information literacy disposition.

To provide a better understanding of the most effective method of information literacy instruction, a quantitative research design was used. The wording of the questionnaire items (shown in table 1) was inspired by the studies of Ng, Horvat et al., Abdullah, and Deng and Tavares.35 Online questionnaires were prepared and distributed to students, teachers, trainers, and professors as well as administrative departments in a small private university located in the Arabian Gulf region. Initially, a pilot study was conducted to test the instrument. This pilot study involved forty-nine participants and fifteen questions on information literacy. It also included demographic questions.
Face-to-face Education Disposition (FED)
  FED1  Information literacy skills are polished through face-to-face learning
  FED2  Face-to-face learning accommodates information literacy requirements
  FED3  Face-to-face learning is easier than learning management systems
  FED4  Face-to-face learning is better than learning management systems

Moodle Usage Disposition (MUD)
  MUD1  Moodle is more easily accessible than other online resources
  MUD2  Moodle is an effective web server for information literacy
  MUD3  Moodle is more reliable than other online resources
  MUD4  Moodle enables the provision of an extensive amount of useful information
  MUD5  Moodle is used to overcome language, understanding, and communication gaps

Information Literacy Preference (IL)
  IL1  Students and teachers prefer online resources
  IL2  Inauthentic websites are helpful for students and teachers
  IL3  Authentic websites are useful for students and teachers
  IL4  Students and teachers prefer published articles, journals, and books
  IL5  Online learning is more effective
  IL6  Information is essential for individuals’ knowledge

Table 1. Item coding.

After the pilot study, a full-scale study was conducted, in which the participants were students, professors, and educational administrators. An online questionnaire was sent to the management of an academic institution in the Arabian Gulf region to assess the instruction methodology to improve students’ information literacy skills. The language used in the survey was Arabic, and the questionnaire was translated into English for this article by a professional translator. A total of five hundred questionnaires were sent, and 398 of them were received with complete responses. The following criteria were used to filter questionnaires that were not appropriate for this study:

Inclusion Criteria
• People currently involved in the education system.
• Students, teachers, or members of an academic department.
• People who understand information literacy.
A question was added in the survey about whether the participant was familiar with information literacy; if not, the participant was removed from the sample.

Exclusion Criteria
• People who were not involved in the education system.
• People who were not aware of online learning systems.
• Staff with no role in learning or teaching.

Gender         Frequency  Percent
Male           186        46.73
Female         212        53.27
Total          398        100

Qualification  Frequency  Percent
Undergraduate  181        45.48
Graduate       98         24.62
Masters        119        29.90
Total          398        100

Designation    Frequency  Percent
Student        216        54.27
Instructor     90         22.61
Administrator  92         23.12
Total          398        100

Table 2. Demographic information.

Question  Agree  Neutral  Disagree  Don’t Know
Face-to-face Education Disposition (FED)
FED1      46.8   22.8     21.3      9.1
FED2      10     74.5     14.2      1.3
FED3      1.5    12.8     75.8      9.9
FED4      32     30       26        12
Information Literacy Preference (IL)
IL1       38.8   21.3     1.5       38.4
IL2       0.3    1        98.7      --
IL3       15     31       53.3      --
IL4       49.5   30       13.0      7.5
IL5       48     29.8     --        22.2
IL6       74     11.5     1.8       12.7

Table 3. Questionnaire response distribution for FED and IL.

Question  Yes    No
Moodle Usage Disposition (MUD)
MUD1      65     35
MUD2      73.3   26.8
MUD3      67     33
MUD4      66     34
MUD5      63.7   36.3

Table 4. Responses to MUD.

The reliability statistics showed a high level of consistency for the pilot test because the Cronbach’s alpha for the fifteen items was 0.901, which is above the recommended level of 0.7.36 Cronbach’s alpha is a widely used coefficient measuring the internal consistency of items as a unified group.37 Based on the successful pilot study, a full-scale study was conducted. The demographic distribution for the full-scale study is shown in table 2. The distribution of the questionnaire items for the full-scale study is shown in tables 3 and 4.
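As a concrete illustration of the reliability check described above, Cronbach’s alpha can be computed directly from a respondents-by-items matrix of scores. The sketch below is illustrative only: the sample data are hypothetical Likert responses, not the study’s actual survey data.

```python
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    k = responses.shape[1]                          # number of items
    item_vars = responses.var(axis=0, ddof=1)       # sample variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of each respondent's summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: six respondents, three items.
data = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [3, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
])
print(round(cronbach_alpha(data), 3))  # values above 0.7 are conventionally acceptable
```

Highly consistent items (respondents who rate one item high rate the others high) drive the total-score variance well above the sum of the item variances, pushing alpha toward 1.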
Cronbach’s alpha was also used to determine the reliability of the constructed items for the full-scale study. The standard benchmark for reliability is a 0.7 threshold, and the Cronbach’s alpha for every constructed item was above that value. Thus, all the items had appropriate and adequate reliability.38

RESULTS

The research hypotheses were tested using structural equation modeling (SEM) with the Analysis of Moment Structures (AMOS) software. SEM encompasses statistical methods and computer algorithms that are used to assess latent variables along with observed variables. SEM also indicates the relationships among latent variables, showing the effects of the independent variables on the dependent variables.39 One well-regarded SEM tool is AMOS, a multivariate technique that can concurrently assess the relationships between latent variables and their corresponding indicators (the measurement model) as well as the relationships among the model’s variables.40 Highly cited information systems and statistics guidelines were followed for the SEM to ensure the validity and reliability of the data analysis.41

Measurement and Structural Model

The measurement model contained fifteen items representing three latent variables: face-to-face education disposition, Moodle usage disposition, and information literacy preference. Before proceeding to this analysis, the data needed to show normality for the robustness of this parametric SEM to be trusted. Curran et al. suggested absolute skewness and kurtosis values less than 2 and 7, respectively, to establish the normality of the data.42 All items’ absolute skewness and kurtosis values were below these suggested cutoffs, showing a suitable level of normality for conducting the SEM analysis.
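The Curran et al. screen can be applied mechanically to each item. The following is a minimal illustrative check in Python (not the study’s code); note that the kurtosis computed here is excess kurtosis, the convention under which the cutoff of 7 is usually quoted:

```python
import numpy as np

def normality_screen(items, skew_cut=2.0, kurt_cut=7.0):
    """Return, per column of an (n_respondents, n_items) matrix, whether
    |skewness| and |excess kurtosis| fall below the Curran et al. cutoffs."""
    items = np.asarray(items, dtype=float)
    flags = []
    for col in items.T:
        z = (col - col.mean()) / col.std()
        skew = np.mean(z**3)          # sample skewness
        kurt = np.mean(z**4) - 3.0    # sample excess kurtosis
        flags.append(abs(skew) < skew_cut and abs(kurt) < kurt_cut)
    return flags
```

Items failing the screen would call for a transformation or a robust estimator before fitting the SEM.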
The overall measurement model showed a high level of fit: GFI = 0.99, AGFI = 0.98, NFI = 0.98, CMIN/DF = 0.86, and RMR = 0.39. GFI, AGFI, and NFI indicate that the theoretical model fits the empirical data well when they are above 0.95; CMIN/DF and RMR follow different cutoffs, with CMIN/DF required to be less than 3 and RMR less than 0.5.43 Table 5 shows that the items loaded on their corresponding latent variables above the suggested cutoff (0.5). As shown in the table, IL6 was the only item that did not load clearly on its latent variable, and thus it was dropped from further analysis.44 An additional method for assessing item loading was loading significance, which was significant at the 0.001 level, indicating that all items loaded on their latent variables.45 The indices of the measurement model suggested that the psychometric properties of the instrument were adequate to proceed to the structural model.

INFORMATION LITERACY MODEL | ABUNADI 126 https://doi.org/10.6017/ital.v37i4.9750

Table 5. Item loadings.

Item   Estimate
Face-to-face Education Disposition (FED)
FED4   0.71
FED3   0.52
FED2   0.66
FED1   0.89
Moodle Usage Disposition (MUD)
MUD5   0.93
MUD4   0.92
MUD3   0.92
MUD2   0.73
MUD1   0.93
Information Literacy Preference (IL)
IL6    0.32
IL5    0.91
IL4    0.72
IL3    0.86
IL2    0.81
IL1    0.83

The next step was to assess the structural model, which was used to evaluate the hypothesized relations between the independent variables (face-to-face education disposition [FED] and Moodle usage disposition [MUD]) and the dependent variable (information literacy preference [IL]). Both education methods were tested in the hypotheses to identify the most suitable information literacy delivery mode for students. Both hypotheses were supported, which indicates that no single method of information literacy delivery (either face-to-face instruction or an LMS) is preferred, and a different model can be suggested.
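The cutoff logic described above can be expressed as a short check. This is an illustrative Python sketch, with the fit indices and IL loadings transcribed from the text and Table 5 (it is not the authors’ analysis code):

```python
# Reported fit indices, transcribed from the text.
FIT = {"GFI": 0.99, "AGFI": 0.98, "NFI": 0.98, "CMIN/DF": 0.86, "RMR": 0.39}

def fit_ok(fit):
    """GFI, AGFI, and NFI must exceed 0.95; CMIN/DF < 3; RMR < 0.5."""
    return (all(fit[k] > 0.95 for k in ("GFI", "AGFI", "NFI"))
            and fit["CMIN/DF"] < 3 and fit["RMR"] < 0.5)

# IL loadings transcribed from Table 5.
IL_LOADINGS = {"IL1": 0.83, "IL2": 0.81, "IL3": 0.86,
               "IL4": 0.72, "IL5": 0.91, "IL6": 0.32}

def retained_items(loadings, cutoff=0.5):
    """Keep only items whose standardized loading meets the 0.5 cutoff."""
    return {k: v for k, v in loadings.items() if v >= cutoff}
```

Running `retained_items(IL_LOADINGS)` drops IL6, matching the decision reported in the text.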
Both hypotheses were supported at the 0.001 level, with an effect size for face-to-face education disposition of 0.32, which indicates a medium impact on information literacy preference. Meanwhile, Moodle usage disposition had an effect size of 0.70, which is considered large (Hair et al. 2010). Finally, the model’s explanatory power for information literacy preference, measured by R2, was high (0.85). Based on this analysis, it can be said that a single method of information literacy delivery is insufficient in developing countries. Thus, a different model for information literacy was developed (figure 1) to improve students’ related competencies.

Figure 1. Information Literacy Intervention Model.

As shown in figure 1, the model includes conducting weekly information literacy sessions that focus on educating students about technological literacy, information ethics, online library skills, and critical literacy. After each session concludes, the instructor creates weekly assignments in an LMS that test the students’ information literacy abilities with regard to the subject material. The instructor follows up on the students’ overall performance and fills any identified gaps in subsequent information literacy sessions and assignments. After one month, the instructor reviews the students’ performance and provides feedback to students. Finally, a “real case project assignment” is used to teach students to solve real problems using the skills they have learned. The instructor can further extend reflection on the process of assigning “real case project” grades by creating a course exit survey that asks students about their acquired level of information literacy skills.
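The relationship between the reported path coefficients and R2 can be illustrated with a small simulation. This is a hypothetical Python sketch: the coefficients 0.32 and 0.70 are taken from the text, but the data are synthetic and the noise level is chosen so that R2 lands near the reported 0.85:

```python
import numpy as np

# Synthetic standardized predictors and outcome built from the reported
# path coefficients (FED -> IL: 0.32, MUD -> IL: 0.70).
rng = np.random.default_rng(7)
n = 398  # sample size reported in the study
fed = rng.normal(size=n)
mud = rng.normal(size=n)
il = 0.32 * fed + 0.70 * mud + rng.normal(scale=0.32, size=n)

# Ordinary least squares recovers the coefficients and the R^2.
X = np.column_stack([np.ones(n), fed, mud])
beta, *_ = np.linalg.lstsq(X, il, rcond=None)
resid = il - X @ beta
r2 = 1 - (resid**2).sum() / ((il - il.mean())**2).sum()
```

With independent standardized predictors, R2 is roughly the sum of the squared paths relative to total variance, which is why a 0.32 and a 0.70 path together can yield an R2 in the neighborhood of 0.85.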
LONGITUDINAL CASE STUDY

A small technical university in the Arabian Gulf region faces difficulties in providing adequate library resources to its students because of its limited capabilities. The university has about 4,500 students and five hundred employees. The university library and information technology department lack adequate staff and resources, resulting in insufficient support for student learning. This has caused a lack of information literacy education for students, which is evident in their submitted assignments. For example, students are not accustomed to citing the materials used in their assessments. Thus, these undergraduates are viewed suspiciously by their educators when using online materials. Not knowing how to paraphrase and then cite relevant online materials causes students to miss learning opportunities. Information literacy is a skill that should be considered for all technology-related courses.46 The outcomes of this course will be used to improve the education of students and place the power of learning in their hands.47 Therefore, the objective of this case study is to determine the influence of information literacy practices on improving student performance in solving organizational problems, especially when technology and library resources are scarce. This longitudinal case study was conducted over two semesters: the first was conducted traditionally, without the use of an information literacy intervention model, whereas in the second semester the intervention model was introduced. Finally, the performance and opinions of students in the two semesters were compared using a case study assignment and a course exit survey. The information literacy intervention model was implemented by providing a series of practical tutorials at the beginning of the semester showing students how to use information from the internet.
Then, the students applied the information, using information literacy skills to solve weekly assessments for an enterprise-architecture (EA) course. This course is taught under the information systems program at a private university. Students enrolling in the course are in their second year or higher. The information literacy assessments require students to search for reliable sources of information and to cite and reference them. This builds the habit of critically examining sources of information and of grasping, analyzing, and using these sources to solve problems. The information literacy technology pedagogical method was followed to improve students’ knowledge of methods of learning.48 The students were educated through a series of classes on how to use the university’s databases, e-books, and internet resources to solve real-life organizational problems and to apply concepts in different situations, as shown in figure 1. The students were given ten small assessments in the Moodle LMS, in which a concept taught in class had to be applied after students searched for it and learned more about it from different sources. This included looking in the correct places for reliable resources, online scholarly databases, and online videos that could be of use. Then, students were taught how to critically examine resources and determine which of them were reliable. For example, students were shown that highly cited papers are more reliable than less-cited papers and that online videos from professional organizations (e.g., IBM or Gartner) are more reliable than personal videos. Students were also taught how to use in-text citations and how to create reference lists. In the last quarter of the semester, a case study assignment was provided with real-life problems that students were required to solve using different sources, including the internet.
The performance of semester-1 students (no intervention) was compared with that of semester-2 students (information literacy intervention) taking the same course. An improvement in grades was considered an indicator of success. The comparison point was a major project that required students to solve real-life organizational problems and demanded greater information literacy. Some of the EA concepts taught in the class required practice to apply. For example, the as-is organizational modeling that is needed before implementing EA would be difficult to understand unless students actually conducted modeling on selected organizations. This enabled students to understand how these concepts related to the real world. The concepts focused on were related to business tools in information systems (e.g., business process management and requirements elicitation) that are widely used for analysis within organizations. The theory behind these tools was explained in class; applying these theories required students to search many sources of information, including online books and research databases. Students were unaware of these resources until the instructors explained their availability on the internet and in the library. The students were provided with regular information literacy sessions to improve their skills in this area. They were shown how to search; for instance, if they could not find a specific term, they could look for synonyms. They were instructed on how to use search engines and research databases and were shown the relevant electronic journals and books that could aid in solving weekly assessments. The usage of internet multimedia is also important in education.49 The students were shown relevant YouTube channels (e.g., by Harvard and Khan Academy) and relevant massive open online courses (e.g., free courses on Coursera.com and Udemy.com).
Weekly tests required students to use these resources to solve the assessment problems. An important outcome of this intervention was an improvement in students’ abilities to use different digital resources. This was evident in semester-2 students’ use of suitable reference lists and in-text citations, as compared to a lack of such usage by semester-1 students. An additional measure was the higher average score in semester 2 (4.15/5) than in semester 1 (3.2/5) on one of the course exit survey items relevant to information literacy: “Illustrate responsibility for one’s own learning.” The students were continually taught that information literacy grants a power that comes with responsibility, and no incidents of plagiarism were reported during the semester in which the intervention was conducted. Referencing became a habit through the weekly information literacy assessments. The students’ grades on the final project were better than in the previous academic semester: the average project grade for semester 1 was 15.5/20, while that for semester 2 was 17/20. The difference between the semester-1 and semester-2 project grades was statistically significant at the 0.10 level. The students could use digital library databases, and some were interested in using external online books. It became habitual for students to use in-text citations, and their references became diversified. Some students, however, still confined suitable references to only some paragraphs. This feedback was delivered to students so that they could address the issue in other courses.

DISCUSSION AND CONCLUSION

This study was conducted to investigate the most effective mode of information literacy delivery.
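A semester-to-semester grade comparison of this kind can be sketched as a two-sample test. The Python illustration below is hypothetical: the per-student grades are invented to match the reported project means (15.5/20 and 17/20), since the study does not publish the raw grades, and the article does not specify which test was used:

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (b.mean() - a.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical grade samples centered on the reported project means.
rng = np.random.default_rng(1)
sem1 = rng.normal(15.5, 2.0, size=30)  # semester 1, no intervention
sem2 = rng.normal(17.0, 2.0, size=30)  # semester 2, with intervention
t, df = welch_t(sem1, sem2)
```

The resulting t statistic would then be compared against the critical value for the chosen significance level (0.10 in the study).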
The study focused on smaller universities because they do not have adequate library facilities and technological capabilities to provide students with sufficient information literacy competencies during course delivery. A survey was conducted to determine the most suitable form of information literacy delivery. The survey determined that Moodle and face-to-face methods were both favored for information literacy. Thus, the information literacy intervention model was developed and tested in a case study so that students’ performance would improve. The results of this study have shown that the combination of technology and information literacy instruction is an effective method of improving students’ skills in using digital resources to seek knowledge. It was found that both face-to-face learning and the use of an LMS increase student performance on assessments that require information literacy. Face-to-face learning is required to explain information literacy concepts, while the LMS is used to disseminate the necessary digital resources and to create assessment modules. Thus, the combination of theory and practice in information literacy resulted in better understanding and implementation in knowledge seeking and problem-solving related to information systems. The inclusion of information literacy instruction, along with the use of an LMS for information literacy assessments within information systems courses, has reduced the pressure on libraries that lack technological resources (such as PCs) and qualified staff.

The results with regard to this study’s hypotheses are in agreement with those of previous studies.50 Hypothesis 1, which posited a significant positive influence of face-to-face education disposition on information literacy preference, is congruent with the research of Churkovich and Oughtred.51 Their research focused on student information literacy skill development using library facilities instead of faculty, which is a different approach than the one followed in the present study. However, both the present study and that of Churkovich and Oughtred found that face-to-face instruction leads to improved student performance. Hypothesis 2, which posited a positive impact of Moodle usage disposition on information literacy preference, correlates with the research of Anderson and May.52 They found that using an LMS is more effective than face-to-face instruction for information literacy instruction. Like Churkovich and Oughtred (and in contrast to the present study), Anderson and May relied on librarians to deliver information literacy instruction online, although they also relied on faculty in addition to librarians. There are two noteworthy outcomes of the first study. First, the questionnaire measurement model showed that the development of this instrument was successful and that the items and their latent variables can be used in further studies. Second, the results regarding the structural model indicated that both face-to-face instruction and Moodle use influenced information literacy preferences. Other studies have supported these results. The results of Peter et al. agree with the finding of the present study that the combination of face-to-face instruction and LMS use leads to improved student performance.53 Peter et al., whose participants were psychology students, focused on the time-efficiency of information literacy instruction delivery; in contrast, the present study considers information literacy skill development as a progressive, long-term process. The information literacy intervention model is not only a learning medium but an interactive method of teaching that adapts to students’ learning patterns.
The primary limitations of the study were the nature of the sample, the exclusion of some potentially relevant variables, and the simplification of the study’s findings. The sample was limited to students, professors, and people who were aware of the learning programs; it is highly possible that they were more familiar with such technological innovations than the general population. Future studies could retest the hypotheses of the study in a more comprehensive manner and impose more control on the respondents. The interaction between people while visiting a site is itself an activity worthy of examination, but it must be either controlled or measured for us to understand the role it plays in shaping attitudes and behaviors. Future studies can apply the developed theoretical model in different settings to determine its interaction with other variables in the information systems field. A quantitative instrument can be developed based on the information literacy intervention model. Alternatively, this model can be applied with qualitative interviews in future studies to develop theoretical themes based on instructors’ and students’ responses.

REFERENCES

1 Harry M. Kibirige and Lisa DePalo, “The Internet as a Source of Academic Research Information: Findings of Two Pilot Studies,” Information Technology and Libraries 19, no. 1 (2000): 11–15; Debbie Folaron, A Discipline Coming of Age in the Digital Age (Philadelphia: John Benjamins, 2006); N. N. Edzan, “Tracing Information Literacy of Computer Science Undergraduates: A Content Analysis of Students’ Academic Exercise,” Malaysian Journal of Library & Information Science 12, no. 1 (2007): 97–109.

2 Heinz Bonfadelli, “The Internet and Knowledge Gaps,” European Journal of Communication 17, no. 1 (2002): 65–84, http://journals.sagepub.com/doi/abs/10.1177/0267323102017001607; Kibirige and DePalo, “The Internet as a Source of Academic Research Information,” 11–15.
3 Laurie A. Henry, “Searching for an Answer: The Critical Role of New Literacies While Reading on the Internet,” The Reading Teacher 59, no. 7 (2006): 614–27. 4 Peyina Lin, “Information Literacy Barriers: Language Use and Social Structure,” Library Hi Tech 28, no. 4 (2010): 548–68, https://doi.org/10.1108/07378831011096222. 5 Michael R. Hearn, “Embedding a Librarian in the Classroom: An Intensive Information Literacy Model,” Reference Services Review 33, no. 2 (2005): 219–27. 6 Hui Hui Chen et al., “An Analysis of Moodle in Engineering Education: The Tam Perspective” (paper presented at Teaching, Assessment and Learning for Engineering (TALE), 2012 IEEE International Conference on). 7 N. N. Edzan, “Tracing Information Literacy of Computer Science Undergraduates: A Content Analysis of Students' Academic Exercise,” Malaysian Journal of Library & Information Science 12, no. 1 (2007): 97–109. 8 Johannes Peter et al., “Making Information Literacy Instruction More Efficient by Providing Individual Feedback,” Studies in Higher Education (2015): 1–16, https://doi.org/10.1080/03075079.2015.1079607. 9 Pamela Alexondra Jackson, “Integrating Information Literacy into Blackboard: Building Campus Partnerships for Successful Student Learning,” The Journal of Academic Librarianship 33, no. 4 (2007): 454–61, https://doi.org/10.1016/j.acalib.2007.03.010. 10 Manal Abdulaziz Abdullah, “Learning Style Classification Based on Student's Behavior in Moodle Learning Management System,” Transactions on Machine Learning and Artificial Intelligence 3, no. 1 (2015): 28. 11 Catherine J. Gray and Molly Montgomery, “Teaching an Online Information Literacy Course: Is It Equivalent to Face-to-Face Instruction?,” Journal of Library & Information Services in Distance Learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290X.2014.945876. 
12 William Sugar, Trey Martindale, and Frank E. Crawley, “One Professor’s Face-to-Face Teaching Strategies While Becoming an Online Instructor,” Quarterly Review of Distance Education 8, no. 4 (2007): 365–85.

13 Stephann Makri et al., “A Library or Just Another Information Resource? A Case Study of Users’ Mental Models of Traditional and Digital Libraries,” Journal of the Association for Information Science and Technology 58, no. 3 (2007): 433–45.

14 Christine Susan Bruce, “Workplace Experiences of Information Literacy,” International Journal of Information Management 19, no. 1 (1999): 33–47, https://doi.org/10.1016/S0268-4012(98)00045-0; Michael B. Eisenberg, Carrie A. Lowe, and Kathleen L. Spitzer, Information Literacy: Essential Skills for the Information Age (Westport, CT: Greenwood Publishing Group, 2004).

15 Andy Carvin, “More Than Just Access: Fitting Literacy and Content into the Digital Divide Equation,” Educause Review 35, no. 6 (2000): 38–47.

16 Robert Hauptman, Ethics and Librarianship (Jefferson, NC: McFarland, 2002).

17 JaNae Kinikin and Keith Hench, “Poster Presentations as an Assessment Tool in a Third/College Level Information Literacy Course: An Effective Method of Measuring Student Understanding of Library Research Skills,” Journal of Information Literacy 6, no. 2 (2012), https://doi.org/10.11645/6.2.1698; Stuart Palmer and Barry Tucker, “Planning, Delivery and Evaluation of Information Literacy Training for Engineering and Technology Students,” Australian Academic & Research Libraries 35, no. 1 (2004): 16–34, https://doi.org/10.1080/00048623.2004.10755254.

18 Lauren Smith, “Towards a Model of Critical Information Literacy Instruction for the Development of Political Agency,” Journal of Information Literacy 7, no. 2 (2013): 15–32, https://doi.org/10.11645/7.2.1809.
19 Melissa Gross and Don Latham, “What’s Skill Got to Do with It?: Information Literacy Skills and Self‐Views of Ability among First‐Year College Students,” Journal of the American Society for Information Science and Technology 63, no. 3 (2012): 574–83, https://doi.org/10.1002/asi.21681. 20 Bala Haruna et al., “Modelling Web-Based Library Service Quality and User Loyalty in the Context of a Developing Country,” The Electronic Library 35, no. 3 (2017): 507–19, https://doi.org/10.1108/EL-10-2015-0211. 21 Catherine J. Gray and Molly Montgomery, “Teaching an Online Information Literacy Course: Is It Equivalent to Face-to-Face Instruction?,” Journal of Library & Information Services in Distance Learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290X.2014.945876. 22 Ioannis Dimopoulos et al., “Using Learning Analytics in Moodle for Assessing Students’ Performance” (paper presented at the 2nd Moodle Research Conference Sousse, Tunisia, 4 –6, 2013). 23 Ángel Hernández-García and Miguel Á. Conde-González, “Using Learning Analytics Tools in Engineering Education” (paper presented at LASI Spain, Bilbao, 2016). 24 Michael R. Hearn, “Embedding a Librarian in the Classroom: An Intensive Information Literacy Model,” Reference Services Review 33, no. 2 (2005): 219–27, https://doi.org/10.1108/00907320510597426; Thomas P Mackey and Trudi E Jacobson, “Reframing Information Literacy as a Metaliteracy,” College & Research Libraries 72, no. 1 (2011): 62–78; S. Serap Kurbanoglu, Buket Akkoyunlu, and Aysun Umay, “Developing the Information Literacy Self-Efficacy Scale,” Journal of Documentation 62, no. 6 (2006): 730–43, https://doi.org/10.1108/00220410610714949. 25 Michelle Holschuh Simmons, “Librarians as Disciplinary Discourse Mediators: Using Genre Theory to Move toward Critical Information Literacy,” portal: Libraries and the Academy 5, no. 3 (2005): 297–311, https://doi.org/10.1353/pla.2005.0041; Sharon Markless and David R. 
Streatfield, “Three Decades of Information Literacy: Redefining the Parameters,” Change and Challenge: Information Literacy for the 21st Century (Blackwood, South Australia: Auslib Press, 2007): 15–36; Meg Raven and Denyse Rodrigues, “A Course of Our Own: Taking an Information Literacy Credit Course from Inception to Reality,” Partnership: The Canadian Journal of Library and Information Practice and Research 12, no. 1 (2017), https://doi.org/10.21083/partnership.v12i1.3907.

26 Joanne Munn and Jann Small, “What Is the Best Way to Develop Information Literacy and Academic Skills of First Year Health Science Students? A Systematic Review,” Evidence Based Library and Information Practice 12, no. 3 (2017): 56–94, https://doi.org/10.18438/B8QS9M; Sheila Corrall, “Crossing the Threshold: Reflective Practice in Information Literacy Development,” Journal of Information Literacy 11, no. 1 (2017): 23–53, https://doi.org/10.11645/11.1.2241.

27 Liping Deng and Nicole Judith Tavares, “From Moodle to Facebook: Exploring Students’ Motivation and Experiences in Online Communities,” Computers & Education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.

28 Ana Horvat et al., “Student Perception of Moodle Learning Management System: A Satisfaction and Significance Analysis,” Interactive Learning Environments 23, no. 4 (2015): 515–27, https://doi.org/10.1080/10494820.2013.788033.

29 Cary Roseth, Mete Akcaoglu, and Andrea Zellner, “Blending Synchronous Face-to-Face and Computer-Supported Cooperative Learning in a Hybrid Doctoral Seminar,” TechTrends 57, no. 3 (2013): 54–59, https://doi.org/10.1007/s11528-013-0663-z.

30 Ruonan Xing, “Practical Teaching Platform Construction Based on Moodle—Taking ‘Education Technology Project Practice’ as an Example,” Communications and Network 5, no. 3 (2013): 631, https://doi.org/10.4236/cn.2013.53B2113.
31 Carolina Costa, Helena Alvelos, and Leonor Teixeira, “The Use of Moodle E-Learning Platform: A Study in a Portuguese University,” Procedia Technology 5 (2012): 334–43, https://doi.org/10.1016/j.protcy.2012.09.037.

32 Eamon Costello, “Opening Up to Open Source: Looking at How Moodle Was Adopted in Higher Education,” Open Learning: The Journal of Open, Distance and e-Learning 28, no. 3 (2013): 187–200, https://doi.org/10.1080/02680513.2013.856289.

33 Marion Churkovich and Christine Oughtred, “Can an Online Tutorial Pass the Test for Library Instruction? An Evaluation and Comparison of Library Skills Instruction Methods for First Year Students at Deakin University,” Australian Academic & Research Libraries 33, no. 1 (2002): 25–38, https://doi.org/10.1080/00048623.2002.10755177.

34 Karen Anderson and Frances A. May, “Does the Method of Instruction Matter? An Experimental Examination of Information Literacy Instruction in the Online, Blended, and Face-to-Face Classrooms,” The Journal of Academic Librarianship 36, no. 6 (2010): 495–500, https://doi.org/10.1016/j.acalib.2010.08.005.

35 Wan Ng, “Can We Teach Digital Natives Digital Literacy?,” Computers & Education 59, no. 3 (2012): 1065–78, https://doi.org/10.1016/j.compedu.2012.04.016; Horvat et al., “Student Perception of Moodle Learning Management System,” 515–27; Manal Abdulaziz Abdullah, “Learning Style Classification Based on Student’s Behavior in Moodle Learning Management System,” Transactions on Machine Learning and Artificial Intelligence 3, no. 1 (2015): 28; Liping Deng and Nicole Judith Tavares, “From Moodle to Facebook: Exploring Students’ Motivation and Experiences in Online Communities,” Computers & Education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.

36 J. F. Hair, William C. Black, and Barry J.
Babin, Multivariate Data Analysis: A Global Perspective, 7th ed. (Upper Saddle River, NJ: Pearson, 2010).

37 L. J. Cronbach, “Test Validation,” in Educational Measurement, ed. R. L. Thorndike, 2nd ed. (Washington, DC: American Council on Education, 1971).

38 B. Tabachnick and L. Fidell, Using Multivariate Statistics, 5th ed. (New York: Allyn and Bacon, 2007).

39 Hair, Black, and Babin, Multivariate Data Analysis.

40 B. M. Byrne, Structural Equation Modeling with Amos: Basic Concepts, Applications, and Programming, 2nd ed. (New York: Taylor & Francis Group, 2010); Hair, Black, and Babin, Multivariate Data Analysis.

41 T. A. Brown, Confirmatory Factor Analysis for Applied Research (Methodology in the Social Sciences) (New York: Guilford, 2006); Byrne, Structural Equation Modeling with Amos; D. Gefen, D. Straub, and M. Boudreau, “Structural Equation Modeling and Regression: Guidelines for Research Practice,” Communications of the Association for Information Systems 4, no. 7 (2000): 1–77; Hair, Black, and Babin, Multivariate Data Analysis: A Global Perspective.

42 P. J. Curran, S. G. West, and J. F. Finch, “The Robustness of Test Statistics to Nonnormality and Specification Error in Confirmatory Factor Analysis,” Psychological Methods 1, no. 1 (1996): 16–29, https://doi.org/10.1037/1082-989X.1.1.16.

43 Byrne, Structural Equation Modeling with Amos.

44 Brown, Confirmatory Factor Analysis for Applied Research; Byrne, Structural Equation Modeling with Amos.

45 Hair, Black, and Babin, Multivariate Data Analysis: A Global Perspective.

46 Michael B. Eisenberg, Carrie A. Lowe, and Kathleen L. Spitzer, Information Literacy: Essential Skills for the Information Age (Westport, CT: Greenwood Publishing Group, 2004).

47 James Elmborg, “Critical Information Literacy: Implications for Instructional Practice,” The Journal of Academic Librarianship 32, no. 2 (2006): 192–99, https://doi.org/10.1016/j.acalib.2005.12.004.

48 Ibid.
49 Anderson and May, “Does the Method of Instruction Matter?,” 495–500; Horvat et al., “Student Perception of Moodle Learning Management System,” 515–27, https://doi.org/10.1080/10494820.2013.788033.

50 Horvat et al., “Student Perception of Moodle Learning Management System,” 515–27; Anderson and May, “Does the Method of Instruction Matter?,” 495–500; Raven and Rodrigues, “A Course of Our Own.”

51 Churkovich and Oughtred, “Can an Online Tutorial Pass the Test for Library Instruction?,” 25–38.

52 Anderson and May, “Does the Method of Instruction Matter?,” 495–500.

53 Peter et al., “Making Information Literacy Instruction More Efficient,” 1–16.
President’s Message: For The Record

Aimee Fifarek

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017

For a long time, I’ve had an idea that when a new President of the United States is elected, sometime after he’s sworn in, amid all of the briefings, a wizened old man sits down with him to have The Talk. In my imagination the messenger is some cross between the Templar Knight from Indiana Jones and the Last Crusade and the International Express man from Neil Gaiman and Terry Pratchett’s Good Omens: officious yet wise. He tells the new President the why of it all, the real reasons why important things have happened in the ways they have, making all the decisions that seemed so wrong now seem inevitable. And probably not for the first time the new President thinks to himself, “What have I gotten myself into?” This is clearly reflective of my desire for there to be, if not a reason for everything that happens, then at least some record of it all that can be reviewed, synthesized, and mined for meaning by future leaders. It’s the Librarian in me, I suppose. Although being LITA President bears absolutely no resemblance to being President of the United States, I have been thinking about this little imagining of mine a lot lately. This is probably because, now that I am midway through my Presidential cycle (Vice President, President, Past President), I realize how much of what I’ve done has been marked by the absence of such a record. I did not receive a “How to be LITA President” manual along with my gavel, and no one gave me the LITA version of The Talk. The one person who could have done it, LITA Executive Director Jenny Levine, was as new to her position as I was to mine, so we have learned together and asked many questions of those around us with more experience. We are in the midst of Election season, and will soon have a new President-Elect.
Bohyun Kim and David Lee King are both excellent candidates (http://litablog.org/2017/01/meet-your-candidates-for-the-2017-lita-election/); those of you who have not yet voted have a difficult choice. In order to make a little progress toward developing that how-to guide, I thought I’d document a few of the things I’ve learned since becoming LITA President.

Being LITA President also means being President of a Division of the American Library Association.

When I was elected I expected to manage the business of the Library and Information Technology Association—Board meetings, Committee Appointments, Presidential Programs, and LITA Forums. Seeing the Board complete the LITA Strategic Plan (http://www.ala.org/lita/about/strategic) was a great accomplishment at this level. While it’s possible for a Division leader to have minimal interactions with “Big ALA” during their term and still be successful, my priority for my presidential year—increasing the value LITAns receive from membership, especially those who are not able to attend in-person conferences—meant that I needed to learn more about how ALA works. After a year and a half, I have a much better understanding of the Association’s budgeting, publishing, and technology practices, and how all of these are impacted by declines in membership and decreasing revenues. Future LITA leaders are going to need to continue to be engaged at the larger organizational level if we are to be able to use LITA’s technological knowledge and expertise to support ALA’s efforts to maximize efficiency while minimizing costs.

Aimee Fifarek (aimee.fifarek@phoenix.gov) is LITA President 2016-17 and Deputy Director for Customer Support, IT and Digital Initiatives at Phoenix Public Library, Phoenix, AZ.
PRESIDENT’S MESSAGE | FIFAREK https://doi.org/10.6017/ital.v36i1.9808

Being LITA President means speaking not just to, but for, an incredibly diverse community.

My plan when I became LITA President was to blog on a more regular basis.
However, I didn’t expect some of my first communications to be about a mass shooting in Dallas (in advance of the Forum in Ft. Worth) or working with the Board to craft a statement on inclusivity after the US presidential election. The proverbial curse “may you live in interesting times” has certainly been true this year. Having to speak to the LITA community about those issues made me acutely aware of my responsibility to adequately represent you when we’ve also been asked to weigh in on technology policy issues at the federal level, such as the call for increased gun violence research and the rescinding of ISP regulations on privacy protection. The decision by the Board to include Advocacy and Information Policy as a primary focus for the strategic plan was certainly prescient. We are fortunate that our President-Elect, Andromeda Yelton, is both well-versed in the issues and able to speak eloquently to them.1

Being LITA President means being part of more than one team.

I’m continually amazed at the hard work and dedication of Board members (http://www.ala.org/lita/about/board), Committee and Interest Group Chairs (http://www.ala.org/lita/about/committees/chairs), and everyone who fits our Member Involvement persona (http://litablog.org/2017/03/who-are-lita-members-lita-personas/). The success of LITA as an organization is entirely due to the time and passion of this team. But when you become LITA President-Elect you get a new team—the other Division Vice Presidents. This cohort travels to ALA HQ in Chicago in October after they are elected to meet each other and the incoming ALA President and to learn about the structure of ALA. I have learned much from the other Presidents this year, and we have had a number of truly productive discussions about how the Divisions can collaborate and learn from each other to more effectively serve our members.
LITA is directly benefitting from the expertise of the other groups, and they are in turn looking to us for both our technical skillset and the successes we’ve had over 50 years as an Association.

Consider this a new preface to the How to Be LITA President manual. I hope that my successors find it useful, and that it will serve as an inspiration for any LITAns out there who are thinking about putting their name on the ballot in future years. It has been a marvelous and educational experience. And the gavel is pretty cool, too.

REFERENCES

1. “Making ALA Great Again,” Publishers Weekly, February 17, 2017, http://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/72814-making-ala-great-again.html.
Privacy and User Experience in 21st Century Library Discovery
Shayna Pekala
INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017

ABSTRACT

Over the last decade, libraries have taken advantage of emerging technologies to provide new discovery tools to help users find information and resources more efficiently. In the wake of this technological shift in discovery, privacy has become an increasingly prominent and complex issue for libraries. The nature of the web, over which users interact with discovery tools, has substantially diminished the library’s ability to control patron privacy. The emergence of a data economy has led to a new wave of online tracking and surveillance, in which multiple third parties collect and share user data during the discovery process, making it much more difficult, if not impossible, for libraries to protect patron privacy. In addition, users are increasingly starting their searches with web search engines, diminishing the library’s control over privacy even further. While libraries have a legal and ethical responsibility to protect patron privacy, they are simultaneously challenged to meet evolving user needs for discovery. In a world where “search” is synonymous with Google, users increasingly expect their library discovery experience to mimic their experience using web search engines.1 However, web search engines rely on a drastically different set of privacy standards, as they strive to create tailored, personalized search results based on user data. Libraries are seemingly forced to make a choice between delivering the discovery experience users expect and protecting user privacy. This paper explores the competing interests of privacy and user experience, and proposes possible strategies to address them in the future design of library discovery tools.
INTRODUCTION

On March 23, 2017, the internet erupted with outrage in response to the results of a Senate vote to roll back Federal Communications Commission (FCC) rules prohibiting internet service providers (ISPs), such as Comcast, Verizon, and AT&T, from selling customer web browsing histories and other usage data without customer permission. Less than a week after the Senate vote, the House followed suit and similarly voted in favor of rolling back the FCC rules, which were set to go into effect at the end of 2017.2 The repeal became official on April 3, 2017, when the President signed it into law.3 This decision by U.S. lawmakers serves as a reminder that today’s internet economy is a data economy, where personal data flows freely on the web, ready to be compiled and sold to the highest bidder. Continuous online tracking and surveillance have become the new normal.

Shayna Pekala (shayna.pekala@georgetown.edu) is Discovery Services Librarian, Georgetown University Library, Washington, DC.
PRIVACY AND USER EXPERIENCE IN 21ST CENTURY LIBRARY DISCOVERY | PEKALA https://doi.org/10.6017/ital.v36i2.9817

ISPs are just one of the many players in the online tracking game. Major web search engines, such as Google, Bing, and Yahoo, also collect information about users’ search histories, among other personal information.4 By selling this data to advertisers, data brokers, and/or government agencies, these search engine companies are able to make a profit while providing the search engines themselves for “free.” In addition to profiting from user data, web search engines also use it to enhance the user experience of their products. Collecting and analyzing user data enables systems to learn user preferences, providing personalized search results that make it easier to navigate the ever-increasing sea of online information.
The collection and sharing of user data that occurs on the open web is deeply troubling for libraries, whose professional ethics embody the values of privacy and intellectual freedom. A user’s search history contains information about a user’s thought process, and the monitoring of these thoughts inhibits intellectual inquiry.5 Libraries, however, would be remiss to dismiss the success of web search engines and their use of data altogether. MIT’s preliminary report on the future of libraries urges, “While the notion of ‘tracking’ any individual’s consumption patterns for research and educational materials is anathema to the core values of libraries...the opportunity to leverage emerging technologies and new methodologies for discovery should not be discounted.”6 This article examines the current landscape of library discovery and the competing interests of privacy and user experience at play, and proposes possible strategies to address them in the future design of library discovery tools.

BACKGROUND

Library Discovery in the Digital Age

The advent of new technologies has drastically shaped the way libraries support information discovery. While users once relied on shelf-browsing and card catalogs to find library resources, libraries now provide access to a suite of online tools and interfaces that facilitate cross-collection searching and access to a wide range of materials. In an online environment, many paths to discovery are possible, with the open web playing a newfound and significant role.
Today’s library discovery tools fall into three categories: online catalogs (the patron interface of the integrated library system (ILS)), discovery layers (a patron interface with enhanced functionality that is separate from an ILS), and web-scale discovery tools (an enhanced patron interface that relies on a central index to bring together resources from the library catalog, subscription databases, and digital repositories).7 These tools are commonly integrated with a variety of external systems, including proxy servers, inter-library loan, subscription databases, individual publisher websites, and more. For the most part, libraries purchase discovery tools from third-party vendors. While some libraries use open source discovery layers, such as Blacklight or VuFind, there are currently no open source options for web-scale discovery tools.8

Outside of the library, web search engines (e.g., Google, Bing, and Yahoo) and targeted academic discovery products (e.g., Google Scholar, ResearchGate, and Academia.edu) provide additional systems that enable discovery.9 In fact, web search engines, particularly Google, play a significant role in the research process. Both students and faculty use Google in conjunction with library discovery tools. Students typically use Google at the beginning of the research process to get a better understanding of their topic and identify secondary search terms. Faculty, on the other hand, use Google to find out how other scholars are thinking about a topic.10 Unsurprisingly, Google and Google Scholar provide the majority of content access to major content platforms.11

The Data Economy and Online Privacy Concerns

In an information discovery environment that is primarily online, new threats to patron privacy emerge. In today’s economy, user data has become a global commodity. Commercial businesses have recognized the value of data mining for marketing purposes. Björn Bloching et al.
explain, “From cleverly aggregated data points, you can draw multiple conclusions that go right to the heart and mind of the customer.”12 Along the same lines, the ability to collect and analyze user data is extremely valuable to government agencies for surveillance purposes, creating an additional data-driven market.13 The increasing value of user data has drastically expanded the business of online tracking. In her book Dragnet Nation, journalist Julia Angwin outlines a detailed taxonomy of trackers, including various types of government, commercial, and individual trackers.14

In the online information discovery process, multiple parties collect user data at different points. Consider the following scenario: a user executes a basic keyword search in Google to access an openly available online resource. In the fifteen seconds it takes the user to get to that resource, information about the user’s search is collected by the internet service provider (ISP), the web browser, the search engine, the website hosting the resource, and any third-party trackers embedded in the website. The search query, along with the user’s Internet Protocol (IP) address, becomes part of the data collector’s profile on the user. In the future, the data collector can sell the user’s profile to a data broker, where it will be merged with profiles from other data collectors to create an even more detailed portrait of the user.15 The data broker, in turn, can sell the complete dataset to the government, law enforcement, commercial businesses, and even criminals. This creates serious privacy concerns, particularly since users have no legal right over how their data is bought and sold.16

Privacy Protection in Libraries

Libraries have deeply rooted values in privacy and strong motivations to protect it. Intellectual freedom, the foundation on which libraries are built, necessarily requires privacy.
In its interpretation of the Library Bill of Rights, the American Library Association (ALA) explains, “In a library (physical or virtual), the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others.”17 Many studies support this idea, having found that people who are indiscriminately and secretly monitored censor their behavior and speech.18

Libraries have both legal and ethical obligations to protect patron privacy. While there is no federal legislation that protects privacy in libraries, forty-eight states have regulations regarding the confidentiality of library records, though the extent of these protections varies by state.19 Because these statutes were drafted before the widespread use of the internet, they are phrased in a way that addresses circulation records and does not specifically include or exclude internet use records (records with information on sites accessed by patrons) from these protections. Therefore, according to Theresa Chmara, libraries should not treat internet use records any differently than circulation records with respect to confidentiality.20 The library community has established many guiding documents that embody its ethical commitment to protecting patron privacy.
The ALA Code of Ethics states in its third principle, “We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”21 The International Federation of Library Associations and Institutions (IFLA) Code of Ethics has more specific language about data sharing, stating, “The relationship between the library and the user is one of confidentiality and librarians and other information workers will take appropriate measures to ensure that user data is not shared beyond the original transaction.”22

The library community has also established practical guidelines for dealing with privacy issues in libraries, particularly those issues relating to digital privacy, including the ALA Privacy Guidelines23 and the National Information Standards Organization (NISO) Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems.24 Additionally, the Library Freedom Project was launched in 2015 as an educational resource to teach librarians about privacy threats, rights, and tools, and in 2017, the Library and Information Technology Association (LITA) released a set of seven privacy checklists25 to help libraries implement the ALA Privacy Guidelines.

Personalization of Online Systems

While user data can be used for tracking and surveillance, it can also be used to improve the digital user experience of online systems through personalization. Because the growth of the internet has made it increasingly difficult to navigate the continually growing sea of information online, researchers have put significant effort into designing interfaces, interaction methods, and systems that deliver adaptive and personalized experiences.26 Ansgar Koene et al.
explain, “The basic concept behind personalization of on-line information services is to shield users from the risk of information overload, by pre-filtering search results based on a model of the user’s preferences… A perfect user model would…enable the service provider to perfectly predict the decision a user would make for any given choice.”27 The authors go on to describe three main flavors of personalization systems:

1. content-based systems, in which the system recommends items based on their similarity to items that the user expressed interest in;
2. collaborative-filtering systems, in which users are given recommendations for items that other users with similar tastes liked in the past; and
3. community-based systems, in which the system recommends items based on the preferences of the user’s friends.28

Many popular consumer services, such as Amazon.com, YouTube, Netflix, and Google, have increased (and continue to increase) the level of personalization that they offer.29 One such service in the area of academic resource discovery is Google Scholar’s Updates, which analyzes a user’s publication history in order to predict new publications of interest.30 Libraries, in contrast, have not pressed their developers and vendors to personalize their services, favoring privacy instead, even though studies have shown that users expect library tools to mimic their experience using web search engines.31 Some web-scale discovery services do, however, allow researchers to set personalization preferences, such as their field of study, and, according to Roger Schonfeld, it is likely that many researchers would benefit tremendously from increased personalization in discovery.32 In this vein, the American Philosophical Society Library recently launched a new recommendation tool for archives and manuscripts that uses circulation data and user-supplied interests to drive recommendations.33

Opportunities for User Experience in Library Discovery

A major challenge in today’s online discovery environment is that the user is inhibited by an overwhelming number of results. This leads users to rely on relevance rankings and to fail to examine search results in depth. Creating fine-tuned relevance ranking algorithms based on user behavior is one remedy to this problem, but it relies on the use of personal user data.34 However, there may be opportunities to facilitate data-driven discovery while maintaining the user’s anonymity that would be suitable for library (and other) discovery tools. Irina Trapido proposes that relevance ranking algorithms could be designed to leverage the popularity of a resource, measured by its circulation statistics, or to rank popular or introductory materials higher than more specialized ones to help users make sense of large results sets.35 Michael Schofield proposes “context-driven design” as an intermediary solution, whereby the user opts in to have the system infer context from neutral device or browser information, such as the time of day, business hours, weather, events, holidays, etc.36 Jason Clark describes a search prototype he built that applies these principles, but he questions whether these types of enhancements actually add value to users.37 Rachel Vacek cautions that personalization is not guaranteed to be useful or meaningful, and that continuous user testing is key.38

DISCUSSION

There are several aspects to consider for the design of future library discovery tools. The integrated, complex nature of the web causes privacy to become compromised during the information discovery process. Library discovery tools have been designed not to retain borrowing records, but they have not yet evolved to mask user behavior, which is invaluable in today’s data economy.
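The intermediary solutions proposed above share a key property: the ranking signal is aggregate, never tied to an individual patron. As a rough illustration of Trapido-style popularity-based ranking, a discovery layer might combine a text-relevance score with anonymous circulation counts. The `Record` fields, the `rank` function, and the weights below are hypothetical names and values for the sketch, not drawn from any cited system:

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    text_score: float   # query-relevance score from the underlying search index
    checkouts: int      # aggregate circulation count; no user identifiers
    introductory: bool  # e.g., flagged as an overview or introductory work

def rank(records, pop_weight=0.3, intro_boost=0.1):
    """Order results by text relevance plus anonymous popularity signals."""
    # Normalize against the most-circulated item in the result set, so the
    # popularity boost never exceeds pop_weight.
    max_circ = max((r.checkouts for r in records), default=0) or 1
    def score(r):
        s = r.text_score + pop_weight * (r.checkouts / max_circ)
        if r.introductory:
            s += intro_boost
        return s
    return sorted(records, key=score, reverse=True)

results = rank([
    Record("Advanced Topics in X", text_score=0.90, checkouts=3, introductory=False),
    Record("Introduction to X", text_score=0.85, checkouts=120, introductory=True),
])
# The frequently circulated introductory title now outranks the more
# specialized work despite a slightly lower text score.
```

Because only aggregate counts and catalog metadata feed the score, no patron-level behavior needs to be retained or exposed, which is what distinguishes this family of approaches from the personalization offered by commercial search engines.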
It is imperative that all types of library discovery tools have built-in functionality to protect patron privacy beyond borrowing records, while also enabling the ethical use of patron data to improve user experience. Even if library discovery tools were to evolve so that they themselves were absolutely private (where no data were ever collected or shared), other online parties (ISPs, web browsers, advertisers, data brokers, etc.) would still have access to user data through other means, such as cookies and fingerprinting. The operating reality is such that privacy is not immediately and completely controllable by libraries. Laurie Rinehart-Thompson explains, “In the big picture, privacy is at the mercy of ethical and stewardship choices on the part of all information handlers.”39 While libraries alone cannot guarantee complete privacy for their patrons, they can and should mitigate privacy risks to the greatest extent possible.

At the same time, ignoring altogether the benefits of using patron data to improve the discovery user experience may threaten the library’s viability in the age of Google. Roger Schonfeld explains, “If systems exclude all personal data and use-related data, the resulting services will be one-dimensional and sterile. I consider it essential for libraries to deliver dynamic and personalized services to remain viable in today's environment; expectations are set by sophisticated social networks and commercial destinations.”40 Libraries must find ways to keep up with greater industry trends while adhering to professional ethics.

RECOMMENDATIONS

While libraries have traditionally shied away from collecting data about patron transactions, these conservative tendencies run counter to the library’s mission to provide an outstanding user experience and the need to evolve in a rapidly changing information industry.
As the profession adopts new technologies, ethical dilemmas present themselves that are tied into their use. While several library organizations have issued guidance for libraries about the role of user data in these new technologies, this does not go far enough. The NISO Privacy Principles, for instance, acknowledge that they are merely “a starting point.”41 Examining the substance of these guidelines is important for confronting the privacy challenges facing library discovery in the 21st century, but there are additional steps libraries can take to more fully address the competing interests of privacy and user experience in library discovery and in library technologies more generally.

Holding Third Parties Accountable

Libraries are increasingly at the mercy of third parties when it comes to the development and design of library discovery tools. Unfortunately, these third parties do not have the same ethical obligations to protect patron privacy that librarians do. In addition, the existing guidance for protecting user data in library technologies is directed towards librarians, not third-party vendors. The library community must hold third parties accountable for the ethical design of library discovery tools. One strategy for doing this would be to develop a ranking or certification process for discovery tools based on a community set of standards. The development of HIPAA-compliant records management systems in the medical field sets an example.
Because healthcare providers are required by law to guarantee the privacy of patient data,42 they must select electronic health record (EHR) systems that have been certified by an Office of the National Coordinator for Health Information Technology (ONC)-authorized body.43 In order to be certified, the system must adhere to a set of criteria adopted by the Department of Health and Human Services,44 which includes privacy and security standards.45 Another example is the Consumer Reports standard and testing program for consumer privacy and security, which is currently in development. Consumer Reports explains the reason for developing this new privacy standard: “If Consumer Reports and other public-interest organizations create a reasonable standard and let people know which products do the best job of meeting it, consumer pressure and choices can change the marketplace.”46 Libraries could potentially adapt the Consumer Reports standards and rating system for library discovery tools and other library technologies.

Engaging in UX Research & Design

Libraries should not rely on third parties alone to address privacy and user experience requirements for library discovery tools. Libraries are well-poised to become more involved in the design process itself by actively engaging in user experience research and design. The opportunities for “context-driven design” and personalization based on circulation and other anonymous data are promising for library discovery but require ample user testing to determine their usefulness. Understanding which types of personalization features offer the most value while preserving privacy is key to accelerating the design of library discovery tools. The growth of User Experience Librarian jobs and the emergence of User Experience teams and departments in libraries signal an increasing amount of user experience expertise in the field, which can be leveraged to investigate these important questions for library discovery.
Illuminating the Black Box

When librarians adopt new discovery tools without fully understanding their underlying technologies and the data economy in which they operate, this does not serve users. Librarians have ethical obligations that should require them to thoroughly understand how and when user data is captured by library discovery tools and other web technologies, and how this information is compiled and shared at a higher level. Not only do librarians need to understand the technical aspects of discovery technologies, they also need to understand the related user experience benefits and privacy concerns and the resulting ethical implications. As technology continues to evolve, librarians should be required to engage in continued learning in these areas. Such technology literacy skills could be incorporated into the curriculum of Library and Information Science degree programs, as well as into ongoing professional development opportunities.

Empowering Library Users

Because information discovery in an online environment introduces new privacy risks, communication about this topic between librarians and patrons is paramount. Librarians should proactively discuss with patrons the potential risks to their privacy when conducting research online, whether they are using the open web or library discovery tools. It is ultimately up to the patron to weigh their needs and preferences in order to decide which tools to use, but it is the librarian’s responsibility to empower patrons to be able to make these decisions in the first place.

CONCLUSION

With the rollback of the FCC privacy rules that prohibited ISPs from selling customer search histories without customer permission, understanding digital privacy issues and taking action to protect patron privacy is more important than ever.
While privacy and user experience are both necessary and important components of library discovery systems, their requirements are in direct conflict with each other. An absolutely private discovery experience would mean that no user data is ever collected during the search process, whereas a completely personalized discovery experience would mean that all user data is collected and utilized to inform the design and features of the system. It is essential for library discovery tools to have built-in functionality that protects patron privacy to the greatest extent possible and enables the ethical use of patron data to improve user experience. The library community must take action to address these requirements beyond establishing guidelines. Holding third-party providers to higher privacy standards is a starting point. In addition, librarians themselves need to engage in user experience research and design to discover and test the usefulness of possible intermediary solutions. Librarians must also become more educated as a profession on digital privacy issues and their ethical implications in order to educate patrons about their fundamental rights to privacy and empower them to make decisions about which discovery tools to use. Collectively, these strategies enable libraries to address user needs, uphold professional ethics, and drive the future of library discovery.

REFERENCES

1. Irina Trapido, “Library Discovery Products: Discovering User Expectations through Failure Analysis,” Information Technology and Libraries 35, no. 3 (2016): 9-23, https://doi.org/10.6017/ital.v35i3.9190.
2. Brian Fung, “The House Just Voted to Wipe Away the FCC’s Landmark Internet Privacy Protections,” The Washington Post, March 28, 2017, https://www.washingtonpost.com/news/the-switch/wp/2017/03/28/the-house-just-voted-to-wipe-out-the-fccs-landmark-internet-privacy-protections.
3. Jon Brodkin, “President Trump Delivers Final Blow to Web Browsing Privacy Rules,” Ars Technica, April 3, 2017, https://arstechnica.com/tech-policy/2017/04/trumps-signature-makes-it-official-isp-privacy-rules-are-dead/.
4. Nathan Freed Wessler, “How Private is Your Online Search History?” ACLU Free Future (blog), https://www.aclu.org/blog/how-private-your-online-search-history.
5. Julia Angwin, Dragnet Nation (New York: Times Books, 2014), 41-42.
6. MIT Libraries, Institute-wide Task Force on the Future of Libraries (2016), 12, https://assets.pubpub.org/abhksylo/FutureLibrariesReport.pdf.
7. Trapido, “Library Discovery Products,” 10.
8. Marshall Breeding, “The Future of Library Resource Discovery,” NISO White Papers, NISO, Baltimore, MD, 2015, 4, http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_discovery.pdf.
9. Christine Wolff, Alisa B. Rod, and Roger C. Schonfeld, Ithaka S+R US Faculty Survey 2015 (New York: Ithaka S+R, 2016), 11, https://doi.org/10.18665/sr.277685.
10. Deirdre Costello, “Students and Faculty Research Differently” (presentation, Computers in Libraries, Washington, D.C., March 28, 2017), http://conferences.infotoday.com/documents/221/A103_Costello.pdf.
11. Roger C. Schonfeld, Meeting Researchers Where They Start: Streamlining Access to Scholarly Resources (New York: Ithaka S+R, 2015), https://doi.org/10.18665/sr.241038.
12. Björn Bloching, Lars Luck, and Thomas Ramge, In Data We Trust: How Customer Data Is Revolutionizing Our Economy (London: Bloomsbury Publishing, 2012), 65.
13. Angwin, Dragnet Nation, 21-36.
14. Ibid., 32-33.
15. Natasha Singer, “Mapping, and Sharing, the Consumer Genome,” New York Times, June 16, 2012, http://www.nytimes.com/2012/06/17/technology/acxiom-the-quiet-giant-of-consumer-database-marketing.html.
16. Lois Beckett, “Everything We Know About What Data Brokers Know About You,” ProPublica, June 13, 2014, https://www.propublica.org/article/everything-we-know-about-what-data-brokers-know-about-you.
17. “An Interpretation of the Library Bill of Rights,” American Library Association, amended July 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy.
18. Angwin, Dragnet Nation, 41-42.
19. Anne Klinefelter, “Privacy and Library Public Services: Or, I Know What You Read Last Summer,” Legal Reference Services Quarterly 26, no. 1-2 (2007): 258-260, https://doi.org/10.1300/J113v26n01_13.
20. Theresa Chmara, Privacy and Confidentiality Issues: A Guide for Libraries and Their Lawyers (Chicago: ALA Editions, 2009), 27-28.
21. “Code of Ethics of the American Library Association,” American Library Association, amended January 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics.
22. “IFLA Code of Ethics for Librarians and other Information Workers,” International Federation of Library Associations and Institutions, August 12, 2012, http://www.ifla.org/news/ifla-code-of-ethics-for-librarians-and-other-information-workers-full-version.
23. “Privacy & Surveillance,” American Library Association, approved 2015-2016, http://www.ala.org/advocacy/privacyconfidentiality.
24. National Information Standards Organization, NISO Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems (NISO Privacy Principles), published December 10, 2015, http://www.niso.org/apps/group_public/download.php/15863/NISO%20Consensus%20Principles%20on%20Users%92%20Digital%20Privacy.pdf.
25. “Library Privacy Checklists,” Library and Information Technology Association, accessed March 7, 2017, http://www.ala.org/lita/advocacy.
26. Panagiotis Germanakos and Marios Belk, “Personalization in the Digital Era,” in Human-Centred Web Adaptation and Personalization: From Theory to Practice (Switzerland: Springer International Publishing, 2016), 16.
27. Ansgar Koene et al., “Privacy Concerns Arising from Internet Service Personalization Filters,” ACM SIGCAS Computers and Society 45, no. 3 (2015): 167.
28. Ibid., 168.
29. Ibid.
30. James Connor, “Scholar Updates: Making New Connections,” Google Scholar Blog, https://scholar.googleblog.com/2012/08/scholar-updates-making-new-connections.html.
31. Schonfeld, Meeting Researchers Where They Start, 2.
32. Roger C. Schonfeld, Does Discovery Still Happen in the Library?: Roles and Strategies for a Shifting Reality (New York: Ithaka S+R, 2014), 10, https://doi.org/10.18665/sr.24914.
33. Abigail Shelton, “American Philosophical Society Announces Launch of PAL, an Innovative Recommendation Tool for Research Libraries,” American Philosophical Society, April 3, 2017, https://www.amphilsoc.org/press/pal.
34. Trapido, “Library Discovery Products,” 17.
35. Ibid.
36. Michael Schofield, “Does the Best Library Web Design Eliminate Choice?” LibUX, September 11, 2015, http://libux.co/best-library-web-design-eliminate-choice/.
37. Jason A. Clark, “Anticipatory Design: Improving Search UX using Query Analysis and Machine Cues,” Weave: Journal of Library User Experience 1, no. 4 (2016), https://doi.org/10.3998/weave.12535642.0001.402.
38. Rachel Vacek, “Customizing Discovery at Michigan” (presentation, Electronic Resources & Libraries, Austin, TX, April 4, 2017), https://www.slideshare.net/vacekrae/customizing-discovery-at-the-university-of-michigan.
39. Laurie A. Rinehart-Thompson, Beth M. Hjort, and Bonnie S. Cassidy, “Redefining the Health Information Management Privacy and Security Role,” Perspectives in Health Information Management 6 (2009): 4.
40. Marshall Breeding, “Perspectives on Patron Privacy and Security,” Computers in Libraries 35, no. 5 (2015): 13.
41. National Information Standards Organization, NISO Consensus Principles.
42. Joel J. P. C. Rodrigues et al., “Analysis of the Security and Privacy Requirements of Cloud-Based Electronic Health Records Systems,” Journal of Medical Internet Research 15, no. 8 (2013), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3757992/.
43. Office of the National Coordinator for Health Information Technology, Guide to Privacy and Security of Electronic Health Information, April 2015, https://www.healthit.gov/sites/default/files/pdf/privacy/privacy-and-security-guide.pdf.
44. Office of the National Coordinator for Health Information Technology, “Health IT Certification Program Overview,” January 30, 2016, https://www.healthit.gov/sites/default/files/PUBLICHealthITCertificationProgramOverview_v1.1.pdf.
45. Office of the National Coordinator for Health Information Technology, “2015 Edition Health Information Technology (Health IT) Certification Criteria, Base Electronic Health Record (EHR) Definition, and ONC Health IT Certification Program Modifications Final Rule,” October 2015, https://www.healthit.gov/sites/default/files/factsheet_draft_2015-10-06.pdf.
46. Consumer Reports, “Consumer Reports to Begin Evaluating Products, Services for Privacy and Data Security,” March 6, 2017, http://www.consumerreports.org/privacy/consumer-reports-to-begin-evaluating-products-services-for-privacy-and-data-security/.
Current Trends and Goals in the Development of Makerspaces at New England College and Research Libraries

Ann Marie L. Davis

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2018

Ann Marie L. Davis (davis.5257@osu.edu) is Faculty Librarian of Japanese Studies at The Ohio State University.

ABSTRACT

This study investigates why and which types of college and research libraries (CRLs) are currently developing makerspaces (or an equivalent space) for their communities. Based on an online survey and phone interviews with a sample population of CRLs in New England, I found that 26 CRLs had or were in the process of developing a makerspace in this region. In addition, several other CRLs were actively promoting and diffusing the maker ethos. Of these libraries, most were motivated to promote open access to new technologies, literacies, and STEM-related knowledge.

INTRODUCTION AND OVERVIEW

Makerspaces, alternatively known as hackerspaces, tech shops, and fab labs, are trendy new sites where people of all ages and backgrounds gather to experiment and learn. Born of a global community movement, makerspaces bring the do-it-yourself (DIY) approach to communities of tinkerers using technologies including 3D printers, robotics, metal- and woodworking, and arts and crafts.1 Building on this philosophy of shared discovery, public libraries have been creating free programs and open makerspaces since 2011.2 Given their potential for community engagement, college and research libraries (CRLs) have also been joining the movement in growing numbers.3

In recent years, makerspaces in CRLs have generated positive press in popular and academic journals. Despite the optimism, scholarly research that measures their impact is sparse. For example, current library and information science literature overlooks why and how various CRLs choose to create and maintain their respective makerspaces.
Likewise, there is scant data on the institutional objectives, frameworks, and experiences that characterize current CRL makerspace initiatives.4 This study begins to fill this gap by investigating why and which types of CRLs are creating makerspaces (or an equivalent room or space) for their library communities. Specifically, it focuses on libraries at four-year colleges and research universities in New England. Throughout this study, makerspace is used interchangeably with other terms, including maker labs and innovation spaces, to reflect the variation in names and objectives that underlie the current trends. In exploring their motives and experiences, this article provides a snapshot of the current makerspace movement in CRLs.

https://doi.org/10.6017/ital.v37i2.9825

The study finds that the number of CRLs actively involved in the makerspace movement is growing. In addition to more than two dozen that have or are in the process of developing a makerspace, another dozen CRLs have staff who support the diffusion of maker technologies, such as 3D printing and crafting tools that support active learning and discovery, in the campus library and beyond.5 Comprising research and liberal arts schools, public and private, and small and large, the CRLs involved with makerspaces are strikingly diverse. Despite these differences, this population is united by common objectives to promote new literacies, provide open access to new technologies, and foster a cooperative ethos of making.

LITERATURE REVIEW

The body of literature on library makerspaces is brief, descriptive, and often didactic. Given the newness of the maker movement in public and academic libraries, many articles focus on early success stories and defining the movement vis-à-vis the mission of the library.
For instance, Laura Britton, known for having created the first makerspace in a public library (The Fayetteville Free Library’s Fabulous Laboratory), defines a makerspace as “a place where people come together to create and collaborate, to share resources, knowledge, and stuff.”6 This definition, she determines, is strikingly similar to that of the library.

Most literature on makerspaces appears in academic blogs, professional websites, and popular magazines. Among the most frequently cited is TJ McCue’s article, which celebrates Britton’s (née Smedley) FabLab while distilling the intellectual underpinnings of the makerspace ethos.7 Phillip Torrone, editor of Make: magazine, supports Smedley’s project as an example of “rebuilding” or “retooling” our public spaces.8 Within this camp, David Lankes, professor of information studies at Syracuse University, applauds such work as activist and community-oriented librarianship.9

Many authors emphasize the philosophical “fit,” or intersection, of public makerspaces with the principles of librarianship. Building on Torrone’s work, J. L. Balas claims that creating access to resources for learning and making is in keeping with the “library’s historical role of providing access to the ‘tools of knowledge.’”10 Others emphasize the hands-on, participatory, and intergenerational features of the maker movement, which has the potential to bridge the digital divide.11 Still others identify areas of literacy, innovation, and STE(A)M skills where library makerspaces can have a broad impact.

While public libraries often focus on early childhood or adult education, CRLs adopt separate frameworks for information literacy. Like public libraries, they aim to build (meta)literacies and STE(A)M skills. Nevertheless, their programs are often tailored to curricular goals in the arts and sciences or specialized degrees in engineering, education, and business. This is especially true of CRLs situated within large, research-intensive universities.
Considering their specific missions and aims, this study seeks to identify the goals and challenges that reinforce the development of makerspaces in undergraduate and research environments.

RESEARCH DESIGN AND METHOD

Data presented in this study was gathered from library directors (or their designees) through an online survey and oral telephone interviews. After choosing a sampling frame of CRLs in New England, I developed a three-path survey, sent invitations, and collected and analyzed data using the online platform SurveyMonkey. The survey was distributed following review by the institutional review board (IRB) at Southern Connecticut State University, where I completed a Master of Library Science (MLS) degree.

Survey Population

To assess generalized findings for the larger population in North America, I chose a cluster-sampling approach that limited the survey population to the CRLs in New England. In generating the sampling frame, I included four-year and advanced-degree institutions based on the assumption that libraries at these schools supported specialized, research, or field-specific degrees. I omitted for-profit and two-year institutions, based on the assumption that they are driven by separate business models. This process generated a contact list of 182 library directors at the designated CRLs in Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont.

Survey Design

The purpose of the survey was to gather basic data about the size and structure of the respondents’ institutions and to gain insights on their views and practices regarding makerspaces (the survey is reproduced in the appendix). The first page of the survey contained a statement of consent, including my contact information and that of my IRB. After a short set of preliminary questions, the survey branched into one of three paths based on respondents’ answers about makerspaces.
The respondents were thus categorized into one of three groups: Path One (P1) for those with no makerspace and no plans to create one, Path Two (P2) for those with plans to develop a makerspace in the near future, and Path Three (P3) for those already running a makerspace in their libraries. P3 was the longest section of the survey, containing several questions about P3 experiences with makerspaces such as staffing, programming, and objectives.

Data Collection

In summer 2015, brief email invitations and two reminders were sent to the targeted population.12 To increase the participation rate, I sometimes wrote personal emails and made direct phone calls to CRLs known to have a makerspace. For cold-call interviews, I developed a script explaining the nature of the online survey. After obtaining informed consent, I proceeded to ask the questions in the online survey and manually enter the participants’ responses at the time of the interview. On a few occasions, online respondents followed up with personal emails volunteering to discuss their library’s experiences in more detail. I took advantage of these invitations, which often provided unique and welcome insights.

In analyzing the responses, I used tabulated frequencies for quantitative results and sorted qualitative data into two different categories. The first category was identified as “short and objective” and coded and analyzed numerically. The longer, more “subjective and value-driven” data was analyzed for common trends, relationships, and patterns. Within this second category, I also identified outlier responses that suggested possible exceptions to common experiences.

RESULTS

The survey closed after one month of data collection. At this time, 55 of 182 potential respondents had participated, yielding a response rate of 30.2%.
Among these participants, the survey achieved a 100.0% response rate (9 completed surveys of 9 targeted CRLs) among libraries that were currently operating makerspaces. I created a list of all known CRL makerspaces in New England based on an exhaustive website search of all CRLs in this region. Subsequent interviews with the managers of the makerspaces on this list revealed no other hidden or unknown makerspaces in this region. Of the 55 respondents, 29 (52.7%) were in P1, 17 (30.9%) were in P2, and 9 (16.4%) were in P3. (See figure 1.)

Figure 1. Survey participants’ (n = 55) current CRL efforts and plans to develop and operate a makerspace.

Among respondents in P2 and P3, the majority (13 of 23) indicated that they were from libraries that served a student population of 4,999 people or fewer, while only one library served a population of 30,000 or more (see figure 2). In terms of sheer numbers, makerspaces might seem to be gaining traction at smaller CRLs, but proportionally, one cannot say that smaller CRLs are adopting makerspaces at a higher rate because the majority of survey participants had student populations of 19,999 or fewer (51, or 91.1%). The number of institutions with populations over 20,000 was in a clear minority (5, or 8.9%). (See figure 3.)

Figure 2. P2 and P3 CRLs with makerspaces or concrete plans to develop a makerspace.

Figure 3. The majority of CRLs (67.2%) that participated in the survey had a population of 4,999 students or less. Only 1.8% of schools that participated had a population of 30,000 students or more.
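The tabulated frequencies reported above reduce to simple proportions. As an illustration only, the following sketch reproduces the path tabulation and response-rate arithmetic; the per-respondent labels are reconstructed from the published counts (29, 17, and 9), not drawn from the author's raw survey data.

```python
from collections import Counter

# Hypothetical per-respondent path labels, reconstructed from the
# reported totals: 29 in P1, 17 in P2, 9 in P3 (n = 55).
responses = ["P1"] * 29 + ["P2"] * 17 + ["P3"] * 9

counts = Counter(responses)
n = len(responses)
for path in ("P1", "P2", "P3"):
    pct = 100 * counts[path] / n
    print(f"{path}: {counts[path]} ({pct:.1f}%)")  # 52.7%, 30.9%, 16.4%

# Overall response rate: 55 participants out of 182 invitations.
print(f"response rate: {100 * 55 / 182:.1f}%")  # 30.2%
```

Running this reproduces the percentages given in the text (52.7%, 30.9%, 16.4%, and a 30.2% response rate), confirming the reported figures are internally consistent.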
CRLs with No Makerspace (P1 = 29)

In the first part of the survey, the majority of P1 respondents demonstrated positive views toward makerspaces despite having no plans to create one in the near future. Budgetary and space limitations aside, many were relatively open to the possibility of developing a makerspace in a more distant future. In the words of one respondent, “we have several areas within the library that present a heavy demand on our budget. In [the] future, we would love to consider a makerspace, and whether it would be a sensible and appropriate investment that would benefit our students.”

When asked what their reasons were for not having a makerspace, some respondents (8, or 27.6%) said they had not given it much thought, but most (21, or 72.4%) offered specific answers. Among these, the most frequently cited reason (11, or 37.8%) was that a library makerspace would be redundant: such spaces and labs were already offered in other departments within the institution or in the broader community. At one CRL, for example, the respondent said the library did not want to compete with faculty initiatives elsewhere on campus. Other reasons included that makerspaces were expensive and not a priority. Some (5, or 17.2%) libraries preferred to allocate their funds to different types of spaces such as “a very good book arts studio/workshop” or “simulation labs.” Some (6, or 20.6%) shared concerns about a lack of space, staff, or simply “a good culture of collaboration [on campus].” Merging these sentiments, one respondent concluded, “People still need the library to be fairly quiet. . . . Having makerspace equipment in our library would be too distracting.”

While some were skeptical (sharing concerns about potential hazards or that makerspaces were simply “the flavor of the month”), the majority (roughly 60%) were open and enthusiastic.
One respondent, in fact, held a leadership position in a community makerspace beyond campus. According to this librarian, 3D printers, scanners, and laser cutters were sure to become more common, and CRLs would no doubt eventually develop “a formal space for making stuff.”

CRLs with Plans for a Makerspace in the Near Future (P2 = 17)

The second section of the survey (P2) focused primarily on the motivations and means by which this cohort planned to develop a makerspace. When asked why they were creating a makerspace, the most common response was to promote learning and literacy (15 respondents, or 88.2%). In addition, a large majority (12 respondents, or 70.6%) felt that makerspaces helped to promote the library as relevant, particularly in the digital age. Three more reasons that earned top scores (10 respondents each, or 58.8%) were being inspired by the ethos of making, creating a complement to digital repositories and scholarship initiatives, and providing access to expensive machines or tools. Additional reasons included building outreach and responding to community requests.13 (See figure 4.)

Figure 4. Rationale behind P2 respondents’ decision to plan a makerspace (n = 17).

While P2 respondents indicated a clear decision to create a makerspace, their timeframes were noticeably different. I categorized their open responses into one of six timeframes: “within six months,” “within one year,” “within two years,” “within four years,” “within six years,” and “unknown.” The result presented a clear trimodal distribution with three subgroups: six CRLs with plans to open within 18 months, five with plans to open within the next two years, and six with plans to open after three or more years (see figure 5). In addition to their timeframe, P2 respondents were also asked about their plans for financing their future makerspaces.
Based on their open responses, the following six funding sources emerged:

• the library budget, including surplus moneys or capital project funds
• internal funding, including from campus constituents
• donations and gifts
• external grants
• cost recovery plans, including small charges to users
• not sure/in progress

Figure 5. P2 respondents’ timeframe for developing the makerspace (n = 17).

With seven mentions, the most common of the above funding sources was the “library budget.” With two mentions each, the least common sources were “cost recovery” and “not sure/in progress.” Among those who mentioned external grant applications, one respondent mentioned a focus on Women and STEM opportunities, and another specifically discussed attempts at grants from the Institute of Museum and Library Services. (See figure 6.)

Figure 6. P2 respondents’ plans for gathering and financing the makerspace (n = 17).

Regarding target user groups, some respondents focused on opportunities to enhance specific disciplinary knowledge, while others emphasized a general need for creating a free and open environment. One respondent mentioned that at her state-funded library, the space would be “geared to younger [primary and secondary school] ages,” “student teachers,” and “librarians on practicum assignments.” By contrast, another respondent at a large, private, Carnegie R1 university emphasized that the space was earmarked for undergraduate and graduate students. In contrast to the cohort in P1, a notable number in P2 chose to create a makerspace despite the existence of maker-oriented research labs elsewhere on campus.
As one respondent noted, the university was still “lacking a physical space where people could transition between technologies” and an open environment “where students doing projects for faculty” could come, especially later in the evenings. Another respondent at a similarly large, private institution explained that his colleagues recognized that most labs at their university were earmarked for specific professional schools. As a result, his colleagues came up with a strategy to provide self-service 3D printing stations at the media center, located in the library at the heart of campus.

CRLs with Operating Makerspaces (P3 = 9)

The final section of the survey (P3) focused on the motivations and means by which CRLs with makerspaces already in operation chose to develop and maintain their sites. In addition, this section gathered information on P3 CRL funding decisions, service models, and types of users in their makerspaces. Of the nine respondents in this path, all had makerspaces that had opened within the last three years. Among these, roughly a third (4) had been in operation from one to two years; another third (3) had operated for two to three years; and two had opened within the last year. (See table 1.)

Table 1. Length of time the CRL makerspace has been in operation for P3 respondents (n = 9).

Age of CRL Makerspace or Lab—P3
Answer Options        Responses      %
Less than 6 months        1        11.1
6–12 months               1        11.1
1–2 years                 4        44.4
2–3 years                 3        33.3
More than 3 years         0         0.0
Total Responses           9       100.0

Priorities and Rationale

The reasons behind P3 decisions to create a makerspace were slightly different from those of P2. While “promoting literacy and learning” was still a top priority, two other reasons, “promoting the maker culture of making” and “providing access to expensive machinery,” were deemed equally important (6 respondents, or 66.7%, for each).
Other significant priorities included “promoting community outreach” (4 respondents, or 44.4%), “promoting the library as relevant,” and acting in “direct response to community requests” (3 respondents, or 33.3%, for each). (See figure 7.)

Figure 7. Rationale behind P3 respondents’ decision to develop and maintain a makerspace (n = 9).

The answer of “other” was also given top priority (5 respondents, or 55.6%). I conclude that this indicated a strong desire among respondents to express in their own words their library’s unique decisions and circumstances. (Their free responses to this question are discussed below.)

A familiar theme in the responses of the five respondents who elaborated on their choice of “other” was the desire to situate a makerspace in the central and open environment of the campus library. As one participant noted, there were “other access points and labs on campus,” but those labs were “more siloed” or cut off from the general population. By contrast, the campus library aimed to serve a broader population and anticipated a general “student need.” Later, the same respondent added that the makerspace was an opportunity to promote social justice, cultivate student clubs, and encourage engagement at the hub of the campus community.

This type of ecumenical thinking was manifested in a similar remark that the library’s role was to reinforce other learning environments on campus. One respondent saw the makerspace as an additional resource “that complemented the maker opportunities that we have had in our curriculum resource center for decades.” Likewise, the library makerspace was intended to offer opportunities to a range of users on campus and beyond.
Funding, Staffing, and Service Models

When prompted to discuss how they gathered the resources for their makerspaces, the largest group (4 respondents) stated that a significant means for funding was through gifts and donations. Thus, the majority of CRL makerspaces in New England depended primarily on contributions from friends of the library, university/college alumni, and donors. The second most common source (3 respondents) was the library budget, including surplus money at the end of the year. Grant money and cost recovery were each mentioned by two library participants, and internal and constituent support was useful for two libraries. (See figure 8.)

Figure 8. P3 methods for gathering and financing a makerspace (n = 9).

Among these, a particularly noteworthy case was a makerspace that had originated from a new student club focused on 3D printing. Originally based in a student dorm, the club was funded by a campus student union, which allocated grant money to students through a budget derived from the college tuition. As the club quickly grew, it found significant support in the library, which subsequently provided space (on the top floor of the library), staff, and financial support from surplus funds in the library budget. As this example would suggest, the sum of the responses showed that financing the makerspaces depended on a combination of strategies. One participant summarized it best: “We’ve slowly accumulated resources over time, using different funding for different pieces. Some grant funding. Mostly annual budget.”

Regarding service models, more than half of these libraries (five) currently offer a combination of programming and open lab time where users can make appointments or just drop in. By contrast, two of the libraries offered programs only, and did not offer an open lab; another two did the opposite, offering no programming but an open makerspace at designated times.
Of the latter, one is open Monday to Friday from 8 a.m. to 4 p.m., and the other is open during regular hours, with spaces that “can be booked ahead for classes or projects.” Most labs supported drop-in visitors and were open evenings and weekends. At one makerspace, where there was increasingly heavy demand, the staff required students to submit proposals with project goals. (See table 2.)

While some libraries brought in community experts, others held faculty programs, and some scheduled lab time for individual classes. One makerspace prioritized not only the campus but also the broader community, and thus featured programs for local high schools and seniors. Responses from this library emphasized the social justice thread that inspired their work and the community culture that they aimed to foster.

Table 2. Model for services offered in the CRL makerspace or 3D printing lab.

Do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use?
Answer Options                                                                          Responses      %
Yes, we offer the following types of programs.                                              2        22.2
No, we simply leave the makerspace/lab open at the specific times.                          2        22.2
We do both. We offer the programs and leave the makerspace/lab open at specific times.      5        55.6

As this data would suggest, most makerspaces were used by students (undergraduates and graduates) and faculty, in addition to local experts and generational groups. Survey responses showed that undergraduate students were the most common users (9 of 9 respondents checked this group as the most frequent type of user), and faculty and graduate students were the second and third most common user groups in the labs (8 of 9 respondents checked these groups as most frequent). Local entrepreneurs, artists, designers, craftspeople, and campus and library staff also use the makerspaces. (See figure 9.)
When prompted to identify “other” categories, one respondent specifically listed “learners, makers, sharers, studiers, [and] clubs.”

Figure 9. Of the different types of users listed above, P3 respondents ranked them in order of who used the makerspace or equivalent lab most often (n = 9).

The number and type of staff that managed and operated the makerspaces also varied widely at the nine CRLs in P3. Seven of the CRLs employed full-time, dedicated staff, among whom four participants checked off the “dedicated staff”–only options. Of the remaining two CRLs, one reported staffing the makerspace with only one student, and one reported not having any staff working in the makerspace. I assume that the makerspace with no employees is managed by staff and students who are assigned to other, unspecified library departments or work groups. (See figure 10.)

Figure 10. The staffing situations at the P3 respondents (n = 9), where each respondent is assigned a letter from “A” to “I.”

Library programming was also diverse in terms of targeted audiences, speakers, and learning objectives. Instructional workshops varied from 3D scanning and printing to soldering, felt making, sewing, knitting, robotics, and programming (e.g., Raspberry Pi). The type of equipment contained in each lab is likely correlated to the range in programming; however, investigating these links was beyond the scope of this study. Regarding this equipment, the size and activity of the participant CRLs varied considerably. Some responses were more specific than others, and thus the resulting dataset was incomplete. (See table 3.)

Challenges and Philosophies of CRL Makerspaces

The final portion of the survey invited participants to freely offer their thoughts about operating a CRL makerspace. What follows below is a summary of the two most prominent themes that emerged: the challenges of building the lab and the social philosophies that framed these initiatives.
In terms of challenges, the most common hurdle noted was the tremendous learning curve involved in establishing, maintaining, and promoting a makerspace. Setting up some of the 3D printers, for example, required knowledge about electrical networks, computer systems, and safety policies at a federal and local level. Once the hardware was running, lab managers needed to know how the machines interfaced with different and challenging software applications. Communication skills were also critical; as one respondent reported, “Printing anything and everything takes knowledge, experience.” Communicating with stakeholders and users in accessible and proactive ways required strong teaching and customer service skills.

Table 3. The types of tools and equipment used at P3 CRL respondents (n = 8), which are assigned letters from A to H.

Major Equipment Offered by Individual Library Makerspaces or Equivalent Labs—Path 3

A: Die cut machine, 3D printer, 3D pens, raspberry pi, arduino, makey makey, art supplies, sewing supplies, pretty much anything anyone asks for we will try to get.
B: 2 Makerbot replicators, 1 digital scanner, 1 Othermill
C: 3D printing, 3D scanning, and laser cutting.
D: 3D printing, 3D scanning, laser cutting, vinyl cutting, large format printing, cnc machine, media production/postproduction.
E: No response
F: 3 CreatorX, 1 Powerspec, 3 M3D, 2 Replicator 2, 1 Replicator 2X, 1 Makergear, 1 LeapfrogXL, 1 Ultimaker, 1 Type A, 1 Deltaprinter, 1 Delta Maker, 2 Printrbot, 2 Filabots, 2 X-box Kinect for scanning, 2 Oculus Rifts, embedded systems cabinet with soldering stations, solar panels and micro controllers etc., 1 Formlabs SLA, 1 Muve SLA, RoVa 5, a bunch of quadcopters
G: 3D printers (4 printers, 3 models), 3D scanning/digitizing equipment (3 models), Raspberry Pi, Arduino, a laser cutter and engraving system, poster printer, digital drawing tablets, GoPro, a variety of editing and design software, a number of tools (e.g. Dremel, soldering iron, wrenches, pliers, hammers, etc.), and a number of consumable or misc. items (e.g. paint, electrical tape, acetone, safety equipment, LED lights, screws and nails, etc.)
H: 48 printers (all Makerbot brand): 35 Replicator 5th Gen (a moderate size printer), 5 Replicator Z18 printers (larger build size), 5 Replicator Minis, and 3 Replicator 2X; 5 Makerbot digitizers (turntable scanners, 8" by 8"); 1 Cubify Sense hand scanner; 7 still cameras for photogrammetry; 21 iMac computers; 2 Mac Pros; 2 Wacom graphics tablets (thinking about complementing other resources at other labs on campus)

Another challenge that often came up was that of managing resources. As one respondent warned, CRLs should beware the “early adoption of certain technologies,” which can become “quickly outdated by a rapidly growing field.” For others, it was a challenge to recruit the right staff that could run and fix machines in constant need of repair. In addition to hiring people with manufacturing and teaching skills, a successful lab required individuals who were savvy about outreach and community needs. Despite such challenges, many respondents were eager to discuss the aspirations and rewards of CRL makerspaces.
Above all, respondents focused on the pedagogical opportunities on the one hand, and the potential for outreach and social justice on the other. One participant conceded that measuring advances in literacy and education was “intangible,” but he saw great value in “giving students the experience of seeing their ideas come to fruition.” The excitement that this created for one student manifested in a buzz, and subsequently a “fever” or groundswell, in which more users came in to tinker and learn. Meanwhile, the learning that took place among future professionals on campus was “critical,” even when results did not “go viral.” The aspiration to create human connections within and beyond campus was another striking theme. According to one respondent, the makerspace had “enabled some incredibly fruitful collaborations with different departments on campus.” This “fantastic outcome” was becoming more and more visible as the maker community grew. Other CRL makerspaces took pride in fostering a type of learning that was explicitly collaborative, exciting, and even “fun” for users. This in turn meant that some libraries were becoming “very popular,” generating a lot of “good PR,” and becoming central in the lives of new types of library users. Along these lines, some respondents aimed to leverage the power of the makerspace to achieve social justice goals that resonated with core values of librarianship. According to one enthusiastic participant, the ethos of sharing was alive and strong among the staff and the many students who saw their participation in the lab as a lifestyle and culture of collaborating. In another initiative, the respondent looked forward to eventually offering grants to those users who proposed meaningful ways to use the makerspace to create practical value for the community. From this perspective, there was added value in having the 3D printing lab situated specifically on a college or university campus. 
According to this respondent, the unique quality of the CRL makerspace was that by virtue of its location amid numerous and energetic young people, it was ripe for exploitation by those “who had great ideas and time and energy to do good.”

DISCUSSION

The aim of this study was to explore why and which types of CRLs had developed makerspaces (or an equivalent space) for their communities. Of the 56 respondents, roughly half (46%) were P2 and P3 libraries that were currently developing or operating a makerspace, respectively. Data from this survey indicated that none of the P2 or P3 CRLs fit a mold or pattern in terms of their size, educational models, or classifications. Upon analyzing the data, I found that the differentiators between the three groups were less clearly defined than originally anticipated. In one example of blurred lines, at least two respondents in P1 indicated that they were more actively engaged with makerspaces than two respondents in P2. Despite not having physical labs within their libraries, these P1 respondents were in the process of actively supporting or making plans for a makerspace within their CRL community. One P1 respondent, for example, served on the planning board for a local community makerspace and had therefore “thoroughly investigated and used” the makerspace at a neighboring university. Based on his knowledge, he decided to develop a complementary initiative (e.g., a book arts workshop) at his university library. Although his library did not yet have a formal makerspace, he felt confident that the diffusion of 3D printers would come to his library in the near future. Another P1 respondent was responsible for administering faculty teaching and innovation grants.
Among the recent grant recipients were two faculty collaborators who used the library’s funds to build a makerspace at a campus location that was separate from the library. Although the makerspace was not directly developed by the respondent’s library, it was nevertheless a direct product of his library’s programmatic support. The respondent reported that for this reason, his library did not want to compete with its own faculty initiatives. In another example of blurred distinctions, one librarian in P2 was as deeply immersed in providing access and education on makerspaces as his colleagues in P3. Although he was not clear on when or how his library would finance a future makerspace, his library already offered many of the same services and workshops as P3 libraries. As a “Maker in the Library,” he offered non-credit-bearing 3D printing seminars to students and offered trial 3D printing services in the library for graduates of the 3D printing seminar. In addition, he made appearances at relevant campus events. When the university museum ran a 3D printing day, for instance, he participated as an expert panelist and gave public demonstrations on library-owned 3D printers and a Kinect scanner bar. In sum, despite the respondents’ categorization in P1 and P2, they sometimes shared more in common with the cohorts in P2 and P3, respectively. Given their library’s programmatic involvement in creating and endorsing the maker movement, these respondents were more than just “interested” or “open to” the prospect of creating a makerspace. While only 16% of CRLs (P3 = 9) responded as actively operating a makerspace, another 30% (P2 = 17) were involved in developing a makerspace in the near future. Moreover, the number of CRLs formally involved with the diffusion of maker technologies was not limited to just these two groups.
Although some makerspaces were not directly run by the library, they had come to fruition because of library-based funding, grants, and professional support. And although some libraries did not have immediate plans for a makerspace, they were already promoting maker technologies and the maker ethos in other significant ways.

CONCLUSION

This study is one of the first comprehensive and comparative studies on CRL makerspace programs and their respective goals, policies, and outcomes. While the number of current CRL makerspaces is relatively low, the data suggests that the population is increasing; a growing number of CRLs are involved in the makerspace movement. More than two dozen CRLs were planning to develop makerspaces in the near future, helping to diffuse maker technologies through CRL programming, and/or supporting nonlibrary maker initiatives on campus and beyond. In addition, some CRLs were buying equipment, hiring dedicated staff, offering relevant workshops and demonstrations, and supporting community efforts to build labs beyond the library. Although the author aimed to find structural commonalities between CRLs in groups P2 and P3, none were found. Respondents in these groups came from institutions of all sizes, a wide variety of endowment levels, and both public and private funding models, and they ranged in emphasis from the liberal arts to professional certifications and graduate-level research. Although a majority of CRL respondents were not currently making plans to create a makerspace, many respondents were enthusiastic about current trends, and some even promoted the maker movement in unexpected ways. Acknowledging the steady diffusion of 3D printers, many anticipated using such technologies in the future to promote traditional library values and goals. Respondents in P2 and P3 indicated that their primary rationale for developing a makerspace was to promote learning and literacy.
Other prominent reasons included promoting library outreach and the maker culture of learning. Data from CRLs with makerspaces indicated that these benefits were often symbiotic and correlated to strong ideas about universal access to emergent tools and practices in learning. Unexpected challenges for developing and operating makerspaces include staffing them with highly skilled, knowledgeable, and service-oriented employees. Learning the necessary skills— including operating the printers, troubleshooting models, and maintaining a safe environment, to name a few—was time-consuming and labor intensive. The majority of funding for CRLs with or planning maker labs came from internal budgets, gifts and donors, and some grants. While some P1 CRLs indicated that their reason for not developing makerspaces was a lack of community interest, P2 and P3 CRLs were not necessarily motivated by user requests or needs, nor was lack of explicit need or interest a deterrent. On the contrary, a few reported a desire to promote the campus library as ahead of the curve by keeping in front of student and community needs. In a similar contradiction, some P1 respondents reported that their libraries did not want to compete with other labs on campus. Respondents from P2 and P3, however, wanted to offer an alternative to the more siloed or structured model of department- or lab-funded makerspaces. Although makerspaces were sometimes forming in other parts of campus, some P2 and P3 CRLs felt there was a gap in accessibility and therefore aimed to offer more open and flexible spaces. A final salient theme among P2 and P3 respondents was their commitment to equity of access and issues of social justice. Above all, they saw a unique fit for makerspaces in their CRL philosophies to serve the greater good. 
Among other advantages, CRLs were in a unique position to leverage the power of the makerspaces to take advantage of campus communities of “cognitive surplus” and millennial aspirations to share and create spontaneous communities of knowledge. Given the amount of resources that are required to create and maintain a makerspace, this research will be useful for CRLs considering such a space in the future. The present data suggests that no one type of library currently has a monopoly on makerspaces; regardless of size or funding levels, the common thread among P2 and P3 CRLs was simply a commitment to providing access to emergent technologies and supporting new literacies. While annual budgets and grant applications were critical for some libraries, the majority of CRLs funded the bulk of their makerspaces through gifts and donations. Future studies on the characteristics and challenges of P2 and P3 populations beyond those in New England will certainly amplify our understanding of these trends.

APPENDIX: SURVEY QUESTIONS

Informed Consent

CURRENT TRENDS IN THE DEVELOPMENT OF MAKERSPACES AND 3D PRINTING LABS AT NEW ENGLAND COLLEGE AND RESEARCH LIBRARIES
Consent for the Participation in a Research Study
Southern Connecticut State University

Purpose
You are invited to participate in a research project conducted by Ann Marie L. Davis, a master’s student in library and information studies at Southern Connecticut State University. The purpose of this project is to investigate the experiences and goals of college and research libraries (CRLs) that currently have or are making plans to have an open makerspace (or an equivalent room or space). The results from this study will be included in a special project report for the MLS degree and will form the basis for an article to submit for peer review.
Procedures
If you decide to participate, you will volunteer to take a fifteen-minute online survey.

Risks and Inconveniences
There are no known risks associated with this research; other than taking a short amount of time, the survey should not burden you or infringe on your privacy in any way.

Potential Benefits and Incentive
By participating in this research, you will be contributing to our understanding of current trends and practices with regard to community learning labs in CRLs. In addition, you will be providing useful knowledge that can support other libraries in making more informed decisions as they potentially develop their own makerspaces in the future.

Voluntary Participation
Your participation in this research study is voluntary. You may choose not to participate and you may withdraw your consent to participate at any time. You will not be penalized in any way should you decide not to participate or withdraw from this study.

Protection of Confidentiality
The survey is anonymous and does not ask for sensitive or confidential information.

Contact Information
Before you consent, please ask any questions on any aspect of this study that is unclear to you. You may contact me at my student email address at any time: xxx@owls.southernct.edu. If you have questions regarding your rights as a research participant, you may contact the Southern Connecticut State Institutional Review Board at (203) xxx-xxxx.

Consent
By proceeding to the next page, you confirm that you understand the purpose of this research, the nature of this survey, and the possible burdens and risks as well as benefits that you may experience. By proceeding, this indicates that you have read this consent form, understand it, and give your consent to participate and allow your responses to be used in this research.

ACRL Survey on Makerspaces and 3D Printers

Q1. What is the size of your college or university?
• 4,999 students or less
• 5,000–9,999 students
• 10,000–19,999 students
• 20,000–29,999 students
• 30,000 students or more

Q2. How would you categorize your institution? (Please check all that apply)
• Private
• Public
• Doctorate-Granting University (awards 20 or more doctorates)
• Master’s College or University (awards 50 or more master’s degrees, but fewer than 20 doctorates)
• Liberal Arts and Sciences College
• Other

Q3. Do any of the libraries at your institution have a makerspace or equivalent hands-on learning lab (including a 3-D printing station or lab)?
• Yes [if “Yes,” respondents are directed to question 14]
• No [if “No,” respondents are directed to question 4]

Q4. Do any of the libraries at your institution have plans to develop a makerspace or equivalent learning lab in the near future?
• Yes [if “Yes,” respondents are directed to question 8]
• No [if “No,” respondents are directed to question 5]

PATH ONE (CRLs with no makerspace, no plans for makerspace)

Q5. Are there specific reasons why your institution has decided not to pursue developing a makerspace or equivalent lab in the near future?
• No reasons. We have not given much thought to makerspaces for our library.
• Yes

Q6. Thank you for your participation. Would you like a copy of the results when the report is completed? If yes, please enter your email address in the space provided.
• No
• Yes (please enter your email address below)

Q7. You have almost concluded this survey. Before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. If no comments, please click “Next” to end the survey.

PATH TWO [CRLs with plans to build a makerspace]

Q8. What are the main goals that motivated your library’s decision to develop a makerspace or equivalent lab?
(Please check all that apply)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q9. Of these goals, please rank them in order of their level of priority for your library. (Choose “N/A” for goals that you did not select in the previous question)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q10. What is your library’s time frame for developing a makerspace or equivalent lab?

Q11. What are your library’s current plans for gathering and/or financing the resources needed for developing and maintaining the makerspace or equivalent lab?

Q12. Thank you for your participation. Would you like a copy of the results when the report is completed?
• No
• Yes (please enter your email address below)

Q13. You have almost concluded this survey. Before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. If no comments, please click “Next” to end the survey.

PATH THREE [CRLs with a makerspace]

Q14. How long have you had your makerspace or equivalent learning lab?
• less than 6 months
• 6–12 months
• 1–2 years
• 2–3 years
• more than 3 years

Q15. What were the main goals that motivated your library’s decision to develop a makerspace or equivalent lab?
(Please check all that apply)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q16. Of these goals, please rank them in order of their level of priority for your library. (Choose “N/A” for goals that you did not select in the previous question)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q17. How did your library gather and/or finance the resources needed for developing and maintaining the makerspace or equivalent learning lab?

Q18. Do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use?
• Yes, we offer the following types of programs:
• No, we simply leave the makerspace/lab open at the following times (please note times and/or if a reservation is required):
• We do both. We offer the following types of programs and leave the makerspace/lab open at the following times (please note types of programs, times open, and if a reservation is required):

Q19. What type of community members tend to use your library’s makerspace or equivalent lab most? (Please check all that apply)
• undergraduate researchers
• graduate researchers
• faculty
• staff
• general public
• local artists, designers, or craftspeople
• local entrepreneurs
• other

Q20. Of the cohorts chosen above, please rank them in order of who uses the makerspace or equivalent lab most often.
(Use “N/A” for cohorts that are not relevant to your space or lab)
• undergraduate researchers
• graduate researchers
• faculty
• staff
• general public
• local artists, designers, or craftspeople
• local entrepreneurs
• other

Q21. How many dedicated staff does your library currently employ for the makerspace or equivalent?
• 0
• 1
• 2
• 3
• other

Q22. Where is your makerspace or equivalent lab located?

Q23. What is the title or name of your makerspace or equivalent lab, and if known, what were the reasons behind this particular name?

Q24. What major equipment and services does your library makerspace or equivalent lab provide?

Q25. What unexpected considerations, challenges, or failures has your library faced in developing and maintaining the makerspace or equivalent lab?

Q26. How would you assess the benefits or “return on investment” of having a makerspace or equivalent lab?

Q27. Thank you for your participation. Would you like a copy of the final results when the report is completed? If yes, please enter your email address in the space provided.
• No
• Yes (please enter your email address below)

Q28. You have almost concluded this survey. Before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. If no comments, please click “Next” to end the survey.
Digitization of Text Documents Using PDF/A
Yan Han and Xueheng Wan
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2018 52
Yan Han (yhan@email.arizona.edu) is Full Librarian, the University of Arizona Libraries, and Xueheng Wan (wanxueheng@email.arizona.edu) is a student, Department of Computer Science, University of Arizona.

ABSTRACT

The purpose of this article is to demonstrate a practical use case of PDF/A for digitization of text documents following FADGI’s recommendation of using PDF/A as a preferred digitization file format. The authors demonstrate how to convert and combine TIFFs with associated metadata into a single PDF/A-2b file for a document. Using real-life examples and open source software, the authors show readers how to convert TIFF images, extract associated metadata and International Color Consortium (ICC) profiles, and validate against the newly released PDF/A validator. The generated PDF/A file is a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and ICC profiles. Providing theoretical analysis and empirical examples, the authors show that PDF/A has many advantages over the traditionally preferred file format, TIFF/JPEG2000, for digitization of text documents.

BACKGROUND

PDF has been primarily used as a file delivery format across many platforms in almost every device since its initial release in 1993. PDF/A was designed to address concerns about long-term preservation of PDF files, but there has been little research and few implementations of this file format. Since the first standard (ISO 19005 PDF/A-1) was published in 2005, some articles have discussed the PDF/A family of standards, relevant information, and how to implement PDF/A for born-digital documents.1 There is growing interest in the PDF and PDF/A standards after both the US Library of Congress and the National Archives and Records Administration (NARA) joined the PDF Association in 2017.
NARA joined the PDF Association because PDF files are used as electronic documents in every government and business agency. As explained in a blog post, the Library of Congress joined the PDF Association because of the benefits to libraries, including participating in developing PDF standards, promoting best-practice use of PDF, and access to the global expertise in PDF technology.2 Few articles, if any, have been published about using this file format for preservation of digitized content.

DIGITIZATION OF TEXT DOCUMENTS USING PDF/A | HAN AND WAN 53 HTTPS://DOI.ORG/10.6017/ITAL.V37I1.9878

Yan Han published a related article in 2015 about theoretical research on using PDF/A for text documents.3 In this article, Han discussed the shortcomings of the widely used TIFF and JPEG2000 as master preservation file formats and proposed using the then-emerging PDF/A as the preferred file format for digitization of text documents. Han further analyzed the requirements of digitization of text documents and discussed the advantages of PDF/A over TIFF and JPEG2000. These benefits include platform independence, smaller file size, better compression algorithms, and metadata encoding. In addition, the file format reduces workload and simplifies post-digitization processing such as quality control, adding and updating missing pages, and creating new metadata and OCR data for discovery and digital preservation. As a result, PDF/A can be used in every phase of a digital object in an Open Archival Information System (OAIS)—for example, a Submission Information Package (SIP), Archive Information Package (AIP), and Dissemination Information Package (DIP). In summary, a PDF/A file can be a structured, self-contained, and self-described container allowing a simpler one-to-one relationship between an original physical document and its digital surrogate.
In September 2016, the Federal Agencies Digital Guidelines Initiative (FADGI) released its latest guidelines for digitization related to raster images: Technical Guidelines for Digitizing Heritage Materials.4 The de facto best practices for digitization, these guidelines provide federal agencies guidance and have been used in many cultural heritage institutions. Both the PDF Association and the authors welcomed the recognition of PDF/A as the preferred master file format for digitization of text documents such as unbound documents, bound volumes, and newspapers.5

GOALS AND TASKS

Since Han has previously provided theoretical methods of coding raster images, metadata, and related information in PDF/A, the goals of this article are threefold:
1. present real-life experience of converting TIFFs/JPEG2000s to PDF/A and back, along with image metadata
2. test open source libraries to create and manipulate images, image metadata, and PDF/A
3. validate the generated PDF/A files with the first legitimate PDF/A validator

The tasks included the following:
● Convert all the master files in TIFF/JPEG2000 from digitization of text documents into single PDF/A files losslessly: one document, one PDF/A file.
● Evaluate and extract metadata from each TIFF/JPEG2000 image and encode it along with its image when creating the corresponding PDF/A file.
● Demonstrate the runtimes of the above tasks for feasibility evaluation.
● Validate the PDF/A files against the newly released open source PDF/A validator veraPDF.
● Extract each digital image from the PDF/A file back to its original master image files along with associated metadata.
● Verify the extracted image files in the back-and-forth conversion process against the original master image files.

Choices of PDF/A Standards and Conformance Level

This article demonstrates using PDF/A-2b as a self-contained, self-describing file format. Currently, there are three related PDF/A standards (PDF/A-1, PDF/A-2, and PDF/A-3), each with
Currently, there are three related PDF/A standards (PDF/A-1, PDF/A-2, and PDF/A-3), each with INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2018 54 three conformance levels (a, b, and u). The reasons for choosing PDF/A-2 (instead of PDF/A-1 or PDF/A-3) are the following: ● PDF/A-1 is based on PDF 1.4. In this standard, images coded in PDF/A-1 cannot use JPEG2000 compression (named in PDF/A as JPXDecode). One can still convert TIFFs to PDF/A-1 using other lossless compression methods such as LZW. However, the space- saving benefits of JPEG2000 compression over other methods would not be utilized. ● PDF/A-2 and PDF/A-3 are based on PDF 1.7. One significant feature of PDF 1.7 is that it supports JPEG2000 compression, which saves 40–60 percent of space for raster images compared to uncompressed TIFFs. ● PDF/A-3 has one major feature that PDF/A-2 does not have, which is to allow arbitrary files to be embedded within the PDF file. In this case, there is no file to be embedded. The authors chose conformance level b for simplicity. ● b is basic conformance, which requires only necessary components (e.g., all fonts embedded in the PDF) for reproduction of a document’s visual appearance. ● a is accessible conformance, which means b conformance level plus additional accessibility (structural and semantic features such as document structure). One can add tags to convert PDF/2b to PDF/2a. ● u represents a conformance level with the additional requirement that all text in the document have Unicode equivalents. This article does not cover any post-processing of additional manual or computational features such as adding OCR text to the generated PDF/A files. These features do not help faithfully capture the look and feel of original pages in digitization, and they can be added or updated later without any loss of information. In addition, OCR results rely on the availability of OCR engines for the document’s language, and results can vary between different OCR engines over time. 
OCR technology is improving and will produce better results in the future. For example, current OCR technology for English gives very reliable (more than 90 percent) accuracy; in comparison, traditional Chinese manuscripts and Pashto/Persian documents yield unacceptably low accuracy (less than 60 percent). Cutting-edge OCR engines have begun to utilize artificial intelligence techniques, and the authors believe that a breakthrough will happen soon.

Data Source

The University of Arizona Libraries (UAL) and the Afghanistan Center at Kabul University (ACKU) have been partnering to digitize and preserve ACKU’s permanent collection held in Kabul. This collaborative project created the largest Afghan digital repository in the world. Currently the Afghan digital repository (http://www.afghandata.org) contains more than fifteen thousand titles and 1.6 million pages of documents. Digitization of these text documents followed the previous version of the FADGI guidelines, which recommended scanning each page of a text document into a separate TIFF file as the master file. These TIFFs were organized into directories in a file system, where each directory represents a document and contains all the scanned pages of that title. An example of the directory structure can be found in Han’s article.

DIGITIZATION OF TEXT DOCUMENTS USING PDF/A | HAN AND WAN 55 HTTPS://DOI.ORG/10.6017/ITAL.V37I1.9878

PDF/A and Image Manipulation Tools

There are a few open source and proprietary PDF software development kits (SDKs). Adobe PDF Library and Foxit SDK are the most well-known commercial tools for manipulating PDFs. To show readers that they can manipulate and generate PDF/A documents themselves, open source software, rather than commercial tools, was used. Currently, only a very limited number of open source PDF SDKs are available, including iText and PDFBox.
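As a brief aside on the data source's layout, the following hedged sketch (not the authors' code; directory and file names such as page_0001.tif are hypothetical) shows how one document directory's TIFF pages can be collected in their original scan order, assuming zero-padded page numbering so that lexicographic sorting reproduces page order:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SortTiffPages {
    // Collect one document directory's TIFF pages in their original scan order.
    // Assumes zero-padded page numbers (page_0001.tif, page_0002.tif, ...),
    // so plain lexicographic sorting reproduces the page order.
    static List<Path> sortedTiffs(Path docDir) throws IOException {
        try (Stream<Path> entries = Files.list(docDir)) {
            return entries
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".tif"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: one directory per digitized document.
        Path doc = Files.createTempDirectory("document_title");
        Files.createFile(doc.resolve("page_0002.tif"));
        Files.createFile(doc.resolve("page_0001.tif"));
        Files.createFile(doc.resolve("checksums.txt")); // non-TIFF entries are ignored
        for (Path p : sortedTiffs(doc)) {
            System.out.println(p.getFileName());
        }
    }
}
```

A list produced this way is one plausible source for the pre-sorted TIFF queue used later when combining pages into a single PDF/A file.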
iText was chosen because it has good documentation and provides a well-built set of APIs that support almost all PDF and PDF/A features. iText was initially written by Bruno Lowagie (who was in the ISO PDF standard working group) in 1998 as an in-house project; Lowagie later started his own company, iText, and published iText in Action with many code examples.6 Moreover, iText has Java and C# coding options with good code documentation. It is worth mentioning that iText has different versions. The authors used iText 5.5.10 and 5.4.4. Using an older version in our implementation generated a non-compatible PDF/A file because it was not aligned with the PDF/A standard.7

For image processing, there were a few popular open source options, including ImageMagick and GIMP. ImageMagick was chosen because of its popularity, stability, and cross-platform implementation. Our implementation identified one issue with ImageMagick: the current version (7.0.4) could not retrieve all the metadata from TIFF files, as it did not extract certain information such as the Image File Directory and color profile. These metadata are critical because they are part of the original data from digitization. Unfortunately, the authors observed that some image editors were unable to preserve all the metadata from image files during the conversion process. Hart and de Vries used case studies to show the vulnerability of metadata, demonstrating that metadata elements in a digital object can be lost or corrupted by use or by conversion of a file to another format. They suggested that action is needed to ensure proper metadata creation and preservation, so that all types of metadata are captured and preserved to achieve the most authentic, consistent, and complete digital preservation for future use.8

Metadata Extraction Tools and Color Profiles

As we digitize physical documents and manipulate images, color management is important.
The goal of color management is to obtain a controlled conversion between the color representations of various devices such as image scanners, digital cameras, and monitors. A color profile is a set of data that characterizes the color space of an input or output device. The International Color Consortium (ICC) standards and profiles were created to bring various manufacturers together; embedding color profiles into images is one of the most important color management solutions. Image formats such as TIFF and JPEG2000 and document formats such as PDF may contain embedded color profiles.

The authors identified a few open source tools to extract TIFF metadata, including ExifTool, Exiv2, and tiffInfo. ExifTool is an open source tool for reading, writing, and manipulating metadata of media files. Exiv2 is another free metadata tool supporting different image formats. The tiffInfo program is widely used on the Linux platform, but it has not been updated for at least ten years. Our implementations showed that ExifTool was the tool that most easily extracted the full ICC profiles and other metadata from TIFF and JPEG2000 files. ImageMagick and other image processing software were examined in Van der Knijff’s article discussing JPEG2000 for long-term preservation.9 He found that ICC profiles were lost in ImageMagick. Our implementation showed that the current version of ImageMagick has fixed this issue. A metadata sample can be found in appendix A.

IMPLEMENTATION

Converting and Ordering TIFFs into a Single PDF/A-2 File

When ordering and combining all individual TIFFs of a document into a single PDF/A-2b file, the authors intended to preserve all information from the TIFFs, including raster image data streams and metadata stored in each TIFF’s header.
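The ExifTool extraction step mentioned above can be sketched as a command-line invocation. This is a minimal, hedged sketch rather than the authors' code: the file name is hypothetical, and the program only builds and prints the command (ExifTool's -X option emits metadata as RDF/XML) instead of assuming the tool is installed:

```java
import java.util.List;

public class ExifToolCommand {
    // Build the ExifTool invocation that dumps a TIFF's metadata,
    // including the embedded ICC profile, as RDF/XML ("-X").
    static List<String> command(String imagePath) {
        return List.of("exiftool", "-X", imagePath);
    }

    public static void main(String[] args) {
        // "page_0001.tif" is a hypothetical page file; in practice the command
        // would be run once per page, e.g., via java.lang.ProcessBuilder.
        System.out.println(String.join(" ", command("page_0001.tif")));
    }
}
```

The resulting XML file per page is what the combination step later converts to XMP and attaches to the corresponding PDF page.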
The raster image data streams are the main images reflecting the original look and feel of the pages, while the metadata (including technical and administrative metadata such as BitsPerSample, DateTime, and Make/Model/Software) records important digitization and provenance information. Both are critical for delivery and digital preservation. The TIFF images were first converted to JPEG2000 with lossless compression using the open source ImageMagick software. Our tests of ImageMagick demonstrated that it can handle different color profiles and will convert images correctly if the original TIFF comes with a color profile. This gave us confidence that past concerns about JPEG2000 and ImageMagick had been resolved. These images were then sorted into their original order and combined into a single PDF/A-2 file. An alternative is to code the TIFF’s image data stream directly into a PDF/A file, but this approach would miss one benefit of PDF/A-2: tremendous file size reduction with JPEG2000. The following is the pseudocode for ordering and combining all the TIFFs of a text document into a single PDF/A-2 file.
CreatePDFA2(queue TiffList) {
    Create an empty queue XMLQ;
    Create an empty queue JP2Q;
    /* TiffList is a pre-sorted queue based on the original page order */
    /* Convert each TIFF to JPEG2000 losslessly, then add each JPEG2000 and its metadata to a queue */
    while (TiffList is NOT empty) {
        String tiffFilePath = TiffList.dequeue();
        String xmlFilePath = TIFF metadata extracted using ExifTool;
        XMLQ.enqueue(xmlFilePath);
        String jp2FilePath = JPEG2000 file location from TIFF converted by ImageMagick;
        JP2Q.enqueue(jp2FilePath);
    }
    /* Convert each image's metadata to XMP; add each JPEG2000 and its metadata into the PDF/A-2 file in its original order */
    Document pdf2b = new Document();
    /* create PDF/A-2b conformance level */
    PdfAWriter writer = PdfAWriter.getInstance(pdf2b, new FileOutputStream(PdfAFilePath),
            PdfAConformanceLevel.PDF_A_2B);
    writer.createXmpMetadata(); // create root XMP
    pdf2b.open();
    while (JP2Q is NOT empty) {
        Image jp2 = Image.getInstance(JP2Q.dequeue());
        Rectangle size = new Rectangle(jp2.getWidth(), jp2.getHeight()); // PDF page size setting
        pdf2b.setPageSize(size);
        pdf2b.newPage(); // create a new page for a new image
        byte[] bytearr = XmpManipulation(XMLQ.dequeue()); // convert original metadata based on the XMP standard
        writer.setPageXmpMetadata(bytearr);
        pdf2b.add(jp2);
    }
    pdf2b.close();
}

Converting PDF/A-2 Files back to TIFFs and JPEG2000s

To ensure that raster images can be extracted from the newly created PDF/A-2 file, the authors also wrote code to convert a PDF/A-2 file back to the original TIFF or JPEG2000 format. This implementation was the reverse of the above operation. Once the reverse conversion was completed, the authors verified that the image files created from the PDF/A-2 file were the same as those before the conversion to PDF/A-2. Note that we generated MD5 checksums to verify image data streams.
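The checksum comparison can be illustrated with Java's standard MessageDigest API. This is a hedged sketch rather than the authors' code; the byte arrays stand in for an image data stream before embedding and after extraction:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class VerifyImageStream {
    // Hex-encoded MD5 of a raw data stream.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-ins for the image bytes before embedding and after extraction;
        // "abc" is the RFC 1321 test vector, used here so the digest is known.
        byte[] original = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] extracted = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(original));
        System.out.println(md5Hex(original).equals(md5Hex(extracted)) ? "match" : "MISMATCH");
    }
}
```

In the round-trip test, equal digests for the pre-embedding and post-extraction streams confirm that the image data survived the conversion losslessly.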
The image data streams are identical, but metadata locations can vary because of the inconsistent TIFF tags used over the years; when converting one TIFF to another, ImageMagick applies its own implementation of metadata tags. The code can be found in appendix B.

PDF/A Validation

PDF/A is one of the most recognized digital preservation formats, specially designed for long-term preservation and access. However, no commonly accepted PDF/A validator was available in the past, although several commercial and open source PDF preflight and validation engines (e.g., Acrobat) were available. Validating a PDF/A file against the PDF/A standards is a challenging task for a few reasons, including the complexity of the PDF and PDF/A formats. The PDF Association and the Open Preservation Foundation recognized the need and started a project to develop an open source PDF/A validator and build a maintenance community. Their result, veraPDF, is an open source validator designed for all PDF/A parts and conformance levels. Released in January 2017, veraPDF aims to become the commonly accepted PDF/A validator.10 Our generated PDF/As have been validated with veraPDF 1.4 and Adobe Acrobat Pro DC Preflight. Both products validated the PDF/A-2b files as fully conformant. Our implementations showed that veraPDF 1.4 verified more cases than Acrobat DC Preflight. Figure 1 shows a PDF file structure and its metadata.

Figure 1. A PDF object tree with root-level metadata.

RUNTIME AND CONCLUSION

The time complexity of our code is O(n log n) for n page images because of the sorting algorithms used. TIFFs were first converted to JPEG2000. When JPEG2000 images are added to a PDF/A-2 file, no further image manipulation is required because the generated PDF/A-2 uses JPEG2000 directly (in other words, it uses the JPXDecode filter).
Tables 1 and 2 show the performance on our hardware and software environment (Intel Core i7-2600 CPU @ 3.4 GHz, 8 GB DDR3 RAM, 3 TB 7,200-RPM 64 MB-cache hard disk, running Ubuntu 16.10).

Table 1. Runtimes of converting grayscale TIFFs to JPEG2000s and to PDF/A-2b

No. of Files | Total File Size (MB) | Image Conversion Runtime (TIFFs to JP2s, seconds) | Total Runtime (TIFFs to JP2s to a single PDF/A-2b, seconds)
  1 |   9.1 |   3.61 |   3.98
 10 |  91.1 |  35.63 |  36.71
 20 | 182.2 |  71.83 |  73.98
 50 | 455.5 | 179.06 | 184.63
100 | 910.9 | 358.30 | 370.91

Table 2. Runtimes of converting color TIFFs to JPEG2000s and to PDF/A-2b

No. of Files | Total File Size (MB) | Image Conversion Runtime (TIFFs to JP2s, seconds) | Total Runtime (TIFFs to JP2s to a single PDF/A-2b, seconds)
  1 |    27.3 |   14.80 |   14.94
 10 |   273   |  150.51 |  151.55
 20 |   546   |  289.95 |  293.21
 50 | 1,415   |  741.89 |  749.75
100 | 2,730   | 1490.49 | 1509.23

The results show that (a) the majority of the runtime (more than 95 percent) is spent converting TIFFs to JPEG2000s using ImageMagick (see figure 2); (b) the runtime of converting a TIFF scales linearly with the file’s size (see figure 2); (c) the runtime of converting a color TIFF is significantly higher than that of converting a greyscale TIFF (see figure 2); and (d) it is feasible in terms of time and resources to convert existing master images of digital document collections to PDF/A-2b. For example, converting 1 TB of color TIFFs would take 552,831 seconds (153.5 hours; 6.4 days) on the above hardware. The authors have already processed more than 600,000 TIFFs using this method. The authors conclude that using PDF/A gives institutions the advantages of the newly preferred master file format for digitization of text documents over TIFF/JPEG2000.
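The 1 TB extrapolation above follows directly from Table 2. The following sketch reproduces the estimate (within rounding) from the 100-file color run, assuming, as the article's figure implies, that 1 TB is taken as 1,000,000 MB:

```java
public class RuntimeEstimate {
    public static void main(String[] args) {
        // From Table 2: 100 color TIFFs, 2,730 MB total, 1509.23 s total runtime.
        double secondsPerMB = 1509.23 / 2730.0;   // ≈ 0.553 s per MB
        double oneTB = 1_000_000.0;               // MB, matching the article's estimate
        long seconds = Math.round(secondsPerMB * oneTB);
        long hours = Math.round(seconds / 3600.0);
        System.out.println(seconds + " seconds, about " + hours + " hours");
    }
}
```

The computed figure agrees with the article's 552,831 seconds to within rounding of the last digit.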
The above implementation demonstrates the ease, the reasonable runtime, and the availability of open source software to perform such conversions. From both the theoretical analysis and the empirical evidence, the authors show that PDF/A has advantages over the traditionally preferred file format, TIFF, for digitization of text documents. Following best practice, a PDF/A file can be a self-contained, self-describing container that accommodates all the data from digitization of textual materials, including page-level metadata and ICC profiles.

SUMMARY

The goal of this article is to demonstrate empirical evidence of using PDF/A for digitization of text documents. The authors evaluated and used multiple open source software programs for processing raster images, extracting image metadata, and generating PDF/A files. These PDF/A files were validated using the up-to-date PDF/A validators veraPDF and Acrobat Preflight. The authors also calculated the time complexity of the program and measured the total runtime in multiple testing cases. Most of the runtime was spent on image conversion from TIFF to JPEG2000; the creation of the PDF/A-2b file with associated page-level metadata accounted for less than 5 percent of the total runtime. The runtime for converting a color TIFF was much higher than that for a greyscale one. Our theoretical analysis and empirical examples show that using PDF/A-2 presents many advantages over the traditionally preferred file formats (TIFF/JPEG2000) for digitization of text documents.

Figure 2. File size, greyscale and color TIFFs and runtime ratio.
APPENDIX A: SAMPLE TIFF METADATA WITH ICC HEADER

[The appendix reproduces ExifTool output for a sample TIFF; the tag names were lost in text extraction. The recoverable values describe a 3400 × 4680 pixel, 8-bits-per-sample RGB image, uncompressed, chunky planar configuration, at 400 × 400 dpi, with an embedded EPSON sRGB display device ICC profile (Apple ColorSync 2.2.0 header, dated 2006; copyright SEIKO EPSON CORPORATION 2000–2006). Several binary blocks are marked “use -b option to extract.”]

APPENDIX B: SAMPLE CODE TO CONVERT PDF/A-2 BACK TO JPEG2000S

/* Assumption: the PDF/A-2b file was generated from image objects converted
   from TIFF images with JPXDecode, along with page-level metadata */
public static void parse(String src, String dest) throws IOException {
    PdfReader reader = new PdfReader(src);
    PdfObject obj;
    int counter = 0;
    for (int i = 1; i <= reader.getXrefSize(); i++) {
        obj = reader.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            PRStream stream = (PRStream) obj;
            byte[] b;
            try {
                b = PdfReader.getStreamBytes(stream);
            } catch (UnsupportedPdfException e) {
                b = PdfReader.getStreamBytesRaw(stream);
            }
            PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
            FileOutputStream fos = null;
            if (pdfsubtype != null && pdfsubtype.toString().equals(PdfName.XML.toString())) {
                // page-level XMP metadata stream
                fos = new FileOutputStream(String.format("%s_xml/%d.xml", dest, counter));
                System.out.println("Page Metadata Extracted!");
            }
            if (pdfsubtype != null && pdfsubtype.toString().equals(PdfName.IMAGE.toString())) {
                // JPEG2000 image stream
                counter++;
                fos = new FileOutputStream(String.format("%s_jp2/%d.jp2", dest, counter));
            }
            if (fos != null) {
                fos.write(b);
                fos.flush();
                fos.close();
                System.out.println("JPEG2000 Conversion from PDF completed!");
            }
        }
    }
}
/* Then use the ImageMagick library to convert the JPEG2000s to TIFFs */

REFERENCES

1 PDF-Tools.com and PDF Association, “PDF/A—The Standard for Long-Term Archiving,” version 2.4, white paper, May 20, 2009, http://www.pdf-tools.com/public/downloads/whitepapers/whitepaper-pdfa.pdf; Duff Johnson, “White Paper: How to Implement PDF/A,” Talking PDF, August 24, 2010, https://talkingpdf.org/white-paper-how-to-implement-pdfa/; Alexandra Oettler, “PDF/A in a Nutshell 2.0: PDF for Long-Term Archiving,” Association for Digital Standards, 2013, https://www.pdfa.org/wp-content/until2016_uploads/2013/05/PDFA_in_a_Nutshell_211.pdf; Library of Congress, “PDF/A, PDF for Long-Term Preservation,” last modified July 27, 2017, https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml.

2 Library of Congress, “The Time and Place for PDF: An Interview with Duff Johnson of the PDF Association,” The Signal (blog), December 12, 2017, https://blogs.loc.gov/thesignal/2017/12/the-time-and-place-for-pdf-an-interview-with-duff-johnson-of-the-pdf-association/.

3 Yan Han, “Beyond TIFF and JPEG2000: PDF/A as an OAIS Submission Information Package Container,” Library Hi Tech 33, no. 3 (2015): 409–23, https://doi.org/10.1108/LHT-06-2015-0068.

4 Federal Agencies Digital Guidelines Initiative, Technical Guidelines for Digitizing Cultural Heritage Materials (Washington, DC: Federal Agencies Digital Guidelines Initiative, 2016), http://www.digitizationguidelines.gov/guidelines/FADGI%20Federal%20%20Agencies%20Digital%20Guidelines%20Initiative-2016%20Final_rev1.pdf.

5 Duff Johnson, “US Federal Agencies Approve PDF/A,” PDF Association, September 2, 2016, http://www.pdfa.org/new/us-federal-agencies-approve-pdfa/.

6 Bruno Lowagie, iText in Action, 2nd ed. (Stamford, CT: Manning, 2010).
7 “iText 5.4.4,” iText, last modified September 16, 2013, http://itextpdf.com/changelog/544.

8 Timothy Robert Hart and Denise de Vries, “Metadata Provenance and Vulnerability,” Information Technology and Libraries 36, no. 4 (2017), https://doi.org/10.6017/ital.v36i4.10146.

9 Johan van der Knijff, “JPEG 2000 for Long-Term Preservation: JP2 as a Preservation Format,” D-Lib 17, no. 5/6 (2011), https://doi.org/10.1045/may2011-vanderknijff.

10 PDF Association, “How veraPDF Does PDF/A Validation,” 2016, https://www.pdfa.org/how-verapdf-does-pdfa-validation/.
Mobile Website Use and Advanced Researchers: Understanding Library Users at a University Marine Sciences Branch Campus

Mary J. Markland, Hannah Gascho Rempel, and Laurie Bridges

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017 7

ABSTRACT

This exploratory study examined the use of the Oregon State University Libraries website via mobile devices by advanced researchers at an off-campus branch location. Branch campus–affiliated faculty, staff, and graduate students were invited to participate in a survey to determine what their research behaviors are via mobile devices, including the frequency of their mobile library website use and the tasks they were attempting to complete. Findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. Mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. Results of this survey will be used to address knowledge gaps around library resources and research tools and to generate more ways to study advanced researchers’ use of library services via mobile devices.

INTRODUCTION

As use of mobile devices has expanded in the academic environment, so has the practice of gathering data from multiple sources about which mobile resources are and are not being used. This data informs the design decisions and resource investments libraries make in mobile tools. Web analytics is one tool that allows researchers to discover which devices patrons use to access library webpages. But web analytics data do not show what patrons want to do and what hurdles they face when using the library website via a mobile device.
Web analytics also lacks nuance in that it cannot distinguish user characteristics, such as whether users are novice or advanced researchers, which may affect how these users interact with a mobile device. User surveys are another tool for gathering data on mobile behaviors. User surveys help overcome some of the limitations of web analytics data by directly asking users about their perceived research skills and the resources they use on a mobile device.

As is the case at most libraries, Oregon State University Libraries serves a diverse range of users. We were interested in learning whether advanced researchers, particularly advanced researchers who work at a branch campus, use the library’s resources differently than main campus users. We were chiefly interested in these advanced researchers because of the mobile nature of their work. They are graduate students and faculty in the field of marine science who work in a variety of locations, including their offices, labs, and the field (which can include rivers, lakes, and the ocean). We focused on the use of the library website via mobile devices as one way to determine whether specific library services should be adapted to best meet the needs of this targeted user community.

Oregon State University (OSU) is Oregon’s land-grant university; its home campus is in Corvallis, Oregon.

Mary J. Markland (mary.markland@oregonstate.edu) is Head, Guin Library; Hannah Gascho Rempel (hannah.rempel@oregonstate.edu) is Science Librarian and Coordinator of Graduate Student Success Services; and Laurie Bridges (laurie.bridges@oregonstate.edu) is Instruction and Outreach Librarian, Oregon State University Libraries and Press.

MOBILE WEBSITE USE AND ADVANCED RESEARCHERS | MARKLAND, REMPEL, AND BRIDGES doi:10.6017/ital.v36i4.9953 8
Hatfield Marine Science Center (HMSC) in Newport is a branch campus that includes a branch library. Guin Library at HMSC serves OSU students and faculty from across the OSU colleges, along with the co-located federal and state agencies of the National Oceanic and Atmospheric Administration (NOAA), US Fish and Wildlife Service, Environmental Protection Agency (EPA), United States Geological Survey (USGS), United States Department of Agriculture (USDA), and the Oregon Department of Fish and Wildlife. The Guin Library is in Newport, forty-five miles from the main campus. Like many other branch libraries, Guin Library was established at a time when providing a print collection close to where researchers and students work was paramount, but today it must adapt its services to meet the changing information needs of its user base.

Branch libraries are typically designed to serve a particular clientele or subject area, which can create an institutional culture different from that of the main library. Guin Library serves advanced undergraduates, graduate students, and scientific researchers. HMSC’s distance from Corvallis, the small size of the researcher community, and the shared focus on a research area (marine sciences) create a distinct culture. While Guin Library is often referred to as the “heart of HMSC,” the number of in-person library users is decreasing. This decline is not unexpected, as numerous studies have shown that faculty and graduate students have fewer needs that require an in-person trip to the library.1 Studies have also shown that faculty and graduate students can be unaware of the services and resources that libraries provide, thereby continuing the cycle of underuse.2

To learn more about the needs of HMSC’s advanced researchers, this exploratory study examined their research behaviors via mobile devices.
The goals of this study were to
• determine if and with what frequency advanced researchers at HMSC use the OSU Libraries website via mobile devices;
• gather a list of tasks advanced users attempt to accomplish when they visit the OSU Libraries website on a mobile device; and
• determine whether the mobile behaviors of these advanced researchers differ from those of researchers at the main OSU campus (including undergraduate students), and if so, whether these differences warrant alternative modes of design or service delivery.

LITERATURE REVIEW

The conversation about how best to design mobile library websites has shifted over the past decade. Early in the mobile-adoption process, some libraries focused on creating special websites or apps that worked with mobile devices.3 While libraries globally might still be creating mobile-specific websites and apps,4 US libraries are trending toward responsively designed websites as a more user-friendly option and a simpler solution for most libraries with limited staff and budgets.5

Most of the literature on mobile-device use in higher education focuses on undergraduates across a wide range of majors who use a standard academic library.6 To help provide context for how libraries have designed their websites for mobile users, some of those specific findings will be shared later. But because our study focused on graduate students and faculty in a science-focused branch library, we will begin with a discussion of what is known about more advanced researchers’ use of library services and their mobile-device habits.

Several themes emerged from the literature on graduate students’ relationships with libraries.
In an ironic twist, faculty think graduate students are being assisted by the library, while librarians think faculty are providing graduate students with the help they need to be successful.7 As a result, many graduate students end up using their library’s resources in an entirely disintermediated way. Graduate students, especially those in the sciences, visit the physical library less often and use online resources more than undergraduate students.8 Most graduate students start their research process with assistance from academic staff, such as advisors and committee members,9 and are unaware of many library services and resources.10 As frequent virtual-library users who receive little guidance on how to use the library’s tools, graduate students need a library website that is clear in scope and purpose, offers help, and has targeted services.11

Compared to reports on undergraduate use of mobile devices to access their library’s website, relatively few studies have focused on graduate-student or faculty mobile behaviors. A recent survey of Japanese Library and Information Science (LIS) students compared undergraduate and graduate students’ usage of mobile devices to access library services and found slight differences.
However, both groups reported accessing libraries as last on their list of preferred smartphone uses.12 Aharony examined the mobile use behaviors of Israeli LIS graduate students and found that approximately half of these graduate students used smartphones, perceived them to be useful and easy tools in their everyday life, and could transfer those habits to library searching behaviors.13 When looking specifically at how patrons use library services via a mobile device, Rempel and Bridges found the top reason graduate students at their main campus used the OSU Libraries website via mobile devices was to find information on library hours, followed by finding a book and researching a topic.14 Barnett-Ellis and Vann surveyed their small university and found that both undergraduate and graduate students were more than twice as likely to use mobile devices as their faculty and staff; a majority of students also indicated they were likely to use mobile devices to conduct research.15 Finally, survey results showed graduate students in Hofstra University’s College of Education reported accessing library materials via a mobile device twice as often as other student groups. In addition, these graduate students reported being comfortable
Graduate students were also more likely to be at home when using their mobile device to access the library, a finding the authors attributed to education graduate students frequently being employed as full-time teachers.16 Research on how faculty members use library resources characterizes a population that is confident in their literature-searching skills, prefers to search on their own, and has little direct contact with the library.17 Faculty researchers highly value convenience;18 they rely primarily on electronic access to journal articles but prefer print access to monographs.19 Faculty tend to be self-trained at using search tools, such as PubMed or other online databases, and therefore are not always aware of the more in-depth functionality of these tools.20 In contrast to graduate students, Rempel and Bridges found that faculty using the library website via mobile devices were less interested in information about the physical library, such as library hours, and were more likely to be researching a topic.21 Medical faculty are one of the few faculty groups whose mobile-research behaviors have been specifically examined. A survey administered by Bushhousen et al. at a medical university revealed that a third of respondents used mobile apps for research-related activities.22 Findings by Boruff and Storie indicate that one of the biggest barriers to mobile use in health-related academic settings was wireless access.23 Thus apps that did not require the user to be connected to the internet were highly desired. Faculty and graduate students in health-related academic settings saw a role for the library in advocating for better wireless infrastructure, providing access to a targeted set of heavily used resources, and providing online guides or in-person tutorials on mobile apps or procedures specific to their institution.24

According to the literature, most design decisions for library mobile sites have been made on the basis of information collected about undergraduate students’ behavior at main-branch campuses. To help inform our understanding of how recent decisions have been made, the remainder of the literature review focuses on what is known about undergraduate students’ mobile behavior. Undergraduate students are very comfortable using mobile technologies and perceive themselves to be skilled with these devices. According to the 2015 EDUCAUSE Center for Analysis and Research (ECAR) study of undergraduate students and information technology, most undergraduate students consider themselves sophisticated technology users who are engaged with information technologies.25 Undergraduate students mainly use their smartphones for non-class activities. But students indicate they could be more effective technology users if they were more skilled at tools such as the learning management system, online collaboration tools, e-books, or laptops and smartphones in class. Of interest to libraries is the ECAR participants’ top area of reported interest, “search tools to find reference or other information online for class work.”26 However, when a mobile library site is in place, usage rates have been found to be lower than anticipated. In a study of undergraduate science students, Salisbury et al. found only 2 percent of respondents reported using their cell phones to access library databases or the library’s catalog every hour or daily, despite 66 percent of the students browsing the internet using their mobile phone hourly or daily. Salisbury et al. speculated that users need to be told about mobile-optimized library resources if libraries want to increase usage.27

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017

Rempel and Bridges used a pop-up interrupt survey while users were accessing the OSU Libraries mobile site.28 This approach allowed a larger cross-section of library users to be surveyed. It also reduced memory errors by capturing their activities in real time. Activities that had been included in the mobile site because of their perceived usefulness in a mobile environment, such as directions, asking a librarian a question, and the coffee shop webcam, were rarely cited as a reason for visiting the mobile site. The OSU Libraries branch at HMSC is entering a new era. A Marine Studies Initiative will result in the building of a new multidisciplinary research campus at HMSC that aims to serve five hundred undergraduate students. The change in demographics and the increase in students who will need to be served have prompted Guin Library staff to explore how the current population of advanced researchers interacts with library resources. In addition, examining the ways undergraduate students at the main campus use these tools will help with planning for the upcoming changes in the user community.

METHODS
This study used an online Qualtrics survey to gather information about how frequently advanced researchers (graduate students, faculty, and affiliated scientists at a branch library for marine science) use the OSU Libraries website via mobile devices, what they search for, and other ways they use mobile devices to support their research behaviors. A recruitment email with a link to the survey was sent to three discussion lists used by the HMSC community in Spring 2016. The survey was available for four weeks, and a reminder email was sent one week before the survey closed. The invitation email included a link to an informed-consent document. Once the consent document had been reviewed, users were taken to the survey via a second link.
Respondents could provide an email address to receive a three-dollar coffee card for participating in the study, but their email address was recorded in a separate survey location to preserve their anonymity. The invitation email indicated that this survey was about using the website via a mobile device, and the first survey question asked users if they had ever accessed the library website on a mobile device. If they answered “no,” they were immediately taken to the end of the survey and were not recorded as a participant in the study. A similar survey was conducted with users from OSU’s main campus in 2012–13 and again in 2015. The results from 2012–13 have been published previously,29 but the results from 2015 have not. While the focus of the present study is on the mobile behaviors of advanced researchers in the HMSC community, data from the 2015 main-campus study is used to provide a comparison to the broader OSU community. OSU main-campus respondents in 2015 and HMSC participants in 2016 both answered closed- and open-ended questions that explored participants’ general mobile-device behaviors and behaviors specific to using the OSU Libraries website via mobile devices. However, the HMSC survey also asked questions about behaviors related to using the OSU (nonlibrary) website via a mobile device and participants’ mobile scholarly reading and writing behaviors. The survey concluded with several demographic questions. The survey data was analyzed using Qualtrics’ cross-tab functionality and Microsoft Excel to observe trends and potential differences between user groups. Open-ended responses were examined for common themes. Twenty-three members of the HMSC community completed the survey, whereas one hundred participants responded to the 2015 main-campus survey.
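The cross-tab step described above can be sketched in plain Python. This is only an illustration of the technique, not the authors' actual workflow: the records below are invented placeholders rather than the study's data, and the variable names are ours, not anything from Qualtrics or Excel.

```python
# Cross-tabulating survey answers by respondent group, the kind of
# breakdown Qualtrics' cross-tab feature produces. Records are invented.
from collections import Counter

records = [
    ("graduate student", "at least once a week"),
    ("graduate student", "less than once a month"),
    ("faculty", "less than once a month"),
    ("faculty", "at least once a month"),
    ("graduate student", "at least once a month"),
]

# Count each (group, answer) pair once per record.
crosstab = Counter(records)

groups = sorted({g for g, _ in records})
answers = sorted({a for _, a in records})

# Print one row per group, one cell per answer category.
for group in groups:
    row = {a: crosstab[(group, a)] for a in answers}
    print(group, row)
```

Eyeballing rows like these is enough to spot the group-level trends the methods section describes; no statistical testing is implied.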
Participation in the 2015 survey was capped at one hundred respondents because limited incentives were available. The participation difference between the two surveys reflects several differences between the two sampled communities. The most obvious difference is size. The OSU community comprises more than thirty-six thousand students, faculty, and staff; the HMSC community is approximately five hundred students, researchers, and faculty—some of whom are also included as part of the larger OSU community. The second factor influencing response rates relates to the difference in size between the two communities, but is more striking in the HMSC community: the survey relied on a self-selected group of users who indicated they had a history of using the library website via a mobile device. Therefore, it is not possible to estimate the population size of mobile-device library-website users specific to the branch library or the main campus library. This limitation means that the results from this study cannot be used to generalize findings to all users who visit a library website via mobile devices; instead the results are intended to present a case that other libraries may compare with behaviors observed on their own campuses. Sharing the behaviors of advanced researchers at a branch campus is particularly valuable as this population has historically been understudied.

RESULTS AND DISCUSSION
Participant Demographics and Devices Used
Of the twenty-three respondents to the HMSC mobile behaviors survey, 13 (62 percent) were graduate students, 7 (34 percent) were faculty (this category includes faculty researchers and courtesy faculty), and one respondent was an NOAA employee. Two participants declined to declare their affiliation. Of the 97 respondents to the 2015 OSU main-campus survey who shared their affiliation, 16 (16 percent) were graduate students, 5 (5 percent) were faculty members, and 69 (71 percent) were undergraduates.
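As a small arithmetic sketch of the demographics above: the reported percentages are taken over the 21 respondents who declared an affiliation (23 completed surveys minus the 2 who declined), not over all 23. The counts come from the text; the function name is ours, not the authors'.

```python
# Recomputing the reported share of graduate students among HMSC
# respondents who declared an affiliation. Counts are from the article.

def pct_of_declared(count: int, declared_total: int) -> int:
    """Rounded percentage among respondents with a declared affiliation."""
    return round(100 * count / declared_total)

completed = 23
declined = 2
declared = completed - declined   # 21 respondents declared an affiliation

grad_students = 13
print(f"graduate students: {grad_students} of {declared} = "
      f"{pct_of_declared(grad_students, declared)}%")   # 13/21 rounds to 62%
```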
Respondents varied in the types of mobile devices they used when doing library research. Smartphones were used by 78 percent (18 respondents), and tablets by 22 percent (5 respondents). Apple (15 respondents) was the most common device brand used, although six of the respondents used an Android phone or tablet. Compared to the general population’s device ownership, these respondents are more likely to own Apple devices, but the two major device types owned (Apple and Android) match market trends.30

Frequency of Library Site Use on Mobile Devices
Most of the HMSC respondents are infrequent users of the library website via mobile devices: 50 percent (11 respondents) did so less than once a month; 41 percent (9 respondents) did so at least once a month; and 9 percent (2 respondents) did so at least once a week. The low level of library website usage via mobile devices was especially notable as this population reports being heavy users of the library website via laptops or desktop computers, with 82 percent (18 respondents) visiting the library website via those tools at least once a week. Researchers at HMSC used the library website via mobile devices much less often than the 2015 main-campus respondents (undergraduates, graduate students, and faculty). No HMSC respondents visited the mobile site daily compared to 10 percent of main-campus users, and only 9 percent of HMSC respondents visited weekly compared to 28 percent of main-campus users (see Figure 1).

Figure 1. 2016 HMSC participants vs. 2015 OSU main-campus participants reported frequency of library website visits via a mobile device by percent of responses.

While HMSC advanced researchers share some mobile behaviors with main-campus students, this exploratory study demonstrates they do not use the library website via mobile devices as frequently.
One possible reason is that researchers rarely spend time coming and going from classes and therefore do not have small gaps of time to fill throughout their day. Instead, their daily schedule involves being in the field or in the lab collecting and analyzing data. Alternatively, they are frequently involved in writing-intensive projects such as drafting journal articles or grant proposals. They carve out specific periods to do research and do not appear to be filling time with short bursts of literature searching. They can work on laptops and do not need to multitask on a phone or tablet between classes or in other situations. Mobile-device ownership among HMSC graduate students might also be limited because of personal budgets that do not allow for owning multiple mobile devices or for having the most recent model. In addition, this group of scientists may not be on the front edge of personal technologies, especially compared to medical researchers, because few mobile apps are designed specifically for the research needs of marine scientists.

Where Researchers Are When Using Mobile Devices for Library Tasks
Because mobile devices facilitate connecting to resources from many locations, and because advanced researchers conduct research in a range of settings—including the field, the office, and home—we asked respondents where they were most likely to use the library website via a mobile device. Thirty-two percent were most likely to be at home; 27 percent in transit; 18 percent at work; and 9 percent in the field.
The popularity of using the library website via mobile devices while in transit was somewhat unexpected, but perhaps should not have been because many people try to maximize their travel time by multitasking on mobile devices. The distance from the main campus might explain this finding because a local bus service provides an easy way to travel to and from the main campus, and the hour-long trip would provide opportunities for multitasking via a mobile device. Relatively few respondents used mobile devices to access the library website while at work. Previous studies show that a lack of reliable campus wireless internet access can affect students’ ability to use mobile technology.31 HMSC also struggles to provide consistent wireless access, and signals are spotty in many areas of our campus. Despite signal boosters in Guin Library, wireless access is still limited at times. In addition, cell phone service is equally spotty both at HMSC and up and down the coast of Oregon. It is much less frustrating to work on a device that has a wired connection to the internet while at HMSC. These respondents did use mobile devices while at home, which might indicate they had a better wireless signal there. Alternatively, working from home on a mobile device might indicate that they compartmentalize their library-research time as an activity to do at home instead of in the office. Researchers used their mobile devices to access the library while in the field less than originally expected, but upon further reflection, it made sense that researchers would be less likely to use library resources during periods of data collection for oceanic or other water-based research projects because of their focused involvement during that stage. The water-based research also increases the risk of losing mobile devices. 
Library Resources Accessed via Mobile Devices
To learn more about how these respondents used the library website, we asked them to choose what they were searching for from a list of options. Respondents could choose as many options as applied to their searching behaviors. HMSC respondents’ primary reason for visiting the library’s site via a mobile device was to find a specific source: 68 percent looked for an article, 45 percent for a journal, 36 percent for a book, and 14 percent for a thesis. Many of the HMSC respondents also looked for procedural or library-specific information: 36 percent looked for hours, 32 percent for My Account information, 18 percent for interlibrary loan, 14 percent for contact information, 9 percent for how to borrow and request books, 9 percent for workshop information, and 9 percent for Oregon estuaries bibliographies—a unique resource provided by the HMSC library. Fifty-five percent of searches were for a specific source and 43 percent were for procedural or library-specific information. Notably missing from this list were respondents who reported searching via their mobile device for directions to the library. Compared to the 2015 OSU Libraries main-campus survey respondents, HMSC respondents were much more likely to visit the library website via a mobile device to look for an article (68 percent vs. 37 percent), find a journal (45 percent vs. 23 percent), access My Account information (32 percent vs. 7 percent), use interlibrary loan (18 percent vs. 5 percent), or find contact information (14 percent vs. 1 percent). However, unlike HMSC participants, who do not have access to course reserves at the branch library, 7 percent of OSU main-campus respondents used their mobile devices to find course reserves on the library website. See Figure 2.
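Because respondents could select every option that applied, the percentages above are computed per respondent rather than per selection, so a column of percentages can legitimately sum past 100. A minimal sketch of that tally follows; the four response sets are invented for illustration only.

```python
# Tallying a "choose all that apply" survey question: each option's
# percentage is the share of respondents who picked it, not the share
# of total selections. Response sets below are invented examples.
from collections import Counter

responses = [
    {"article", "library hours"},
    {"article", "journal", "My Account"},
    {"book", "article"},
    {"journal", "interlibrary loan"},
]

tally = Counter(option for chosen in responses for option in chosen)
n = len(responses)   # divide by respondents, not by selections

for option, count in tally.most_common():
    print(f"{option}: {round(100 * count / n)}%")
```

With this convention, "article" registers 3 of 4 respondents even though it is only 3 of 9 total selections, which matches how the survey results above are reported.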
Figure 2. 2016 HMSC vs. 2015 OSU main-campus participants reported searches while visiting the library website via a mobile device by percent of responses.

It is possible that HMSC users with different affiliations might use the library site via a mobile device differently. These exploratory findings show that graduate students used the greatest variety of content via mobile devices. Graduate students as a group reported using 11 of the 14 provided content choices via a mobile device while faculty reported using 8 of the 14. Graduate students were the largest group (62 percent of respondents), which might explain why as a group they searched for more types of content via mobile devices. Interestingly, faculty members and faculty researchers reported looking for a thesis via a mobile device, but no graduate students did. Perhaps these graduate students had not yet learned about the usefulness of referencing past theses as a starting point for their own thesis writing. Or perhaps they were only familiar with searching for journal articles on a topic. In contrast, faculty members might have been searching for specific theses for which they had provided advising or mentoring support. To help us make decisions about how to best direct users to library content via mobile devices, we asked respondents to indicate their searching behaviors and preferences. Of the 16 HMSC respondents who answered this question, 12 (75 percent) used our web-scale discovery search box via mobile devices; 4 (25 percent) reported that they did not. Presumably these latter searchers were navigating to another database to find their sources.
Of 16 respondents, only 6 (38 percent) indicated that they looked for a specific library database (as opposed to the discovery tool) when using a mobile device. Those respondents who were looking for a database tended to be looking for the Web of Science database, which makes sense for their field of study. When conducting searches for sources on their mobile devices, HMSC respondents employed a variety of search strategies: the 12 respondents who replied used a combination of author (75 percent), journal title (67 percent), keyword (67 percent), and book title (50 percent) searches when starting at the mobile version of the discovery tool. When asked about their preferred way to find sources, a majority of HMSC respondents reported that they tended to prefer a combination of searching and menu navigation while using the library website from mobile devices, while the remainder were evenly divided between preferring menu-driven and search-driven discovery. While OSU Libraries does not currently provide links to any specific apps for source discovery, such as PubMed Mobile or JSTOR Browser, 13 (62 percent) of the HMSC respondents indicated they would be somewhat or very likely to use an app to access and use library services. This finding connects to the issue of reliable wireless access. Medical graduate students had a wider array of apps available to them, but the primary reason they wanted to use these apps was because they provided a better searching experience in hospitals that had intermittent wireless access—an experience to which researchers at HMSC could relate.32

University Website Use Behaviors on Mobile Devices
To help situate respondents’ library use behaviors on mobile devices in comparison to the way they use other academic resources on mobile devices, we asked HMSC respondents to describe their visits to resources on the OSU (nonlibrary) website via mobile devices.
Compared to their use of the library site on a mobile device, respondents’ use of university services was higher: 43 percent (9 respondents) visited the university’s website via a mobile device at least once a week compared to only 9 percent (2 respondents) who visited the library site with that frequency. This makes sense because of the integral function many of these university services play in most university employees’ regular workflow. Respondents indicated visiting key university sites including MyOSU (a portal webpage, visited by 60 percent of respondents), the HMSC webpage (55 percent), Canvas (the university’s learning management system, visited by 50 percent of respondents), and webmail (45 percent). See Figure 3.

Figure 3. University webpages HMSC respondents access on a mobile device by percent of responses.

University resources such as campus maps, parking locations, and the graduate school website were frequently used by this population. The use of the first two makes sense as HMSC users are located off-site and need to use maps and parking guidance when they visit the main campus. The use of the graduate school website makes sense because the respondents were primarily graduate students and graduate school guidelines are a necessary source of information. Interestingly, our advanced users are similar to undergraduates in that they primarily read email, information from social networking sites, and news on their mobile devices.33

Other Research Behaviors on Mobile Devices
We wanted to know what other research-related behaviors the HMSC respondents are engaged in via mobile devices to determine if there might be additional ways to support researchers’ workflows. We specifically asked about respondents’ reading, writing, and note-taking behaviors to learn how well these respondents have integrated them with their mobile usage behaviors.
All respondents reported reading on their mobile device (see Figure 4). Email represented the most common reading activity (95 percent), followed by “quick reading” activities, such as reading social networking posts (81 percent), current news (81 percent), and blog posts (62 percent). Smaller numbers used their mobile devices for academic or long-form reading, such as reading scholarly articles (33 percent) or books (19 percent). Of those respondents who read articles and books on their mobile devices, only some highlighted or took notes using their mobile device. Seven respondents used a citation manager on their mobile device: three used EndNote, one used Mendeley, one used Pages, and one used Zotero. One respondent used Evernote on their mobile device, and one advanced user reported using specific data and database management software, websites, and apps related to their projects. More advanced and interactive mobile-reading features, such as online spatial landmarks, might be needed before reading scholarly articles on mobile devices becomes more common.34

Figure 4. What HMSC respondents reported reading on a mobile device by percent of responses.

LIMITATIONS
This exploratory study had several limitations, most of which reflect the nature of doing research with a small population at a branch campus. This study had a small sample size, which limited observations of this population; however, future studies could use research techniques such as interviews or ethnographic studies to gather deep qualitative information about mobile-use behaviors in this population.
A second limitation was that previous studies of the OSU Libraries mobile website used Google Analytics to compare survey results with what users were actually doing on the library website. Unfortunately, this was not possible for this study. Because of how HMSC’s network was set up, anyone at HMSC using the OSU internet connections is assigned an IP address that shows a Corvallis, Oregon, location rather than a Newport, Oregon, location, which rendered parsing HMSC-specific users in Google Analytics impossible. The research behaviors of advanced researchers at a branch campus have not been well examined; despite its limitations, this study provides beneficial insights into the behaviors of this user population.

CONCLUSION
Focusing on how advanced researchers at a branch campus use mobile devices while accessing library and other campus information provides a snapshot of key trends among this user group. These exploratory findings show that these advanced researchers are infrequent users of library resources via mobile devices and, contrary to our initial expectations, are not using mobile devices as a research resource while conducting field-based research. Findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. Mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. The results of this survey will be used to address the HMSC knowledge gaps around use of library resources and research tools via mobile devices. Both graduate students and faculty lack awareness of library resources and services and have unsophisticated library research skills.35 While the OSU main campus has library workshops for graduate students and faculty, these workshops have been inconsistently duplicated at the Guin Library.
Because the people working at HMSC come from such a wide variety of departments across OSU that focus on marine sciences, HMSC has never had a library orientation. The results indicate possible value in devising ways to promote Guin Library’s resources and services locally, which could include highlighting the availability of mobile library access. While several participants mentioned using research tools like Evernote, Pages, or Zotero on their mobile devices, most participants did not report enhancing their mobile research experience with these mobile-friendly tools. Workshops specifically modeling how to use mobile-friendly tools and apps such as Dropbox, Evernote, GoodReader, or Browzine could help introduce the benefits of these tools to these advanced researchers. Because wireless access is even more of a concern for researchers at this branch location than for researchers at the main campus, database-specific apps will be explored to determine if the use of searching apps could help alleviate inconsistent wireless access. If database apps that are appropriate for marine science researchers are available, these will be promoted to this user population. Future research might involve follow-up interviews, focus groups, or ethnographic studies, which could expand the knowledge of these researchers’ mobile-device behaviors and their perceptions of mobile devices. Exploring the technology usage by these advanced researchers in their labs, including electronic lab notebooks or other tools, might be an interesting contrast to their use of mobile devices. In addition, as the HMSC campus grows with the expansion of the Marine Studies Initiative, increasing numbers of undergraduates will use Guin Library.
The ECAR 2015 statistics show that current undergraduates own multiple internet-capable devices.36 Presumably, these HMSC undergraduates will be likely to follow the trends seen in the ECAR data. Certainly, the plans to expand HMSC’s internet and wireless infrastructure will affect all its users. Our mobile survey gave us insights into how a sample of the HMSC population uses the library’s resources and services. These observations will allow Guin Library to expand its services for the HMSC campus. We encourage other librarians to explore their unique user populations when evaluating services and resources.

REFERENCES

1 Maria Anna Jankowska, “Identifying University Professors’ Information Needs in the Challenging Environment of Information and Communication Technologies,” Journal of Academic Librarianship 30, no. 1 (2004): 51–66, https://doi.org/10.1016/j.jal.2003.11.007; Pali U. Kuruppu and Anne Marie Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences,” Journal of Academic Librarianship 32, no. 6 (2006): 609–23; Lotta Haglund and Per Olsson, “The Impact on University Libraries of Changes in Information Behavior among Academic Researchers: A Multiple Case Study,” Journal of Academic Librarianship 34, no. 1 (2008): 52–59, https://doi.org/10.1016/j.acalib.2007.11.010; Nirmala Gunapala, “Meeting the Needs of the ‘Invisible University’: Identifying Information Needs of Postdoctoral Scholars in the Sciences,” Issues in Science and Technology Librarianship, no. 77 (Summer 2014), https://doi.org/10.5062/F4B8563P.

2 Tina Chrzastowski and Lura Joseph, “Surveying Graduate and Professional Students’ Perspectives on Library Services, Facilities and Collections at the University of Illinois at Urbana-Champaign: Does Subject Discipline Continue to Influence Library Use?,” Issues in Science and Technology Librarianship no.
45 (Winter 2006), https://doi.org/10.5062/F4DZ068J; Kuruppu and Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences”; Haglund and Olsson, “The Impact on University Libraries of Changes in Information Behavior Among Academic Researchers.”

3 Ellyssa Kroski, “On the Move with the Mobile Web: Libraries and Mobile Technologies,” Library Technology Reports 44, no. 5 (2008): 1–48, https://doi.org/10.5860/ltr.44n5.

4 Paula Torres-Pérez, Eva Méndez-Rodríguez, and Enrique Orduna-Malea, “Mobile Web Adoption in Top Ranked University Libraries: A Preliminary Study,” Journal of Academic Librarianship 42, no. 4 (2016): 329–39, https://doi.org/10.1016/j.acalib.2016.05.011.

5 David J. Comeaux, “Web Design Trends in Academic Libraries—A Longitudinal Study,” Journal of Web Librarianship 11, no. 1 (2017): 1–15, https://doi.org/10.1080/19322909.2016.1230031; Zebulin Evelhoch, “Mobile Web Site Ease of Use: An Analysis of Orbis Cascade Alliance Member Web Sites,” Journal of Web Librarianship 10, no. 2 (2016): 101–23, https://doi.org/10.1080/19322909.2016.1167649.

6 Barbara Blummer and Jeffrey M. Kenton, “Academic Libraries’ Mobile Initiatives and Research from 2010 to the Present: Identifying Themes in the Literature,” in Handbook of Research on Mobile Devices and Applications in Higher Education Settings, ed. Laura Briz-Ponce, Juan Juanes-Méndez, and José Francisco García-Peñalvo (Hershey, PA: IGI Global, 2016), 118–39.
7 Jankowska, “Identifying University Professors’ Information Needs in the Challenging Environment of Information and Communication Technologies.”

8 Chrzastowski and Joseph, “Surveying Graduate and Professional Students’ Perspectives on Library Services, Facilities and Collections at the University of Illinois at Urbana-Champaign.”

9 Carole A. George et al., “Scholarly Use of Information: Graduate Students’ Information Seeking Behaviour,” Information Research 11, no. 4 (2006), http://www.informationr.net/ir/11-4/paper272.html.

10 Kristin Hoffman et al., “Library Research Skills: A Needs Assessment for Graduate Student Workshops,” Issues in Science and Technology Librarianship 53 (Winter-Spring 2008), https://doi.org/10.5062/F48P5XFC; Hannah Gascho Rempel and Jeanne Davidson, “Providing Information Literacy Instruction to Graduate Students through Literature Review Workshops,” Issues in Science and Technology Librarianship 53 (Winter-Spring 2008), https://doi.org/10.5062/F44X55RG.

11 Jankowska, “Identifying University Professors’ Information Needs in the Challenging Environment of Information and Communication Technologies.”

12 Ka Po Lau et al., “Educational Usage of Mobile Devices: Differences Between Postgraduate and Undergraduate Students,” Journal of Academic Librarianship 43, no. 3 (May 2017): 201–8, https://doi.org/10.1016/j.acalib.2017.03.004.

13 Noa Aharony, “Mobile Libraries: Librarians’ and Students’ Perspectives,” College & Research Libraries 75, no. 2 (2014): 202–17, https://doi.org/10.5860/crl12-415.

14 Hannah Gascho Rempel and Laurie M. Bridges, “That Was Then, This Is Now: Replacing the Mobile-Optimized Site with Responsive Design,” Information Technology and Libraries 32, no. 4 (2013): 8–24, https://doi.org/10.6017/ital.v32i4.4636.

15 Paula Barnett-Ellis and Charlcie Pettway Vann, “The Library Right There in My Hand: Determining User Needs for Mobile Services at a Medium-Sized Regional University,” Southeastern Librarian 62, no. 2 (2014): 10–15.
https://doi.org/10.1080/19322909.2016.1167649 http://www.informationr.net/ir/11-4/paper272.html http://www.informationr.net/ir/11-4/paper272.html https://doi.org/10.5062/F48P5XFC https://doi.org/10.5062/F44X55RG https://doi.org/10.1016/j.acalib.2017.03.004 https://doi.org/10.5860/crl12-415 https://doi.org/10.6017/ital.v32i4.4636 MOBILE WEBSITE USE AND ADVANCED RESEARCHERS | MARKLAND, REMPEL, AND BRIDGES doi:10.6017/ital.v36i4.9953 22 16 William T. Caniano and Amy Catalano, “Academic Libraries and Mobile Devices: User and Reader Preferences,” Reference Librarian 55, no. 4 (2014), 298–317, https://doi.org/10.1080/02763877.2014.929910. 17 Haglund and Olsson, “The Impact on University Libraries of Changes in Information Behavior Among Academic Researchers.” 18 Kuruppu and Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences.” 19 Christine Wolff, Alisa B. Rod, and Roger C. Schonfeld, “Ithaka S+R US Faculty Survey 2015,” Ithaka S+R, April 4, 2016, http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey- 2015/. 20 M. Macedo-Rouet et al., “How Do Scientists Select Articles in the PubMed Database? An Empirical Study of Criteria and Strategies,” Revue Européenne de Psychologie Appliquée/European Review of Applied Psychology 62, no. 2 (2012): 63–72. 21 Rempel and Bridges, “That Was Then, This Is Now.” 22 Ellie Bushhousen et al., “Smartphone Use at a University Health Science Center,” Medical Reference Services Quarterly 32, no. 1 (2013): 52–72, https://doi.org/10.1080/02763869.2013.749134. 23 Jill T. Boruff and Dale Storie, “Mobile Devices in Medicine: A Survey of How Medical Students, Residents, and Faculty Use Smartphones and Other Mobile Devices to Find Information,” Journal of the Medical Library Association 102, no. 1 (2014): 22–30, https://doi.org/10.3163/1536- 5050.102.1.006. 
24 Bushhousen et al., “Smartphone Use at a University Health Science Center”; Boruff and Storie, “Mobile Devices in Medicine.” 25 Eden Dahlstrom et al., “ECAR Study of Students and Information Technology, 2015 ," research report, EDUCAUSE Center for Analysis and Research, 2015, https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en. 26 Ibid., 24. 27 Lutishoor Salisbury, Jozef Laincz, and Jeremy J. Smith, “Science and Technology Undergraduate Students’ Use of the Internet, Cell Phones and Social Networking Sites to Access Library Information,” Issues in Science and Technology Librarianship 69 (Spring 2012), https://doi.org/10.5062/F4SB43PD. 28 Rempel and Bridges, “That Was Then, This Is Now.” 29 Ibid. https://doi.org/10.1080/02763877.2014.929910 http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ https://doi.org/10.1080/02763869.2013.749134 https://doi.org/10.3163/1536-5050.102.1.006 https://doi.org/10.3163/1536-5050.102.1.006 https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en https://doi.org/10.5062/F4SB43PD INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017 23 30 “Mobile/Tablet Operating System Market Share,” NetMarketShare, March 2017, https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1. 31 Boruff and Storie, “Mobile Devices in Medicine”; Patrick Lo et al., “Use of Smartphones by Art and Design Students for Accessing Library Services and Learning,” Library Hi Tech 34, no. 2 (2016): 224–38, https://doi.org/10.1108/LHT-02-2016-0015. 32 Boruff and Storie, “Mobile Devices in Medicine.” 33 Dahlstrom et al., “ECAR Study of Students and Information Technology, 2015.” 34 Caroline Myrberg and Ninna Wiberg, “Screen vs. Paper: What Is the Difference for Reading and Learning?” Insights 28, no. 2 (2015): 49–54, https://doi.org/10.1629/uksg.236. 
35 Barnett-Ellis and Vann, “The Library Right There in My Hand”; Haglund and Olsson, “The Impact on University Libraries of Changes in Information Behavior Among Academic Researchers”; Hoffman et al., “Library Research Skills”; Kuruppu and Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences”; Lau et al., “Educational Usage of Mobile Devices”; Macedo-Rouet et al., “How Do Scientists Select Articles in the PubMed Database?” 36 Dahlstrom et al., “ECAR Study of Students and Information Technology, 2015.” https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1 https://doi.org/10.1108/LHT-02-2016-0015 https://doi.org/10.1629/uksg.236 ABSTRACT INTRODUCTION LITERATURE REVIEW METHODS RESULTS AND DISCUSSION Participant Demographics and Devices Used Frequency of Library Site Use on Mobile Devices Where Researchers Are When Using Mobile Devices for Library Tasks Library Resources Accessed via Mobile Devices University Website Use Behaviors on Mobile Devices Other Research Behaviors on Mobile Devices LIMITATIONS CONCLUSION REFERENCES
Everyone's Invited: A Website Usability Study Involving Multiple Library Stakeholders

Elena Azadbakht, John Blair, and Lisa Jones

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017 34

Elena Azadbakht (elena.azadbakht@usm.edu) is Health and Nursing Librarian and Assistant Professor, John Blair (john.blair@usm.edu) is Web Services Coordinator, and Lisa Jones (lisa.r.jones@usm.edu) is Head of Finance and Information Technology, University of Southern Mississippi, Hattiesburg, Mississippi.

ABSTRACT

This article describes a usability study of the University of Southern Mississippi Libraries website conducted in early 2016. The study involved six participants from each of four key user groups—undergraduate students, graduate students, faculty, and library employees—and consisted of six typical library search tasks, such as finding a book and an article on a topic, locating a journal by title, and looking up hours of operation. Library employees and graduate students completed the study's tasks most successfully, whereas undergraduate students performed relatively simple searches and relied on the Libraries' discovery tool, Primo. The study's results revealed several problematic features that affected each user group, including library employees. These results increased internal buy-in for usability-related changes to the library website in a later redesign.

INTRODUCTION

Within the last decade, usability testing has become a common way for libraries to assess their websites. Eager to gain a better understanding of how users experience our website, we assembled a two-person team and conducted the first usability study of the University of Southern Mississippi Libraries website in February 2016.
The Web Advisory Committee—which is tasked with developing, maintaining, and enhancing the Libraries' online presence—wanted to determine if the content on the website was organized in a way that made sense to users and facilitated the efficient use of the Libraries' online resources. Our usability study involved six participants from each of the following library user groups: undergraduate students, graduate students, faculty, and library employees. Student and faculty participants represented several academic disciplines and departments. All of the library employees involved in the study work in public-facing roles.

The Web Advisory Committee and Libraries' administration wanted to know how each of these groups differs in its website use and whether they have difficulty with the same architecture or features. Usability testing helped illuminate which aspects of the website's design might be hindering users from accomplishing key tasks, thereby identifying where and how improvements needed to be made. We included library employees in this study to compare their approach to the website to that of other users in the hope of increasing internal stakeholders' buy-in for recommendations resulting from this study. This article will discuss the usability study's design, results, and recommendations as well as the implications of the study's findings for similarly situated academic libraries. We will give special consideration to how the behavior of library employees compared to that of other groups.

EVERYONE'S INVITED | AZADBAKHT, BLAIR, AND JONES 35 https://doi.org/10.6017/ital.v36i4.9959

LITERATURE REVIEW

The literature on library-website user experience and usability is extensive. In 2007, Blummer conducted a literature review of research related to academic-library websites, including usability studies.
Her article provides an overview of the goals and outcomes of early library-website usability studies.1 More recent articles focus on a portion or aspect of a library's website, such as the homepage, federated search or discovery tool, or subject guides. Fagan published an article in 2010 that reviews user studies of faceted browsing and outlines several best practices for designing studies that focus on next-generation catalogs or discovery tools.2

Other library-website studies have reported on the habits of user groups, with undergraduates being the most commonly studied constituent group. Emde, Morris, and Claassen-Wilson observed University of Kansas faculty and graduate students' use of the library website, which had been recently redesigned, including a new federated search tool.3 Many of the study's participants gravitated toward the subject-specific resources they were familiar with and either missed or avoided using the website's new features. When asked for their opinions on the federated search tool, several participants said that while it was not a tool they saw themselves using, they did see how it might be helpful for undergraduate students who were still new to research.

The researchers also provided the participants with an article citation and asked them to locate it using the library's website or online resources. While half the participants did use the website's "E-Journals" link, others were less successful. Some who had the most difficulty "search[ed] for the journal title in a search box that was set up to search database titles."4 This led Emde, Morris, and Claassen-Wilson to observe that "locating journal articles from known citations is a difficult concept even for some advanced researchers."

Turner's 2011 article describes the results of a usability study at Syracuse University Library that included both students and library staff.
Participants were asked to start at the library's homepage and complete five tasks designed to emulate the types of searches a typical library user might perform, such as finding a specific book, a multimedia item, an article in the journal Nature, and primary sources pertaining to a historic event.5 When asked to find Toni Morrison's Beloved, most staff members used the library's traditional online catalog, whereas students almost always began their searches with the federated search tool located on the homepage. Participants of both types were less successful at locating a primary source, although this task highlighted key differences in each group's approach to searching the library website. Since library staff were more familiar than students with the library's collections and online search tools, they relied more on facets and limiters to narrow their searches, and some even began their searches by navigating to the library's webpage for special collections.

Library staff tended to be more persistent; draw upon their greater knowledge of the library's collections, website, and search tools; and use special syntax in their searches, like inverting an author's first and last names. "Library staff took more time, on average, to locate materials," writes Turner, because of their "interest in trying alternative strategies."6 Students, on the other hand, usually included more detail than necessary in their search queries (such as adding a word related to the format they were searching for after their keywords) and could not always differentiate various types of catalog records, for example, the record for a book review and the record for the book itself.
Turner concludes that the students' mental models for searching online and their experiences with other web-search environments influence their expectations of how library search tools work, and that library-website design should take these mental models into consideration.

Research on the search behaviors of students versus more experienced researchers or subject experts also has implications for library-website design. Two recent articles explore the different mental models or mindsets students bring to a search. The students in Asher and Duke's 2012 study "generally treated all search boxes as the equivalent of a Google search box" and used very simple keyword searches.7 This tracked with Holman's 2010 study, which likewise found that the students she observed relied on simple search strategies and did not understand how search interfaces and systems are structured.8

METHODS

Our research team consisted of the Libraries' health and nursing librarian and the web services coordinator. We worked closely with the head of finance and information technology in designing and running the usability study. A two-week period in mid-February 2016 was chosen for usability testing to avoid losing potential participants to midterms or spring break.

We posted a call for participants to two university discussion lists, on the Libraries' website, and on social media (Facebook and Twitter). We also reached out directly to faculty in academic departments we regularly work with and emailed library employees directly. We directed nonlibrary participants to a web form on the Libraries' website to provide their name, contact information, university affiliation/class standing, and availability. The health and nursing librarian followed up with and scheduled participants on the basis of their availability. Each student participant received a ten-dollar print card and each faculty participant received a ten-dollar Starbucks gift card.
To record the testing sessions, we needed a free or low-cost software option. The Libraries already had a subscription to Screencast-O-Matic for developing video tutorials, and the tool allows for simultaneous screen, audio, and video capture, so we decided to use it to record all testing sessions. We also used a spare laptop with an embedded camera and microphone.

The health and nursing librarian served as both facilitator and note-taker for most usability testing sessions. Participants were given six tasks to complete. We encouraged participants to narrate as they completed each task. The sessions began with simple, secondary navigational questions like the following:

• How late is our main library open on a typical Monday night?
• How could you contact a librarian for help?
• Where would you find more information about services offered by the library?

Next, we asked the participants to complete tasks designed to assess their ability to search for specific library resources and to illuminate any difficulty users might have navigating the website in the process. Each of the three tasks focused on a particular library-resource type, including books, articles, and journals:

• Find a book about rabbits.
• Find an article about rabbits.
• Check to see if we have a subscription/access to a journal called Nature.

After the usability testing was complete, we reviewed the recordings and notes and coded them. For each task, we calculated time to completion and documented the various paths participants took to answer each question, noting any issues they encountered. We also compared the four user groups in our analysis.

Limitations

Although we controlled for user type (undergraduate, graduate, faculty, or library employee) in the recruitment of study participants, we did not screen by academic discipline.
Doing so would have hindered our team's ability to include enough graduate students and faculty members in the study, as nearly all the volunteers from these two groups were from humanities or social science fields. The results might have differed slightly had the study successfully managed to include more faculty from the so-called hard sciences and allied health fields. Additionally, the order in which we asked participants to attempt the tasks might have affected how they approached some of the later tasks. If a participant chose to search for a book using the Primo discovery tool, for example, they might be more inclined to use it to complete the next task (find an article) rather than navigate to a different online resource or tool. Despite these limitations, usability testing has helped improve the website in key ways. We plan to correct for these limitations in future studies.

RESULTS

Every group included a participant who failed to complete at least one of the six tasks. An adequate answer to each of the study's six tasks can be found within one or two pages/clicks from the Libraries' homepage (Figure 1). The average distance to a solution remained at about two page loads across all of the study's participants, despite a few individual "website safaris."

Figure 1. University of Southern Mississippi Libraries' homepage.

Graduate students tended to complete tasks the quickest and were generally as successful as library employees. They preferred to use Primo for finding books but tended to favor the list of scholarly databases on the "Articles & Databases" page to find articles and journals. Undergraduates were the second-fastest group, but many struggled to complete one or more of the six tasks. They had the most trouble finding books and locating the journal by title. Undergraduates generally performed simple searches and had trouble recovering from missteps.
They were heavy users of Primo, relying on the discovery tool more than any other group. The other two user groups, faculty and library employees, were slower at completing tasks. Of the two, faculty took the longest to complete any task and failed to complete tasks at a similar rate as undergraduates. Likewise, this group favored Primo nearly as often. In contrast, library employees took almost as long as faculty to complete tasks but were much more successful. As a group, library employees demonstrated the different paths users could take to complete each task but favored those paths they identified as the "preferred" method for finding an item or resource over the fastest route.

The majority of study participants across all user groups had little trouble with the first three tasks. Although most participants favored the less direct path to the Libraries' hours—missing the direct link at the top of the homepage (Figure 2)—they spent relatively little time on this task. Likewise, virtually all participants took note of the links to our "Ask-A-Librarian" and "Services" pages located in our homepage's main navigation menu. This portion of the usability study alerted us to the need for a more prominent display of our opening hours on the homepage.

Figure 2. Link to "Hours" from the homepage.

Of the second set of tasks—find a book, find an article, and determine if we have access to Nature—the first and last proved the most challenging for participants. One undergraduate was unable to complete the book task, and one faculty member took nearly eight minutes to do so—the longest time to completion of any task by any user in the study. Primo was the most preferred method for finding a book.
Although an option for searching our Classic Catalog (which uses Innovative Interfaces' Millennium integrated library system) is contained within a search widget on the homepage, Primo is the default search option and therefore users' default choice. Interestingly, even after statements from some faculty such as "I don't love Primo," "Primo isn't the best," and "the [Classic Catalog] is better," these participants proceeded to use Primo to find a book. Library employees were evenly split between Primo and the Classic Catalog.

One undergraduate student, graduate student, and library employee were unable to determine whether we have access to Nature. This task was the most time consuming for library employees because there are multiple ways to approach this question, and library employees tended to favor the most consistently successful yet most time-consuming options (e.g., searching within the Classic Catalog). Lacking a clear option in the main navigation bar, the most popular path started with our "Articles & Databases" page, but the answer was most often successfully found using Primo. Several participants tried using the "Search for Databases" search box on the "Articles & Databases" page, which yielded no results because it searches only our database list. The search widget on the homepage that includes Primo has an option for searching e-journals by title, as shown in Figure 3. However, nearly all nonlibrary employees missed this feature.

Participants from both the undergraduate and graduate student user groups had trouble with this task, including those who were ultimately successful. Unfortunately, many of the undergraduates could not differentiate a journal from an article, and while graduate students were aware of the distinction, a few indicated that they were not used to the idea of finding articles from a specific journal.

Figure 3. E-journals search tab.
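The group-level comparisons reported here (time to completion, task success) come from the coded session recordings described in the Methods section. The article does not publish its coding scheme, so the record format below is entirely hypothetical, with invented group names, tasks, and timings; this is only a minimal sketch of how such coded records might be tallied per user group:

```python
from collections import defaultdict

# Hypothetical coded records: (user_group, task, seconds_to_complete, completed).
# The study's actual coding format is not described; these values are invented.
sessions = [
    ("undergraduate", "find_book", 95, True),
    ("undergraduate", "find_journal", 240, False),
    ("graduate", "find_book", 60, True),
    ("faculty", "find_book", 470, True),
    ("library_employee", "find_journal", 180, True),
]

stats = defaultdict(lambda: {"times": [], "completed": 0, "attempts": 0})
for group, task, seconds, completed in sessions:
    s = stats[group]
    s["attempts"] += 1
    if completed:
        s["completed"] += 1
        s["times"].append(seconds)  # average time over completed attempts only

for group, s in sorted(stats.items()):
    avg = sum(s["times"]) / len(s["times"]) if s["times"] else float("nan")
    rate = s["completed"] / s["attempts"]
    print(f"{group}: avg {avg:.0f}s to complete, success rate {rate:.0%}")
```

Averaging only completed attempts, as sketched here, is one defensible choice; failed attempts could instead be capped at a timeout and included, which would penalize groups that abandon tasks.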
When it came to finding articles, undergraduates, as well as several faculty and a few library employees, gravitated toward Primo. Others, particularly graduate students and library employees, opted to search a specific database—most often Academic Search Premier or JSTOR. However, those who used Primo to answer this question arrived at an answer two to three times faster because of the discovery tool's accessibility from a search widget on the homepage. Regardless of the tool or resource they used, most participants found a sufficient result or two.

Common Breakdowns

Despite the clear label "Search for Databases," at least one participant from each user group, including library employees, attempted to enter a book title, journal name, or keyword into the LibGuides' database search tool on our "Articles & Databases" page (Figure 4). Some participants attempted this repeatedly despite getting no results. Others did not try a search but stated, with confidence, that entering a journal, book, or article title into the "Search for Databases" field would yield a relevant result. A few participants also attempted this with the search box on our Research Guides (LibGuides) page, which searches only within the content of the LibGuides themselves.

Across all groups, when not starting at the homepage, many participants had difficulty finding books because no clear menu option exists for finding books like it does for articles (our "Articles & Databases" page). This difficulty was compounded by many participants struggling to return to the Libraries' homepage from within the website's subpages. Those participants who were able to navigate back to the homepage were reminded of the Primo search box located there and used it to search for books.

Figure 4. "Search for Databases" box on the "Articles & Databases" page.

Another breakdown was the "Help & FAQ" page (Figure 5).
Participants who turned there for help at any point in the study spent a relatively long time trying to find a usable answer and often ended up more confused than before. In fact, only one in three participants managed to use "Help & FAQ" successfully because the FAQ consists of many questions with answers on many different pages and subpages. This portion of the website had not been updated in several years, and therefore the questions were not listed in order of frequency.

Figure 5. The answer to the "How do I find books?" FAQ item leads to several subpages.

DISCUSSION

Using the results of the study, we made several recommendations to the Libraries' Web Advisory Committee and administration: (1) display our hours of operation on the homepage; (2) remove the search boxes from the "Articles & Databases" and "Research Guides" pages; (3) condense the "Help & FAQ" pages; and (4) create a "Find Books" option on the homepage. All of these recommendations were taken into account during a recent redesign of the website. We also considered each user group's performance and its implications for website design as well as instruction and outreach efforts.

First, our team suggested that the current day's hours of operation be featured prominently on the website's front page. Despite "How late is our main library open on a typical Monday night?" being one of two tasks that had a 100 percent completion rate, this change is easy to make, adds convenience, and addresses a long-voiced complaint. Several participants expressed a desire to see this change implemented. Moreover, this is something many of our peer libraries provide on their websites.

The team's next recommendation was to remove the "Find Databases by Title" search box from the "Articles & Databases" page. During the study, participants who had a particular database in mind opted to navigate directly to that database rather than search for it.
Another such search box exists on the "Research Guides" page. Although most of the participants did not encounter this search box during the study, those who did also mistook it for a general search tool. Participants from all groups, especially undergraduate students, assumed that any search box on the Libraries' website was designed to search for and within resources like article databases and the online catalog, regardless of how the search box was labeled. Given our findings, libraries with similar search boxes might also consider removing them from their websites.

Another recommended change was to condense the "Help & FAQ" section of the website considerably. The "Help & FAQ" section was too large and unwieldy for participants to use successfully without becoming visibly frustrated, defeating its purpose. Moreover, Google Analytics showed that only nine of the more than one hundred "Help & FAQ" pages were used with any regularity. Going forward, we will work to identify the roughly ten most important questions to feature in this section.

The final major recommendation was to consider adding a top-level menu item called "Find Books" that would provide users with a means to escape the depths of the site and direct them to Primo or the Classic Catalog. When participants got stuck on the book-finding task, they looked for a parallel to the "Articles & Databases" menu option. A "Getting Started" page or LibGuide could take this idea a step further by also including brief, straightforward instructions on finding articles and journals by title. In effect, this option would be another way to condense and reinvent some of the topics originally addressed in the "Help & FAQ" pages.

Comparing each user group's average performance helped illuminate the strengths and weaknesses of the website's design.
We suspect that graduate students were the fastest and nearly the most successful group because they are early in their academic careers and doing a great deal of their own research (as compared to faculty). Many of them are also responsible for teaching introductory courses and are working closely with first-year students who are just learning how to do research. Faculty, because their research tends to be on narrower topics, were familiar with the specific resources and tools they use in their work but were less able to efficiently navigate the parts of the website with which they have less experience. Moreover, individual faculty varied widely in their comfort level with technology, and this affected their ability to complete certain tasks.

CONCLUSION

The results of our website usability study echo those found elsewhere in the literature. Students approach library search interfaces as if they were Google and generally conduct very simple searches. Without knowledge of the Libraries' digital environment and without the research skills library employees possess, undergraduates in our study tended to favor the most direct route to the answer—if they could identify it. This group had the most trouble with library and academic terminology or concepts like the difference between an article and a journal. Though not as quick as the graduate students, undergraduates completed tasks swiftly, mainly because of their reliance on the Primo discovery tool. However, undergraduate students were less able to recover from missteps; more of them confused the "Find Databases by Title" search tool for an article search tool than participants from any other group. Since undergraduates compose the bulk of our user base and are the least experienced researchers, we decided to focus our redesign on solutions that will help them use the website more easily.
Although all of the library employees in our study work in public-facing roles, not all of them provide regular research help or teach information literacy. Since most of them are very familiar with our website and online resources, they approached the tasks more methodically and thoroughly than other participants. Library employees tended to choose the search strategy or path to discovery that would yield the highest-quality result, or they would demonstrate multiple ways of completing a given task, including any necessary workarounds.

The inclusion of library employees yielded the most powerful tool in our research team's arsenal. Holding this group's "correct" methods side by side with equally valid methods of discovery helped shake loose rigid thinking, and the fact that some library employees were unable to complete certain tasks shocked all parties in attendance when we presented our findings to stakeholders. Any potential argument that student, faculty, and staff missteps were the result of improper instruction and not of a usability issue was countered by evidence that the same missteps were sometimes made by library staff. Not only was this an eye-opening revelation to our entire staff, it served as the evidence our team needed to break through entrenched resistance to making any changes. We were met with almost instant, even enthusiastic, buy-in to our redesign recommendations from the Libraries' administration. Therefore, we highly recommend that other academic libraries consider including library staff as participants in their website usability studies.

REFERENCES

1 Barbara A. Blummer, "A Literature Review of Academic Library Web Page Studies," Journal of Web Librarianship 1, no. 1 (2007): 45–64, https://doi.org/10.1300/J502v01n01_04.

2 Jody Condit Fagan, "Usability Studies of Faceted Browsing: A Literature Review," Information Technology and Libraries 29, no. 2 (2010): 58–66, https://ejournals.bc.edu/ojs/index.php/ital/article/view/3144/2758.

3 Judith Z. Emde, Sara E. Morris, and Monica Claassen-Wilson, "Testing an Academic Library Website for Usability with Faculty and Graduate Students," Evidence Based Library and Information Practice 4, no. 4 (2009): 24–36, https://doi.org/10.18438/B8TK7Q.

4 Ibid., 30.

5 Nancy B. Turner, "Librarians Do It Differently: Comparative Usability Testing with Students and Library Staff," Journal of Web Librarianship 5, no. 4 (2011): 286–98, https://doi.org/10.1080/19322909.2011.624428.

6 Ibid., 295.

7 Andrew D. Asher and Lynda M. Duke, "Searching for Answers: Student Behavior at Illinois Wesleyan University," in College Libraries and Student Culture: What We Now Know (Chicago: American Library Association, 2012), 77–78.

8 Lucy Holman, "Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers," Journal of Academic Librarianship 37, no. 1 (2011): 21–23, https://doi.org/10.1016/j.acalib.2010.10.003.
A Case Study on the Path to Resource Discovery

Beth Guay

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017

ABSTRACT

A meeting in April 2015 explored the potential withdrawal of valuable collections of microfilm held by the University of Maryland, College Park Libraries. This resulted in a project to identify OCLC record numbers (OCN) for addition to OCLC’s Chadwyck-Healey Early English Books Online (EEBO) KBART file.1 Initially, the project was an attempt to adapt cataloging workflows to a new environment in which the copy cataloging of e-resources takes place within discovery system tools rather than traditional cataloging utilities and MARC record set or individual record downloads into online catalogs. In the course of the project, it was discovered that the microfilm and e-version bibliographic records contained metadata which had not been utilized by OCLC to improve its link resolution and discovery services for digitized versions of the microfilm resources. This metadata may be advantageous to OCLC and to others in their work to transition from MARC to linked data on the Semantic Web. With MARC record field indexing and linked data implementations, this collection and others could better support scholarly research.

Collections, Discovery Tools, and Metadata Services

The University of Maryland, College Park Libraries’ (the Libraries; UM Libraries) collections include 3.45 million print books and 1.2 million eBooks, 17,000 electronic journals, and 352 electronic databases.2 In late 2011, the Libraries implemented WorldCat Local, OCLC’s single-search-box interface to the WorldCat database of cataloged resources and a central index of metadata provided by publishers, Abstracting and Indexing Services, institutional repositories, and so on.
With WorldCat Local, and later, WorldCat Discovery, OCLC utilizes a knowledge base in managing e-resources discovery and access.3 Knowledge bases are “associated with link resolvers and electronic resource management systems” and “contain title-level metadata, linking syntax rules, publication ranges and other data.”4 KBART files are so named to represent files compliant with the NISO recommended practice, “Knowledge Bases and Related Tools (KBART).”5 KBART files, created and supplied by content providers, are used to transmit this title-level metadata to knowledge base vendors and discovery service providers.6 Since OCLC enhances these files with OCLC numbers (OCN) in order to provide automated holdings maintenance on WorldCat bibliographic records, the Libraries’ Metadata Services Department (MSD) adopted a policy in 2012 to provide access to e-resources only via WorldCat when such files are available.

Beth Guay (baguay@umd.edu) is Continuing Resources Librarian, University of Maryland Libraries, University of Maryland, College Park.

A CASE STUDY ON THE PATH TO RESOURCE DISCOVERY | GUAY | doi:10.6017/ital.v36i3.9966

Space Planning

Early on, the Libraries’ collection policies targeted duplicate copies of print monographs and print journals held electronically in trusted repositories, e.g., JSTOR, for deselection. By March 2014, the Libraries’ Collection Development Council discussed moving microfilm collections to the yet-to-be-opened Severn Library, slated to “house lesser used materials … in order to free up much needed space for users and the development of new collaborative learning spaces.”7, 8 A year later, in April 2015, a meeting was called by the Assistant Head, Collection Development, to investigate microfilm collection retention decisions. This time the Libraries were considering the withdrawal of microfilm resources for which equivalent versions were held online.
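As a rough picture of what such a title-level file carries, here is a minimal KBART-style row. This is a sketch under stated assumptions: the three column names are drawn from the KBART recommended practice (plus OCLC’s OCN column), the values are invented for illustration, and production files carry many more fields.

```python
import csv
import io

# A minimal KBART-style file: tab-separated, one row per title. The columns
# shown (publication_title, title_url, oclc_number) are assumptions based on
# the KBART recommended practice and OCLC's OCN extension; the row values
# (URL and OCN) are hypothetical placeholders.
kbart_tsv = (
    "publication_title\ttitle_url\toclc_number\n"
    "The Examiner\thttp://example.org/title/examiner\t12345678\n"
)

for row in csv.DictReader(io.StringIO(kbart_tsv), delimiter="\t"):
    # Each row supplies the knowledge base with one title-level access point.
    print(row["publication_title"], row["oclc_number"])
```

Because the format is plain tab-separated text, a library can derive such a file from its own MARC record sets, which is how the Libraries later populated the WCKB sandbox.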
The collection managers placed a caveat on the withdrawal of the microfilm: before the microfilm could be withdrawn and the Libraries’ holdings deleted from the WorldCat bibliographic records, the equivalent e-version resources had to be made discoverable in WorldCat UMD (the Libraries’ WorldCat Discovery implementation) by adding the Libraries’ holdings to the e-version bibliographic records corresponding to the microfilm version records. Following the meeting, the Librarian for English, Latin American, & Latina/o Studies and Second Language Acquisition provided the Continuing and Electronic Resources Cataloger (C-ER Cataloger) with a list of eight valuable microfilm collections and, for each, the name of the comparable online collection (or e-collection) to which the Libraries subscribed. It was agreed that the C-ER Cataloger would investigate to determine whether any of those microfilm collections could be withdrawn in compliance with the collection managers’ caveat. In other words, the C-ER Cataloger’s mission was to ensure a one-to-one correspondence of electronic and microfilm version bibliographic records for the equivalent versions of the resources. One of the e-collections added to the WorldCat Knowledge Base (WCKB) by the Libraries was Gale’s The Making of the Modern World, 1450-1850: Part I collection (MOMW). This collection comprises digitized versions of Gale's microfilm resources in the series The Goldsmiths'-Kress library of economic literature.9 A KBART file was derived from the Libraries’ MOMW MARC record set and uploaded to the WCKB sandbox, where it supports the Libraries’ access to the e-version resources. The MOMW MARC record set had been reviewed and vetted by the Libraries prior to its purchase, and upon its purchase, Gale had set the Libraries’ holdings on the WorldCat bibliographic records representing the resources.
With this information in mind, the C-ER Cataloger determined that the MOMW e-resource bibliographic records were comparable to those representing the Libraries’ corresponding Goldsmiths'-Kress library of economic literature microfilm collection, thus meeting the collection managers’ criteria for deselection. The 3,380 reels that could be withdrawn occupied a small but not insignificant allotment of physical space in the library.

Providing discoverability of equivalent e-versions of resources held in other collections proved difficult. For example, the corresponding microfilm collections represented in the WCKB’s British Periodicals Collections I and II were held in the series Early British periodicals and English literary periodicals.10 The Libraries had cataloged 186 individual serial titles in the microfilm series Early British periodicals in 2002, but none in the series English literary periodicals. Thus the objective would have been to ensure discoverability for the equivalent electronic versions of the Libraries’ 186 cataloged microfilm versions in the Early British periodicals series. At the time of this investigation, there were 580 British Periodicals I and II KBART file title entries, 390 of which had OCN. Whereas the OCN of The Making of the Modern World, 1450-1850: Part I WCKB collection were known entities, the OCN of the remaining e-collections had yet to be vetted. Thus the British Periodicals Collections I and II records were spot-checked for evaluation. The quality of the 390 OCLC records ranged from excellent, e.g., OCLC record #297425799, to poor, e.g., #818401694 (see Figures 1-4). MARC record images in Figures 1-4 are sourced from OCLC’s Connexion cataloging client interface to the WorldCat bibliographic database.
Figures 1 and 2 represent a microfilm version record and a comparable “excellent” quality record given for the resource in the WorldCat Knowledge Base, while Figures 3 and 4 represent a microfilm version and a comparable “poor” quality record given for the resource in the WCKB. Note that the C-ER Cataloger’s definition of an excellent quality e-version record was one which provided metadata comparable to those of its equivalent microfilm version record; likewise, a poor quality record lacked comparable metadata. In other words, an excellent quality record was viewed as a guarantor of a discoverable resource, while a poor quality record was viewed as an obstacle to discovery. For this WCKB collection, the C-ER Cataloger determined that staff expertise with serial bibliographic records was required, and due to MSD staffing limitations, moved ahead to examine the other collections.

Figure 1. Microfilm version record

Figure 2. Excellent quality e-version record — OCN in the KB file

Figure 3. Microfilm version record

Figure 4. Poor quality e-version record — OCN in the KB file

In an investigation into OCLC’s Chadwyck-Healey Early English Books Online (EEBO) KBART file, for which equivalent e-versions of microfilm resources in the series Early English books, 1475-1640 and Early English books, 1641-1700 are held, it was found that the availability of comparable e-version bibliographic records was optimal.11 In consultation with the MSD department head, a project to ensure the discoverability of equivalent e-versions of the Libraries’ 5,062 cataloged microfilm resources in the series Early English books, 1475-1640 was initiated.
The C-ER Cataloger had hoped to follow with a similar effort for the Libraries’ resources in the series Early English books, 1641-1700 (represented by 41,306 records in the Libraries’ Integrated Library System).

Background: EEBO, Related Resources and Bibliographic Records

Much has been written on EEBO’s inception and continuing development as a collection of digital reproductions of microfilm reproductions of pre-1700 print resources, and on its scholarly value (Kichuk, 2007; Martin, 2007; Gadd, 2009; Mak, 2014; Folger Shakespeare Library, 2015).12 Alfred Pollard and Gilbert Redgrave’s A short-title catalogue of books printed in England, Scotland, & Ireland and of English books printed abroad, 1475-1640 (“STC”), and the “companion” volume, Donald Wing's Short-title catalogue of books printed in England, Scotland, Ireland, Wales, and British America, and of English books printed in other countries, 1641-1700 (“Wing”), respectively, were used in selecting the print resources for filming.13 Gadd (2009, 683) pinpointed the STC as “a catalogue of editions (or more accurately, editions and issues) not copies although, of course, the information about any edition is derived primarily from the surviving copies … Each entry gives the location of known copies …”14 The “successor” to STC and Wing, the English Short Title Catalog (ESTC), “includes records for every item listed in STC, every item in Wing, every item in the Eighteenth Century Short Title Catalogue … and newspapers and other serials which began publication before 1801” and is freely available online from the British Library.15, 16 Gadd (2009, 685-686) offered this critique concerning EEBO’s bibliographic data and relationship to the ESTC:

EEBO’s relationship with the original STC and Wing is straightforward and clear; EEBO’s relationship with electronic ESTC, on the other hand, is less well-known.
A series of agreements made between ESTC and University Microfilms/ProQuest between 1989 and 1997 allowed EEBO to draw directly on ESTC’s existing bibliographical data … EEBO heavily edited ESTC’s data for its own purposes; certain categories of data were removed (e.g. collations, Stationer’s Register entrances), some information was amended (e.g., subject headings), and some was added (e.g. microfilm specific details). Second, there is no formal mechanism for synchronizing the data between the two resources. Occasionally, snapshots of data are sent by EEBO to ESTC but there is no guarantee that a correction or revision made to an ESTC entry will be replicated in the corresponding EEBO or vice-versa: neither ESTC nor EEBO will necessarily know when the other made a correction.17

Gadd posited that “as both resources continue to amend and expand their bibliographical data for their own purposes, there is an increasing likelihood of significant discrepancy between the two resources.”18 He did not further address the quality of the bibliographic records describing the EEBO versions of the resources; perhaps he was unaware of the sources of the EEBO bibliographic data. Microfilm version bibliographic records serve as the basis of the metadata describing the EEBO version resources. According to ProQuest, “MARC records (from which EEBO Bibliographic records derive) are produced for the microfilm collection Early English Books (EEB) after they are filmed.”19 OCLC’s cataloging database has served as one source of microfilm version records for titles in the series since the 1980s. In 1984, the Association of Research Libraries (1984, p.
J-3) reported that one library had “input an indeterminate amount [of bibliographic records] into OCLC” for Early English books, 1475-1640, and that one had “input records for an indeterminate percentage of the set into OCLC” for resources in the series, Early English books 1641-1700.20 The cataloging sources of these microfilm resources have varied over time, from cooperative projects to UMI/ProQuest staff to individual libraries; however, adherence to standards has characterized the totality of the efforts invested. Joachim (1993, p. 111) described the cooperative effort begun in 1984 by the Indiana University Libraries, University of California, Riverside, University of Delaware, and the University of Utah to catalog microfilm version resources cataloged by Wing:

In order to maintain standards and consistency among the five libraries, the project director prepared a “Wing STC Project manual.” The manual includes general information, information on authority work, a bibliography, a discussion of special cataloging problems and procedures, sample records, and database input guidelines.21

OCLC’s MARC records for the microfilm and EEBO version resources contain note fields identifying the locations of the print copies filmed and subsequently reproduced digitally by UMI/ProQuest. Gadd (2009, p. 686) emphasized the importance of this information to scholars in stating that “different copies from the same edition might vary, sometimes markedly.”22 As to Gadd’s (2009) critique concerning the lack of a formal synchronization mechanism and increasing likelihood of discrepancies between EEBO and ESTC, further examination of EEBO and ESTC bibliographic record displays such as those shown in Figures 5 and 6 suggest that the British Library is working with ProQuest to align their data.
It appears a focus of the British Library may be to inform the scholar of the availability of the microfilm and electronic versions of the print resources. In its ESTC overview, the British Library states that “the existence of selected … printed and digital surrogates within products such as Early English Books Online … is … noted” in its records and that its records “act as an index to several major research microform series … including Early English Books, 1475-1640 … [and] Early English books, 1641-1700.”23

Figure 5. EEBO bibliographic record for the resource cited by STC 2nd edition entry 9164 and reproduced from the copy held at the Society of Antiquaries, London.

Figure 6. ESTC catalog record for STC 2nd edition, entry 9164 (http://estc.bl.uk/S3614). The code “Lsa,” given as “Loc. of filmed copy,” is the British Library’s MARC code for the Society of Antiquaries Library.24

Finally, to add to this mix of print, microfilm, and EEBO digitized images, XML/SGML versions of the resources are being created by the Text Creation Partnership (TCP), formed in 1999 by the university libraries of Michigan and Oxford, ProQuest, and the Council on Library and Information Resources, to provide full text search capability.25 Catalog records describing TCP versions are available in WorldCat.
According to the TCP, “the TCP does not have the resources to create new catalog records for each text we produce (though you are welcome to do so, and if you are willing to share them we would be very glad to know about it).”26

The UM Libraries’ EEBO Project

The OCLC EEBO KBART file, which contained 129,544 title entries when downloaded, 58,518 of which lacked OCN, was combined with a file extracted from the 5,062 MARC records that represented the microfilm resources. The merged file was to be used as a tool in identifying the OCN of the equivalent e-versions of the microfilm resources held. The plan was to add the e-version OCN to the EEBO KBART file via OCLC’s OCN correction form.27 Significant time was spent developing and documenting procedures by which staff could perform the work of identifying OCN for addition to the EEBO file. The basic procedures are as follows:

1. Via the OCLC Connexion cataloging client, search and retrieve the e-version record using the microfilm version record data.
2. Use titles and/or OCN of the microfilm version record to identify the comparable EEBO resource in the KBART file.
3. View the EEBO resource record using the URL in the file.
4. Record the OCN of the matching e-version record in the appropriate row/column of the file.28

Subsequently, two MSD staff members were recruited to assist in the effort. In early November and mid-December 2015, training sessions were held with both staff, followed by an individual session with each. Before the year’s end, each staff member had successfully completed an assigned number of “titles” for review. Importantly, from the initial investigative work, a KBART file with 50 OCN was compiled and submitted to OCLC. OCLC Customer Support confirmed that the file would be loaded. Due to the ongoing developmental status of OCLC’s services, the OCN were not loaded into the WCKB until June 2016; a second file sent in April 2016 was loaded in June as well.
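The clerical core of these steps can be pictured in code. This is a minimal sketch under stated assumptions: the KBART column names follow the recommended practice plus OCLC’s OCN extension, and the title-keyed dictionary stands in for the Connexion searching of step 1 (the title and OCN values below are taken from records discussed later in this article, used here only as sample data).

```python
import csv
import io

# Stand-in for the OCLC EEBO KBART file (tab-separated); this title's
# oclc_number column is empty, i.e., the entry still lacks an OCN.
kbart_tsv = (
    "publication_title\toclc_number\n"
    "A most exact catalogue of the Lords spirituall and temporall\t\n"
)

# Hypothetical outcome of step 1: the e-version OCN a cataloger found in
# Connexion by searching with the microfilm version record's data.
eversion_ocn_by_title = {
    "a most exact catalogue of the lords spirituall and temporall": "606541404",
}

rows = list(csv.DictReader(io.StringIO(kbart_tsv), delimiter="\t"))
for row in rows:
    key = row["publication_title"].strip().lower()  # step 2: title as match point
    if not row["oclc_number"] and key in eversion_ocn_by_title:
        # Step 4: record the matching e-version OCN in the entry's OCN column.
        row["oclc_number"] = eversion_ocn_by_title[key]

print(rows[0]["oclc_number"])
```

Step 3, the human judgment call of comparing the EEBO resource record against the microfilm record, is exactly the part that cannot be automated this way, which is why each title entry averaged several minutes of staff time.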
The number of OCN added to the WorldCat Knowledge Base from the project’s inception through 2016 was small due to staffing issues. The average staff time to complete a microfilm/equivalent e-version title entry in the KBART file was 13 minutes.29 As the project progressed, staff following the procedures confirmed that some OCN in the EEBO KBART file were incorrect. Most often, the “errors” stemmed from the attribution of TCP or German language of cataloging record OCN to the EEBO version resources. These TCP and German language of cataloging records correctly corresponded to matching EEBO version resources; however, TCP version records refer to XML/SGML encoded text editions, and OCLC attempts to prefer English language of cataloging records over others in its knowledge base.30

Other OCN errors seriously detract from the value of the WCKB’s EEBO file. For example, WorldCat record number 606541404 describes the “fourth edition very much enlarged” of “A Most exact catalogue of the Lords spirituall and temporall, as peers of the realme, in the higher House of Parliament, according to their dignities, offices, and degrees: some other called thither for their assistance, & officers of their attendances …” yet this OCN in the WorldCat Knowledge Base’s EEBO KBART file links to an EEBO record describing the “third edition much enlarged.” See Figure 8, illustrating the WorldCat UMD record which links to an EEBO resource record describing the “third edition much enlarged.” Note that the OCLC record (as seen in the Connexion client view of the record in Figure 9) is cited by STC (2nd ed.) 7746.3 while the EEBO version record linked to is cited by STC (2nd ed.) 7746.2. To make matters worse, the author determined that the image associated with the EEBO catalog record cited by STC 7746.2 and displayed at the site corresponded to neither resource cited as STC 7746.2 nor STC 7746.3.
These were both printed in 1628, but the image provided at the EEBO site was of a resource printed in 1640 (see Figure 10).

Figure 8. WorldCat UMD record OCN 606541404 linking to the wrong version of a resource in EEBO.

Figure 9. Connexion client view of OCN 606541404

Figure 10. Digital image linked to from EEBO record describing the “third edition much enlarged” of a resource printed in 1628. http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:23639

Further investigation identified errors of misappropriation of OCN in the KBART file to EEBO version records describing copies of editions filmed at locations other than those noted in the corresponding OCLC records. For example, the EEBO resource, “By the King. A proclamation for the adiournement of part of Trinitie terme,” identified in the WCKB as associated with OCN 71492075, links the scholar to a resource described by the EEBO version record as the copy filmed at the British Library. OCLC record 71492075, however, indicates that the copy it describes was the copy filmed at the Henry E. Huntington Library and Art Gallery. See Figures 11-13.

Figure 11. The WCKB associates OCN 71492075 with the EEBO resource, “By the King. A proclamation for the adiournement of part of Trinitie terme,” described by the EEBO website as the copy filmed at the British Library.

Figure 12. The EEBO resource record linked from OCN 71492075 by the OCLC EEBO KBART file indicates the copy filmed was held by the British Library.

Figure 13.
OCN 71492075 indicates it describes a copy of the resource, “By the King : a proclamation for the adiournement of part of Trinitie terme,” filmed at the Henry E. Huntington Library and Art Gallery.

Evaluation

The UM Libraries’ EEBO project procedures revealed that match points of equivalent microfilm and e-version records were the names of the institutions holding the filmed copies and the STC citations to the resources.31 STC citations are carried in the MARC 510 fields of the bibliographic records in two subfields:

1. in subfield “a,” the names of citing works, given in a brief form, e.g., “STC” to represent Pollard and Redgrave’s Short-title catalogue; and
2. in subfield “c,” the location (e.g., page number or volume) within the citing works, e.g., “8626.”32

Figure 14 displays a Connexion client view of OCN 33150534, cited as STC 9170, and Figure 15 shows the same record in the WorldCat display view. Unfortunately, the MARC 510 fields are neither indexed by OCLC nor displayed in WorldCat.33 OCLC could enable the identification and collocation of records for equivalent print, microfilm and electronic versions by indexing the MARC 510 fields and subfields.34

Figure 14. Microfilm version record OCN 33150534, cited as STC 9170.

Figure 15. WorldCat.org view of OCN 33150534, STC 9170 (http://www.worldcat.org/oclc/33150534). The underlying MARC 510 field metadata is not displayed.

Investigation by the author revealed that TCP version records supply these metadata elements in duplicate in different MARC fields: one a free text note field, the other a number/code field, 024.
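The $a/$c match point described above can be sketched programmatically. This is illustrative only: the “$”-delimited strings below are a common human-readable rendering of MARC subfields, not OCLC’s storage format, and the sample 510 values are taken from the STC citation discussed in this section.

```python
def parse_subfields(field_body):
    """Split a '$'-delimited MARC field body, e.g. '$aSTC (2nd ed.)$c9164',
    into (subfield code, value) pairs."""
    return [(part[0], part[1:].strip()) for part in field_body.split("$") if part]

def citation(field_body):
    """Return the (citing work, location) pair carried in subfields a and c."""
    subfields = dict(parse_subfields(field_body))
    return (subfields.get("a"), subfields.get("c"))

# Hypothetical 510 bodies from a microfilm record and its EEBO e-version record.
microfilm_510 = "$aSTC (2nd ed.)$c9164"
eversion_510 = "$aSTC (2nd ed.)$c9164"

# Identical (citing work, location) pairs signal equivalent versions.
print(citation(microfilm_510))  # ('STC (2nd ed.)', '9164')
print(citation(microfilm_510) == citation(eversion_510))  # True
```

An index built on these two subfields, rather than on whole fields or display strings, is essentially what the collocation proposal above asks of OCLC.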
The 024 field is defined to carry a “standard number or code published on an item which cannot be accommodated in another field (e.g., field 020 (International Standard Book Number)).”35 It should be noted that use of the 024 field to carry a number that is not published on the item is not in accordance with the field’s definition. The TCP records use the 024 field with a first indicator value “8,” conveying that the number is an unspecified type of standard number or code.36 Subfield “a” of the 024 field, which carries the STC numbers in the TCP version records, is indexed by OCLC. In the TCP version records, however, these elements are ensconced within strings of text, e.g., “(stc) STC (2nd ed.) 9170.”37 A search on standard number “9170” in WorldCat will therefore fail to retrieve the appropriate record. See Figure 16 for an example of a TCP version record of a resource cited as STC 9170.

With respect to the MARC field definitions, should there be a need to retrieve bibliographic records representing TCP versions of resources via STC citations, these numbers should be entered in “a” subfields, and the brief abbreviated names of the citing source, e.g., “STC (2nd ed.),” “Wing,” etc., in the “2” subfield, which is defined to carry the “Source of number or code.”38 Should OCLC choose to index the MARC 510 fields as described above, the Text Creation Partnership records would be missed.

Figure 16. Text Creation Partnership version OCN 832931179, STC 9170

Indexing of the MARC 510 fields/subfields by OCLC, combined with use of other MARC field/subfield values, such as language of cataloging, to limit results to desired OCN, could support elimination of EEBO KBART file OCN errors and identification of thousands of new OCN for addition to this and perhaps other similar files.
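To see why the embedded form defeats a standard-number search, and how it might be normalized toward the field’s definition (number in subfield “a,” source name in subfield “2”), consider this sketch. The regular expression is illustrative only, written against the single string form quoted above, not a tested rule for all TCP records.

```python
import re

# The form reported in TCP records: the whole citation sits as one text
# string in 024 subfield a.
raw_024a = "(stc) STC (2nd ed.) 9170"

# An exact standard-number lookup on the bare number cannot match the string:
print(raw_024a == "9170")  # False -- the number is ensconced in text

# Illustrative normalization: pull the trailing number into subfield a and
# the citing-source name into subfield 2, per the MARC 024 definition.
match = re.match(r"\(\w+\)\s*(?P<source>.*?)\s*(?P<number>[\d.]+)$", raw_024a)
normalized = {"a": match.group("number"), "2": match.group("source")}
print(normalized)  # {'a': '9170', '2': 'STC (2nd ed.)'}
```

With the number isolated in subfield “a,” the existing 024 index would retrieve the record; the source name in subfield “2” would preserve the information now buried in the prefix.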
As a point of reference, according to OCLC’s “MARC Usage in WorldCat” webpages, as of January 1, 2016, there were 6,382,317 instances of MARC 510 “a” subfields and 4,082,280 instances of the “c” subfields.40 It should be noted, however, that there are five first indicator values available for use in MARC 510 fields, and only one of them is used to convey that the location in the source data is given in the field. Also worth noting, the 024 data at the “MARC Usage in WorldCat” webpages show that there were 4,633,776 occurrences of subfield “2” of the 024 field and 43,711,819 occurrences of subfield “a.”41

510 field indexing to support identification of OCN for addition to the EEBO KBART file may require the participation of the content provider, ProQuest. The 510 field elements are indexed in its Early English Books Online collection. ProQuest could add these data to its EEBO KBART file in support of OCN matching. The KBART Recommended Practice allows content providers “to include any extra data fields after the last KBART utilized position.”42

Finally, it should be noted that reconciliation of errors in the WCKB EEBO file pertaining to the locations of the filmed copies, as noted in OCLC records but found to be different at the EEBO site, would require more complex steps than 510 field matching. Furthermore, catalogers working on the EEBO project were not instructed to check the images at the EEBO website but only to confirm the STC citation match points in the EEBO version records. A closer examination of EEBO in light of this paper’s finding of an EEBO record linked to a resource printed 12 years later is an area calling for further study. With respect to the needs of scholars as eloquently described by Gadd (2009), the accuracy of the WorldCat Knowledge Base OCN must improve in terms of access provision via WorldCat Discovery.
MARC 510 Elements: Opportunities for Linked Data Applications?

OCLC is actively engaged in research and collaboration with the greater library community to transition its metadata to linked data; however, MARC 510 metadata is lacking in its linked data record display views (compare the Connexion client view of a record in Figure 14 with the WorldCat linked data display view in Figure 17).43, 44 On the other hand, in its work to transfer its English Short Title Catalog, a “MARC based … vendor-supplied ILS,” to “ESTC21,” a “native linked data resource,” it appears the British Library combines the MARC 510 subfield values, e.g., “Bristol, B7384,” as a resource property value (Figures 18 and 19).45, 46 “Bristol, B7384” represents entry number 7384 in Roger P. Bristol’s Supplement to Charles Evans' American bibliography (see Figure 20, WorldCat OCLC record number 88701).47 As presented in Figure 19 (Stahmer, 2014), “Bristol, B7384” may be comprehensible to a well-versed scholar, librarian, or archivist, but not to a computer. Hillmann, Dunsire, and Phipps (2013) posited that “it would be useful if all managers of schemas and other standards were to develop element sets and value vocabulary representations that match the source semantics at the finest granularity and make them available along with maps of the internal ontologies.”48 Could a Semantic Web implementation of MARC 510 metadata at the finest granularity, with resource identifiers representing citing works such as “Bristol” and with property values such as “7384” representing locations within citing works, offer benefits to scholarship? It has been demonstrated in this paper that the consistent match points across bibliographic records representing equivalent versions of these resources have been the metadata contained in MARC 510 fields.
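One way to picture “finest granularity” is to decompose the combined display string into separate machine-actionable statements. The sketch below uses plain Python tuples as stand-in triples, and every identifier is a hypothetical example.org placeholder, not a real vocabulary.

```python
# Hypothetical identifiers -- placeholders, not real URIs or vocabularies.
RESOURCE = "http://example.org/resource/estc-example-item"
BRISTOL = "http://example.org/citing-work/bristol-supplement"

# The combined value "Bristol, B7384" becomes two statements: one linking the
# resource to an identifier for the citing work, one carrying the location
# (entry number) within that work.
triples = [
    (RESOURCE, "http://example.org/prop/citedIn", BRISTOL),
    (RESOURCE, "http://example.org/prop/citationLocation", "B7384"),
]

# A machine can now collect everything cited in Bristol's Supplement by
# filtering on the citing-work identifier instead of parsing display strings.
cited_in_bristol = [s for (s, p, o) in triples
                    if p == "http://example.org/prop/citedIn" and o == BRISTOL]
print(cited_in_bristol == [RESOURCE])  # True
```

Under such a model, a query across citing works could collocate the print, microfilm, EEBO, and TCP versions of a resource through their shared citation, which is the collocation the string form cannot support.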
Ultimately, a linked data implementation of the MARC bibliographic 510 field should lead the scholar to every known print copy comprising every edition, according to Gadd’s definition of an edition, above, and to the institutional holdings of equivalent microform, digitized image, or digitized full-text versions, giving the scholar the path to the resources of interest.49 OCLC, the British Library, members of the TCP, and other stakeholders may want to consider further exploration of use case scenarios to determine or rule out additional benefits of transforming MARC 510 field metadata to linked data.

Figure 17. Linked data view of OCLC #33150534, http://www.worldcat.org/oclc/33150534

Figure 18. MARC 510 field data in ESTC

Figure 19. MARC 510 metadata in structured data view in ESTC21

Figure 20. Print version of OCN 88701, Supplement to Charles Evans' American bibliography by Roger P. Bristol, http://www.worldcat.org/oclc/88701.

CONCLUSION

At the current pace, given available staffing and the number of EEBO resources lacking OCN, the time and effort spent by the Libraries’ Metadata Services Department staff toward the goal of adding OCN to the OCLC EEBO KBART file, though well spent, will continue for years to come. A collective effort in this endeavor by the WCKB community of users is welcomed by this author.50 A combined effort by OCLC and ProQuest to improve discovery and link resolution services for these valuable scholarly resources could increase their discoverability substantially, allowing MSD staff to spend more time creating and enhancing the metadata that will lead researchers to the uncatalogued EEBO resources they seek.
As to the transition of MARC 510 field metadata to linked data, OCLC, the British Library, members of the TCP, and other stakeholders should consider their options before moving forward without it.

ACKNOWLEDGEMENT

The author wishes to thank Karen Coyle for reading and advising on earlier versions of this paper; Becky Culbertson, Nathan Putnam, and Patricia Herron for supporting the project; and Joshua Westgard for converting the data to get the project underway. Special thanks are due to staff members of the UM Libraries, Donna King, Roselin Becker, Erica Hemsley, Yeo-Hee Koh, and Tanisha Lee, and to Freeda Brook, Luther College, for their work on the project.

REFERENCES

1. A KBART file is a file compliant with the NISO recommended practice, Knowledge Bases and Related Tools (KBART). See KBART Phase II Working Group, Knowledge Bases and Related Tools (KBART): Recommended Practice: NISO RP-9-2014 (Baltimore, MD: National Information Standards Organization (NISO), 2014), accessed March 14, 2017, http://www.niso.org/publications/rp/rp-9-2014/.

2. University of Maryland Libraries, “About,” last updated July 28, 2016, http://www.lib.umd.edu/about.

3. In 2015, the Libraries implemented WorldCat Discovery, intended to be a replacement for WorldCat Local.

4. Marshall Breeding, The Future of Library Resource Discovery (Baltimore, MD: National Information Standards Organization (NISO), 2015): 17, accessed February 18, 2017, http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_discovery.pdf.

5. KBART Phase II Working Group, Knowledge Bases and Related Tools (KBART): Recommended Practice: NISO RP-9-2014 (Baltimore, MD: National Information Standards Organization (NISO), 2014), accessed April 13, 2017, http://www.niso.org/publications/rp/rp-9-2014/.

6.
Open Discovery Initiative Working Group, Open Discovery Initiative: Promoting Transparency in Discovery: NISO RP-19-2014, (Baltimore, MD: NISO, 2014): 13, accessed March 14, 2017, http://www.niso.org/publications/rp/rp-9-2014/ 7. University of Maryland Libraries Collection Development Council. “Meeting Notes,” March 4, 2014. 8. “University of Maryland Libraries Master Space Plan,” Nov. 2015, June 2016 update. 9. See Gale’s web page, “The Making of the Modern World (MOMW) FAQ,” at http://find.galegroup.com/mome/component/researchtools/xml/FAQ.xml, accessed February 18, 2017, for a details about the collection. WorldCat Knowledge Base collections may be created by libraries and uploaded to the Knowledge Base. Details on the process are available at http://www.oclc.org/support/services/collection- manager/documentation.en.html#knowledgebase, accessed February 18, 2017. A CASE STUDY ON THE PATH TO RESOURCE DISCOVERY | GUAY | doi:10.6017/ital.v36i3.9966 45 10. ProQuest’s British Periodicals collection “offers facsimile page images and searchable full text for nearly 500 British periodicals published from the 17th century through to the early 21st” and “is available in four separate collections, British Periodicals Collections I, II, III, and IV, each of which can be purchased separately.” ProQuest British Periodicals product description page, http://search.proquest.com/britishperiodicals/productfulldescdetail?accountid=14696, accessed Jan. 29, 2017 11. Details about resources available in EEBO are provided by ProQuest at its website, “EEBO: About EEBO,” accessed January 29, 2017. http://eebo.chadwyck.com/marketing/about.htm 12. 
Diana Kichuk, “Metamorphosis: Remediation in Early English Books Online (EEBO),” Literary and Linguistic Computing, 22:3 (2007): 291-303; Shawn Martin, “EEBO, Microfilm, and Umberto Eco: Historical Lessons and Future Directions for Building Electronic Collections,” Microform & Imaging Review, 36:4 (2007): 159-164; Ian Gadd, “The Use and Misuse of Early English Books Online, Literature Compass, 6:3 (2009): 680-692; Bonnie Mak, “Archaeology of a Digitization,” Journal of the Association for Information Science and Technology, 65:8 (2014): 1515-1526; Folger Shakespeare Library, “History of Early English Books Online,” http://folgerpedia.folger.edu/History_of_Early_English_Books_Online, last modified on 26 August 2015. 13. A.W. Pollard and G. R. Redgrave. A short-title catalogue of books printed in England, Scotland, & Ireland and of English books printed abroad, 1475-1640, Rev. ed. (London: The Bibliographical Society, 1976–1991); Donald Wing, Short-title catalogue of books printed in England, Scotland, Ireland, Wales, and British America, and of English books printed in other countries, 1641-1700, 2d ed., newly rev. and enl. (New York : Modern Language Association of America, 1972- <1994>) 14. Gadd, “The Use and Misuse of Early English Books Online,” 683. 15. “About EEBO.” 16. Details on the ESTC are provided by the British Library at http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html, viewed March 12, 2017 17. Gadd, “The Use and Misuse of Early English Books Online,” 685-686. 18. Gadd, “The Use and Misuse of Early English Books Online,” 686. 19. EEBO, “Frequently Asked Questions,” accessed February 18, 2017. http://eebo.chadwyck.com/help/faqs.htm 20. Association of Research Libraries, Microform Sets in U.S. and Canadian Libraries, (Washington, D.C.: Association of Research Libraries, 1984), J-3. 21. Martin D. 
Joachim, “Cooperative Cataloging of Microform Sets,” in Cooperative Cataloging: Past, Present, and Future (New York: The Haworth Press, 1993), 111. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 46 22. Gadd, “The Use and Misuse of Early English Books Online,” 686. 23. British Library, “Catalogs of British Library Holdings: English Short Title Catalogue - content,” accessed February 18, 2017. http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html 24. The British Libraries ESTC codes for filmed copy locations are difficult to translate. See Meaghan J. Brown’s finding aid, “STC Location Code Transcription” wherein she offers details on STC and ESTC location codes and the problem her finding aid addresses. Brown explains, “… it is currently possible to search the ESTC for items using MARC codes, but not the location codes familiar from the STC,” accessed February 18, 2017. http://www.meaghan- brown.com/stc-location-codes/ 25. Text Creation Partnership, accessed January 25, 2017. http://www.textcreationpartnership.org/home/ 26. Text Creation Partnership, accessed January 25, 2017. http://www.textcreationpartnership.org/catalog-records/ 27. OCLC’s form is available at https://www.oclc.org/content/dam/support/knowledge- base/ocn_report.xlsx, accessed October 18, 2016. 28. See Appendix 1 for the Procedures 29. With streamlined KBART search features introduced by a Metadata Services Department colleague, it’s expected this time may be reduced moving forward. 30. A June 9, 2015 email from an OCLC staff member to the KB-L@oclc.org listserv reported on OCLC’s efforts to match OCN in its KBART files to English language of cataloging records, when available. 31. UM Libraries’ staff use this metadata in the equivalent OCLC microfilm and e-version and EEBO resource records as match points. Staff do not verify that the images linked to the EEBO version records correspond to those in the aforementioned bibliographic records. 
It is hoped that ProQuest will investigate the case described in this paper in which the EEBO resource differs from its corresponding record. 32. “510 Citation/Reference Note,” OCLC, Bibliographic Formats and Standards. 4th Edition, last revised August 22, 2016. https://www.oclc.org/bibformats/en/5xx/510.html 33. As of January 29, 2017, the MARC 510 field has not been indexed by OCLC. See http://www.oclc.org/support/help/SearchingWorldCatIndexes/#05_FieldsAndSubfields/5xx _fields.htm 34. E.g., OCLC indexes “internet resources” using a combination of MARC data elements. These are laid out in “Searching WorldCat Indexes” at http://www.oclc.org/support/help/SearchingWorldCatIndexes/#06_Format_Document_Typ e_Codes/Format_Document_type_codes.htm. MARC 21 Bibliographic at A CASE STUDY ON THE PATH TO RESOURCE DISCOVERY | GUAY | doi:10.6017/ital.v36i3.9966 47 https://www.loc.gov/marc/bibliographic/bdleader.html provides the Leader position 06 code for “Language material.” MARC Code List for Languages (http://www.loc.gov/marc/languages/) contains the language codes contained in the language of cataloging field/subfield (MARC 040 field, subfield “b”). 35. “024 Other Standard Identifier,” in OCLC, Bibliographic Formats and Standards, 4th edition, accessed January 25, 2017. https://www.oclc.org/bibformats/en/0xx/024.html 36. Ibid. 37. OCLC. Searching WorldCat Indexes, accessed February 18, 2017. http://www.oclc.org/support/help/SearchingWorldCatIndexes/#05_FieldsAndSubfields/0xx _fields.htm%3FTocPath%3DFields%2520and%2520subfields%7C_____2 38. See OCLC Bibliographic Formats and Standards, Fourth edition. 024 Other Standard Identifier https://www.oclc.org/bibformats/en/0xx/024.html, viewed January 25, 2017 39. An Oct. 18, 2016 review of OCLC’s all-collections-list, available at https://www.oclc.org/content/dam/support/knowledge-base/all-collections-list.xlsx indicates that 38.5% percent of the 129,498 resources on the EEBO KBART file have OCLC number coverage. 40. 
http://experimental.worldcat.org/marcusage/510.html 41. http://experimental.worldcat.org/marcusage/024.html 42. KBART Phase II Working Group, Knowledge Bases and Related Tools (KBART): Recommended Practice: NISO RP-9-2014 (Baltimore, MD: NISO 2014), 18. http://www.niso.org/workrooms/kbart 43. https://www.oclc.org/worldcat/data-strategy.en.html, viewed Jan. 26, 2017 44. The image of the linked data view of Figure 14 was captured on February 18, 2017. 45. Carl Stahmer, “Making MARC Agnostic: Transforming the English Short Title Catalogue for the Linked Data Universe,” in Linked Data for Cultural Heritage, (Chicago: ALA Editions), p. 23-25. 46. The assertion that the ESTC transformation of MARC 510 field metadata is solely based on Carl Stahmer, “The ESTC as a 21st Century Research Tool,” Presentation given at the 2014 conference of the Text Encoding Initiative, viewed February 19, 2017. https://figshare.com/articles/ESTC21_at_TEI_2014/1558057 47. Roger P. Bristol, Supplement to Charles Evans' American Bibliography (Charlottesville: University Press of Virginia, 1970). 48. Dianne Hillmann, Gordon Dunsire, and Jon Phipps, “Maps and Gaps: Strategies for Vocabulary Design and Development,” in DCMI International Conference on Dublin Core and Metadata Applications, 2013: 88, accessed February 18, 2017. http://dcpapers.dublincore.org/pubs/article/view/3673/1896 INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 48 49. See Reference 14 above. 50. A discussion and invitation to collaborate on this work took place in late 2016 on the OCLC WorldCat KB listserv (see http://listserv.oclc.org/scripts/wa.exe?SUBED1=kb-l&A=1). To date, the Preus Library, Luther College, will be working with the Libraries on this project.
It is Our Flagship: Surveying the Landscape of Digital Interactive Displays in Learning Environments

Lydia Zvyagintseva

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2018

Lydia Zvyagintseva (lzvyagintseva@epl.ca) is the Digital Exhibits Librarian at the Edmonton Public Library in Edmonton, Alberta.

ABSTRACT

This paper presents the findings of an environmental scan conducted as part of a Digital Exhibits Intern Librarian Project at the Edmonton Public Library in 2016. As part of the Library's 2016-2018 Business Plan objective to define the vision for a digital exhibits service, this research project aimed to understand the current landscape of digital displays in learning institutions globally. The resulting study consisted of 39 structured interviews with libraries, museums, galleries, schools, and creative design studios. The environmental scan explored the technical infrastructure of digital displays, their user groups, the various uses for the technologies within organizational contexts, their content sources, scheduling models, and the resourcing needs for this emergent service. Broader themes surrounding challenges and successes were also included in the study. Despite the variety of approaches taken among learning institutions in supporting digital displays, the majority of organizations expressed a high degree of satisfaction with these technologies.

INTRODUCTION

In 2020, the Stanley A. Milner Library, the central branch of the Edmonton (Alberta) Public Library (EPL), will reopen after extensive renovations to both the interior and exterior of the building. As part of the interior renovations, EPL will have installed a large digital interactive display wall modeled after The Cube at Queensland University of Technology (QUT) in Brisbane, Australia.
To prepare for the launch of this new technology service, EPL hired a digital exhibits intern librarian in 2016, whose role consisted of conducting research to inform the library in defining the vision for a digital display wall serving as a shared community platform for all manner of digitally accessible and interactive exhibits. As a result, the author carried out an environmental scan and a literature review related to digital displays, as well as their consequent service contexts. For the purposes of this paper, "digital displays" refers to the technology and hardware used to showcase information, whereas "digital exhibits" refers to the content and software used on those displays. Wherever the service of running, managing, or using this technology is discussed, it is framed as "digital display service" and concerns both the technical and organizational aspects of using this technology in a learning institution.

IT IS OUR FLAGSHIP | ZVYAGINTSEVA | https://doi.org/10.6017/ital.v37i2.9987

METHOD

The data were collected between May 30 and August 20, 2016, through a series of structured interviews conducted by Skype, phone, and email. The study population was identified by searching Google and Google News for keywords such as "digital interactive AND library," "interactive display," "public display," or "visualization wall" to find organizations that had installed digital displays. The list was expanded by reviewing the websites of creative studios specializing in interactive experiences and through a snowball effect once the interviews had begun. A small number of vendors, consisting primarily of creative agencies specializing in digital interactive services, were also included in the study population. Participants were then recruited by email. The goal of this project was to gain a broad understanding of the emergent technology, content, and service model landscape related to digital displays.
As a result, structured interviews were deemed the most appropriate method of data collection because of their capacity to generate a large amount of qualitative and quantitative data. In total, 39 interviews were conducted. The list of interview questions prepared for the interviews is included in appendix A, and a complete list of the study population can be found in appendix B. Organizations from Canada, the United States, Australia, and New Zealand predominate in this study.

LITERATURE REVIEW

Definitions

• Public displays, a term used in the literature to refer to a particular type of digital display, can refer to "small or large sized screens that are placed indoor . . . or outdoor for public viewing and usage" and which may be interactive to support "information browsing and searching activities."1 In public displays, a large proportion of users are passers-by and thus first-time users.2 In academic environments, these technologies may be referred to as "video walls" and have been characterized as display technologies with little interactivity and input from users, often located in high-traffic, public areas, with content prepared ahead of time and scheduled for display according to particular priorities.3

• Semi-public displays, on the other hand, can be understood as systems intended to be used by "members of a small, co-located group within a confined physical space, and not general passers-by."4 In academic environments, they have been referred to as "visualization spaces" or "visualization studios" and can be defined as workspaces with real-time content displayed for analysis or interpretation, often placed in libraries or research department units.5 For the purposes of this paper, "digital displays" refers to both public and semi-public displays, as organizations interviewed in this study had both types of displays, occasionally simultaneously.
• Honeypot effect describes how people interacting with an information system, such as a public display, stimulate other users to observe, approach, and engage in interaction with that system.6 This phenomenon extends beyond digital displays to tourism, art, and retail environments, where a site of interest attracts the attention of passers-by and draws them to participate in that site.

Interactivity

The area of interactivity with public displays has been studied by many researchers, and three commonly used modes of interaction are clearly identified: touch, gesture, and remote.

• Touch (or multi-touch): This is the most common way users interact with personal mobile devices such as smartphones and tablets. Multi-touch interaction on public displays should support many individuals interacting with the digital screen simultaneously, since many users expect immediate access and will not take turns. For example, some technologies studied in this report support up to 30 touch points at any given time, while others, like QUT's The Cube, allow for a near infinite number of touch points. Though studies show that this technique is fast and natural, it also requires additional physical effort from the user.7 While touch interaction using infrared sensors has a high touch recognition rate, its shortcomings are that it is expensive and is affected by light interference around the touch screen.8

• Gesture: This is interaction through movement of the user's hands, arms, or entire body, recognized by sensors such as the Microsoft Kinect or Leap Motion systems.
Although studies show that this type of interaction is quick and intuitive, it also brings "a cognitive load to the users together with the increased concern of performing gestures in public spaces."9 Specifically, body gestures were found not to be well suited to passing-by interaction, unlike hand gestures, which can be performed while walking and have an acceptable mental, physical, and temporal workload.10 Research into gesture-based interaction shows that "more movement can negatively influence recall" and is therefore not suited for informational exhibits.11 Similarly, people consider gestures to be too much work "when they require two hands and large movements" to execute.12 Not surprisingly, research suggests that the gestures deemed socially acceptable for public spaces are small, unobtrusive ones that mimic everyday actions; they are also more likely to be adopted by users.

• Remote: These are interactions using another device, such as a mobile phone, tablet, virtual-reality headset, game controller, or other special device. Connection protocols may include Bluetooth, SMS messaging, near-field communication, radio-frequency identification, wireless-network connectivity, and other methods. Mobile-based interaction with public displays has received a lot of attention in research, media, and commercial environments because this mode allows users to interact from a variable distance with minimal physical effort.
However, users often find mobile interaction with a public display "too technical and inconvenient" because it requires sophisticated levels of digital literacy in addition to access to a suitable device.13 Some suggest that using personal devices for input also helps "avoid occlusion and offers interaction at a distance" without requiring multi-touch or gesture-based interactions.14 As well, subjects in studies on mobile interaction often indicate a preference for this mode because of its low mental effort and low physical demand. However, it is possible that these studies focused on users with high degrees of digital literacy rather than the general public, whose access to and comfort with mobile technologies vary.

User Engagement

Attracting user attention is not guaranteed by virtue of having a public display. According to research, the most significant factors that influence user engagement with public digital displays are age, display content, and social context.

Age

Hinrichs found that children were the first to engage in interaction with public displays and would often recruit accompanying adults toward the installation.15 Adults, on the other hand, were more hesitant in approaching the installation: "they would often look at it from a distance before deciding to explore it further."16 These findings suggest that designing for children first is an effective strategy for enticing interaction from users of all ages.

Display Content

Studies on engagement in public digital display environments indicate that both passive and active types of engagement exist with digital displays. The role of emotion in the content displayed also cannot be overlooked. Specifically, Clinch et al.
state that people typically pay attention to displays "only when they expected the content to be of interest to them" and that they are "more likely to expect interesting content in a university context rather than within commercial premises."17 In other words, the context in which a display is situated affects user expectations and primes them for interaction. The dominant communication pattern in existing display and signage systems has been narrowcast, a model in which displays are essentially seen as distribution points for centrally created content, without much consideration for users. This model of messaging exists in commercial spaces, such as malls, but also in public areas like transit centers, university campuses, and other spaces where crowds of people may gather or pass by. Observational studies indicate that people tend to perceive this type of content as not relevant to them and ignore it.18 For public displays to be engaging to end users, in other words, "there needs to be some kind of reciprocal interaction."19 In public spaces, interactive displays may be more successful than non-interactive displays in engaging viewers and making city centers livelier and more attractive.20 In terms of precise measures of attention to such displays, studies of average attention time correlate age with responsiveness to digital signage: children (1-14 years) are more receptive than adults, and men spend more time observing digital signage than women.21 Studies also indicate significantly higher average attention times for dynamic content as compared to static content.22 Scholars like Buerger suggest that designers of applications for public digital displays should assume that viewers are not willing "to spend more than a few seconds to determine whether a display is of interest."23 Instead, they recommend presenting informational content with minimal text and in such a way that the most important information can be grasped in two to three seconds.
In a museum context, the average interaction time with the digital display was between two and five minutes, which was also the average time people spent exploring analog exhibits.24 Dynamic, game-like exhibits at The Cube incorporate all of the above findings to make interaction interesting and short and to draw the attention of children first.

Social Context

Social context is another aspect that has been studied extensively in the field of human-computer interaction, and it provides many valuable lessons for applying evidence-based practices to technology service planning in libraries. Many scholars have observed the honeypot effect in interaction with digital displays in public settings. This effect describes how users who are actively engaged with the display perform two important functions: they entice passers-by to become actively engaged users themselves, and they demonstrate how to interact with the technology without formal instruction. Many argue that a conducive social context can "overcome a poor physical space, but an inappropriate social context can inhibit interaction" even in physical spaces where engagement with the technology is encouraged.25 This finding relates to the use of gestures on public displays. Researchers also found that contextual social factors, such as age and being around others in a public setting, do in fact influence the choice of multi-touch gestures.
Hinrichs suggests enabling a variety of gestures for each action (accommodating different hand postures and a large number of touch points, for example) to support fluid gesture sequences and social interactions.26 A major deterrent to users' interaction with large public displays is the potential for social embarrassment.27 As an implication, the authors suggest positioning the display along thoroughfares of traffic and improving how the interaction principles of the display are communicated implicitly to bystanders, thus continually instructing new users in techniques of interaction.28

FINDINGS

Technical and Hardware Landscape

The average age of public displays was around three years, indicating an early stage of development for this type of service among learning institutions. Such technologies first appeared in Europe more than ten years ago (the most widely cited early example of a public display is the CityWall in Helsinki in 2007),29 but adoption in North America did not start until around 2013. The median year of installation among the organizations studied in this report is 2014. Among the public institutions represented in the study population, such as public libraries and museums, digital displays were most frequently installed in 2015. While most organizations have only one display space, it was not unusual to find several within a single organization. For the purposes of this study, the researcher counted The Cube as three display spaces, as documentation and promotional literature on the technology cites "3 separate display zones." As a result, the average number of display spaces in the study population is 1.75. The following modes of interaction beyond displaying video content were observed in the study population, in descending order of frequency:

• Sound (79%).
While research on human-computer interaction is inconclusive about best practices for incorporating sound into digital interactive displays, it is clear among the organizations interviewed in the environmental scan that sound is a major component of digital exhibits and should not be overlooked.

• Touch or multi-touch (46%). This finding highlights that screens capable of supporting multi-user interaction are not consistently available across the study population.

• Gesture (25%). These include tools such as Microsoft Kinect, Leap Motion, or other systems for detecting movement for interaction.

• Mobile (14%). While some researchers in the human-computer interaction field suggest mobile is the most effective way to bridge the divide between large public displays, personalization of content, and user engagement, mobile interactivity is not used frequently to engage with digital displays in the study population. One outlier is North Carolina State University Library, which takes a holistic, "massively responsive design" approach in which responsive web design principles are applied to content that can be displayed effectively at once online, on digital display walls, and on mobile devices, while optimizing the institutional resources dedicated to supporting visualization services.

Further, as in the broader personal computing environment, the Microsoft Windows operating system dominates display systems, with 61% of the organizations choosing a Windows machine to power their digital display. About a fifth (21%) of all organizations have some form of networked computing infrastructure, such as The Cube with its capacity to process exhibit content using 30 servers; the majority (79%) of organizations interviewed have a single computer powering the display. This finding is perhaps not surprising, given that few institutions have dedicated IT teams to support a single technology service like The Cube.
Users and Use Cases

Understanding primary audiences was also important for this study, as the organizational user base defines the context for digital exhibits. The breakdown of these audiences is summarized in figure 1: 44% of displays serve academic audiences, 33% public audiences, and 22% both. For example, the University of Oregon Ford Alumni Center's digital interactive display focuses primarily on showcasing the success of its alumni, with a goal of recruiting new students to the university, although the interactive exhibits also serve the general public through tours and events on the University of Oregon campus. Other organizations with digital displays, such as All Saints Anglican School and the Philadelphia Museum of Art, also target specific audiences, so planning for exhibits may be easier in those contexts than in organizations like the University of Waterloo Stratford Campus, whose display wall at the downtown campus receives visitor traffic from students, faculty, and the public.
Communication (47%), which can be considered a form of digital signage to promote library or institutional services and marketing content. Displays can also deliver presentations and communicate scholarly work. 4. Teaching (42%), including formal and semi-formal instruction, workshops, student presentations, and student course-work showcases. 5. Events (31%), such as public tours, conferences, guest speakers, special events, galas, and other social activities near or using the display. 6. Community engagement (28%), including participation from community members through content contribution, showing local content, using the display technology as an outreach tool, and other strategies to build relationships with user communities. 7. Research (22%), where the display functions as a tool that facilitates scholarly activities like data collection, analysis, and peer review. Many study participants acknowledged challenges in using digital displays for this purpose and have identified other services that might support this use more effectively. Content Types and Management In the words of Deakin University librarians, “Content is critical, but the message is king,” so it was particularly important for the author to understand the current digital display landscape as it relates to content.30 Specifically, the research project encompassed the variety of content used on digital displays as well as how it is created, managed, shared, and received by the audiences of various organizations interviewed in this study. As can be observed in figure 2, all organizations supported 2D content, such as images, video, audio, presentation slides, and other visual and textual material. However, dynamic forms of content, such as social media feeds, interactive maps, and websites were less prevalent. IT IS OUR FLAGSHIP | ZVYAGINTSEVA 57 https://doi.org/10.6017/ital.v37i2.9987 Figure 2. Types of content supported by digital displays in the study population. 
Discussions around interest in emergent, immersive, and dynamic 3D content such as games and virtual and augmented reality also came up frequently in the study interviews, and the researcher found that these types of content were supported in only 16 (57%) of the 28 total cases. This number is lower than the total number of interviewees because not all organizations interviewed had content to manage or display. In addition, many organizations recognized that they would likely be exploring ways to present 3D games or immersive environments through their digital display in the near future. Not surprisingly, the creative agencies included in this study revealed an awareness and active development of content of this nature, noting “rising demand and interest in 3D and game-like environments.” Furthermore, projects involving motion detection, the Internet of Things, and other sensor-based interactions are also seeing rise in demand, according to study participants. 100 % 61 % 57 % 0 10 20 30 40 50 60 70 80 90 100 Content types Supported Content Types Static 2D Dynamic web Dynamic 3D INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2018 58 Figure 3. Content management systems for digital displays. In terms of managing various types of content, 20 (71%) of the organizations interviewed had used some form of content management system (CMS), while the rest did not use any tool to manage or organize content. Of those organizations that used a CMS, 15 (75%) relied on a vendor- supplied system, such as tools by FourWinds Interactive, Visix, or NEC Live. The remaining 5 (18%) CMS users created a custom solution without going to a vendor. This finding suggests that since the majority of content supported by organizations with digital displays is 2D, current vendor solutions for managing that content are sufficient for the study population at this point. It is unclear how the rise in demand for dynamic, game-like content will be supported by vendors in the coming years. 
Table 1 reflects the distribution of approaches to managing content observed in the study population.

Table 1. Content management in study population

Content Management         Responses    %
Vendor-supplied system        15       54
In-house created system        5       18
No system                      5       18
Unknown                        3       10

Middleware, Automation, and Exhibit Management

Middleware can be described as the layer of software between the operating system and the applications running on the display, especially in a networked computing environment. For example, most organizations studied in the environmental scan supported a Windows environment with a range of exhibit applications, like slideshows, web browsers, and executable files, such as games. Middleware can simplify and automate the process of starting up, switching between, and shutting off display applications on a set schedule. As figure 4 demonstrates, the majority of the organizations in the study population (17, or 61%) did not have a middleware solution. However, this group was heterogeneous: 14 organizations (50%) did not require a middleware solution because they ran content semi-permanently or relied on user-supplied content, in which case the display functioned as a teaching tool. The remaining three organizations (11%) manually managed scheduling and switching between exhibit content. In such cases, a middleware solution would be valuable for managing content, especially as the number of applications grows, but it was not present in these organizations. Comparatively, 10 organizations (36%) used a custom solution, such as a combination of Windows or Linux scripts, to manage automation and scheduling of content on the display. One organization (3%) did not specify its approach to managing content.
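The starting, switching, and shutting-off behaviour that middleware automates can be illustrated with a small script of the kind some study organizations built in-house. The following is a minimal, hypothetical sketch: the application commands, hours, and polling interval are invented for illustration and are not drawn from any organization in the study.

```python
import datetime
import subprocess
import time

# Hypothetical daily schedule of exhibit applications:
# (start_hour, end_hour, command) tuples. Commands are placeholders.
SCHEDULE = [
    (8, 12, ["slideshow", "--playlist", "morning"]),
    (12, 18, ["browser", "--kiosk", "https://example.org/exhibit"]),
    (18, 22, ["exhibit-game", "--fullscreen"]),
]

def active_exhibit(schedule, now):
    """Return the command that should be running at `now`,
    or None outside display hours (overnight shutdown)."""
    for start, end, command in schedule:
        if start <= now.hour < end:
            return command
    return None

def run_loop(schedule, poll_seconds=60):
    """Start, switch between, and shut off display applications on a
    set schedule -- the core middleware behaviour described above."""
    current, process = None, None
    while True:
        command = active_exhibit(schedule, datetime.datetime.now())
        if command != current:
            if process is not None:
                process.terminate()  # shut off the outgoing exhibit
            # launch the incoming exhibit, or nothing overnight
            process = subprocess.Popen(command) if command else None
            current = command
        time.sleep(poll_seconds)
```

A real deployment would also need crash recovery and logging; the point here is only that the "no middleware" organizations managing this switching by hand are doing manually what a loop of this shape automates.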
These findings suggest that no formalized solution to automating and managing software currently exists among the study population. In addition to organizing content, digital-exhibits services involve scheduling or automating content to meet user needs according to the time of day, special events, or seasonal relevance. A middleware solution therefore supports sustainable management of displays and predictable sharing of content for end users. This environmental scan revealed that digital exhibits and interactive experiences are still in the early days of development. It is possible that new solutions for managing content at both the application and the middleware level may emerge in the coming years, but they are currently limited.

Figure 4. Middleware solutions in the study population.
[Figure 4 chart data: none, 61%; custom, 36%; unknown, 3%.]

Sources of Content

When finding sources of content for digital displays, the organizations interviewed used multiple strategies simultaneously. Table 2 below brings together the findings related to this theme.

Table 2. Content sources for digital exhibits

Content Source               %
External/commissioned       64
User-supplied               64
Internal/in-house           50
Collaborative with partner  43

For example, many organizations rely on their users to generate and submit material (18, or 64%); others commission vendors to create exhibits for them (18, or 64%). In 50% of all cases, organizations also produce content for exhibits in-house. In other words, most organizations used a combination of sources to generate content for their digital displays. Only a few use a single source of content, such as the semi-permanent historical exhibit at Henrico County Public Library.
Others, like the Duke Media Wall, rely entirely on their users to supply content, employing a “for students by students” model of content creation. Additionally, only 12 (43%) of the organizations interviewed had explored or established some form of partnership for creating exhibits. Primarily, these partnerships existed with departments, centers, institutes, campus units, and/or students in academic settings, such as the computer science department, faculty of graduate studies, and international studies. Other examples of partnerships were with similar civic, educational, cultural, and heritage organizations, such as municipal libraries, historical societies, art galleries, museums, and nonprofits. Examples included study participants working with Ars Electronica, local symphony orchestras, Harvard Space Science, and NASA on digital exhibits. Clearly, a variety of approaches were taken in the study population to source digital exhibits content.

Content Creation Guidelines

Seven organizations (19%) in the study population publicly shared content guidelines aimed at simplifying the process of engaging users in creating exhibits. These guidelines were analyzed, and key elements were identified that users need to know in order to contribute in a meaningful way, thereby lowering the barrier to participation. These elements include the resolution of the display screen(s), touch capability, ambient light around the display space, required file formats, and maximum file size. A complete list of organizations with such guidelines, along with the websites where these guidelines can be found, is included in appendix C.
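Guideline elements such as resolution, required file formats, and maximum file size map naturally onto an automated check of user submissions. The sketch below is hypothetical: the specification values and the function are illustrative assumptions, not taken from any guideline analyzed in the study.

```python
# Hypothetical display specification of the kind published in content
# guidelines (resolution, accepted formats, maximum file size).
# All numbers here are invented for illustration.
DISPLAY_SPEC = {
    "resolution": (3840, 2160),        # native screen resolution
    "formats": {"jpg", "png", "mp4"},  # required file formats
    "max_bytes": 500 * 1024 * 1024,    # maximum file size (500 MB)
}

def check_submission(filename, width, height, size_bytes, spec=DISPLAY_SPEC):
    """Return a list of problems with a proposed contribution;
    an empty list means it meets the technical specifications."""
    problems = []
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in spec["formats"]:
        problems.append(f"unsupported format: .{ext}")
    if size_bytes > spec["max_bytes"]:
        problems.append("file exceeds maximum size")
    native_w, native_h = spec["resolution"]
    # Aspect-ratio check: content matching the display's ratio avoids
    # letterboxing on the wall.
    if width * native_h != height * native_w:
        problems.append("aspect ratio does not match the display")
    return problems
```

Checks of this kind only make sense for the standardized content types noted below (images, slides, video); no equivalent exists for 3D or game-like contributions.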
Based on the analysis of this limited sample, the bare minimum for community participation guidelines would include clearly outlining
• the scope, purpose, audience, and curatorial policy of the digital exhibits service;
• the technical specifications, such as the resolution, aspect ratio, and file formats supported by the display;
• the design guidelines, such as colors, templates, and other visual elements;
• the contact information of the digital exhibits coordinator; and
• the online or email submission form.
It should be noted, however, that such specifications are primarily useful when a CMS exists and the content solicited from users is at least somewhat standardized. For example, images, slides, or webpages may be easier for community partners to contribute than video games or 3D interactive content. No examples of guidelines for the latter were observed in the study.

Content Scheduling

Whereas the middleware section of this study examined the technical approaches to content management and automation, this section explores the frequency of exhibit rotation from a service design perspective. As can be observed in figure 5, no consistent or dominant model for exhibit scheduling was identified in the study population. Generally, approaches to scheduling digital exhibits reflect organizational contexts. For example, museums typically design an exhibit and display it on a permanent basis, while academic institutions change displays of student work or scholarly communication once per semester. The following scheduling models emerged, in descending order of frequency in the study population.

Figure 5. Content scheduling distribution in the study population.

1. Unstructured (29%): no formal approach, policy, or expectation is identified by the organization regarding displaying exhibits.
This model is largely related to the early stage of service development in this domain, a lack of staff capacity to support the service, and/or responsiveness to user needs. One study participant, for example, referred to this loose approach by noting that “no formalized approach and no official policy exists.” Institutions may have frameworks for what types of content are acceptable but no specific requirements on content subjects. Institutions adopting a lab space model (see figure 6) for digital displays largely belong to this category. In other words, content is created on the fly through workshops, data analysis, and other situations as needed by users. In this case, no formal scheduling is required apart from space reservations.
2. Seasonal (29%), which can be defined as a period from three to six months and includes semester-based scheduling in academic institutions. Many organizations operate on a quarterly basis, so it seems logical that content refresh cycles reflect the broader workflow of the organization.
3. Permanent (21%): in the case of museums, permanent exhibits may mean displaying content indefinitely or until the next hardware refresh, which might reconfigure the entire interactive display service. No specific date ranges were cited for this model.
4. Monthly (10%): this pattern was observed among academic libraries, with production of “monthly playlists” featuring curated book lists or other monthly specials.
5. Weekly (7%): North Carolina State University and Deakin University Libraries aim to have fresh content up once per week; they achieve this in part by formalizing the roles needed to support their digital display and visualization services.
6.
Daily (4%): only Griffith University ensures that new content is available every day on its #SeeMore display; it does this largely by relying on standardized external and internal inputs, such as weather updates and content from the university marketing department.

Staffing and Skills

One key element of the digital exhibits research project was investigating the staffing models required to support a service of this nature. Not surprisingly, the theme of resource needs for digital exhibits emerged in most interviews conducted. Several participants noted that one “can’t just throw up content and leave it,” while others advised to “have expertise on staff before tech is installed.” The data gathered show that the average full-time equivalent (FTE) needed to support digital display services in the organizations interviewed was 2.97, or around three full-time staff members. In addition, 74% of the organizations studied had maintenance or support contracts with various vendors, including AV integrators, CMS specialists, creative studios that produced original content, or hardware suppliers. Hardware and AV integrators typically provided a 12-month contract for technical troubleshooting, while creative studios offered a three-month support contract for the digital exhibits they designed. The average time to create an original, interactive exhibit was between 9 and 12 months, according to data provided by the creative agencies, The Cube teams, and learning organizations that have in-house teams creating exhibits regularly. This length of time varies with the complexity of the interaction designed, the depth of the exhibit “narrative,” and the modes of input supported by the exhibit application. Additionally, it was important to understand the curatorial labor behind digital exhibits; the author did not necessarily speak with the curator of exhibits, and this work may be carried out by multiple individuals within organizations with digital displays or creative studios.
In 20 (57%) of the cases, the person interviewed also curated some or all of the content for the digital display at their respective institution. In five (14%) of the cases, the individual interviewed was not a curator for any of the content because there was no need for curation in the first place. For example, displays in these cases were used for analysis or teaching and therefore did not require prepared content. In the rest of the cases (10, or 29%), a creative agency vendor, another member of the team, or a community partner was responsible for curating exhibit content. This finding suggests that, while a significant number of organizations outsource the design and curation of exhibits, the majority retain control over this process. Therefore, dedicating resources to the curation, organization, and management of exhibit content is deemed significant by the organizations represented in the study. In terms of the capacity to carry out digital display services, the skills identified by study participants as important to supporting work of this nature include the following:
1. technical skills (such as the ability to troubleshoot), general interest in technology, and flexibility and willingness to learn new things (74%)
2. design, visual, and creative sensibility (40%), as this type of work is primarily a visual experience
3. software-development or programming-language knowledge (31%)
4. communication, collaboration, and relationship-building (25%)
5. project management (20%)
6. audiovisual and media skills (14%), as digital exhibits are “as much an AV experience as an IT experience,” according to one study participant
7. curatorial, organizational, and content-management skills (11%)
The most frequent dedicated roles mentioned in the interviews are shown in table 3.

Table 3.
Types of roles significant to digital exhibits work

Position                                    Responses    %
Developer/programmer                           11       31
Project manager                                 8       23
Graphic designer                                6       17
User experience or user interface designer      4       11
IT systems administrator                        4       11
AV or media specialist                          4       11

The relatively low percentages in this table suggest that the skills mentioned above are either distributed among various team members or combined in a single role, as may be the case in small institutions or in those without formalized services and dedicated roles. Nevertheless, the presence of specific job titles indicates an understanding of the various skill sets needed to run a service that uses digital displays.

Challenges and Successes

Many challenges related to initiating and supporting a service that uses digital displays for learning were identified by study participants. Clearly, multiple challenges could be associated with digital-display services within a single organization. However, many successes and lessons learned were also shared by interviewees, often overlapping with identified challenges. This pattern suggests that some organizations can pursue strategies that address challenges faced by their library or museum colleagues while perhaps lacking resources or capacity in other areas related to this type of service. For example, some organizations observed a lack of user engagement because of the limited interactivity of the technology solution they used. Others had successful user engagement largely by investing in technology solutions that provide a range of modes of interaction. It is important to learn from both to anticipate possible pain points and to capitalize on successes that lead to industry recognition and engagement from library customers. Table 4 summarizes the range of challenges identified.

Table 4.
Challenges related to digital display services

Challenge Identified    Responses    %
Technical                  14       41
Content                    11       33
Costs                      11       33
User expectations          11       33
Workflow                   10       29
Service design              9       26
Time                        8       24
Organizational culture      8       24
User engagement             7       20

As reflected in table 4, several key challenges were discussed:
1. Technical, such as troubleshooting the technology, keeping up with new technologies or upgrades, and finding software solutions appropriate for the hardware selected.
2. Content, such as coming up with original content or curating existing sources. In the words of one participant, “quality and refresh of content is key—it has to be meaningful, interesting, and new.” This clearly presents a resource requirement.
3. Costs, such as the financial commitment to the service, the unseen costs of putting exhibits together, software licensing, and hardware upgrades.
4. User expectations, such as keeping the service at its full potential and using the maximum functionality of the hardware and software solutions. According to study participants, users “may not want what they think or they say they want,” and to some extent, “such technologies are almost an expectation now, and not as exciting for users.”
5. Workflow or project-management strategies, specifically related to emergent multimedia experiences that require new cycles of development and testing.
6. Time to plan, source, create, troubleshoot, launch, and improve exhibits.
7. Service design, such as thinking holistically about the functions of the technology within the larger organizational structure. As one study participant stated, organizations “cannot disregard the reality of the service being tied to a physical space” in that these types of technologies are both a virtual and a physical customer experience.
8.
Organizational culture and policy, in terms of adapting project-based approaches to planning and resourcing services, getting institutional support, and educating all staff about the purpose, function, and benefits of the service.
9. User engagement, particularly keeping users interested in the exhibits and continually finding new and exciting content. Various participants found that “linger time is between 30 seconds to few minutes” and that content being displayed needs to be “something interesting, unique, and succinct, but not a destination in itself.”

Despite the clear challenges of delivering digital exhibits services, organizations that participated in this study identified keys to success (see table 5).

Table 5. Successes and lessons learned in using digital displays

Successful Approach or Lesson Identified    Responses    %
User engagement and interactivity              16       47
Service design                                 14       41
“Wow” factor                                   12       35
Organizational leadership                      12       35
Technology solution                            10       29
Flexibility                                    10       29
Communication and collaboration                10       29
Project management                              9       26
Team and skill sets                             9       26

As reflected in table 5, several approaches were discussed:
• User engagement and interactivity, particularly for those institutions that invested in highly interactive and immersive experiences; the rewards are seen in the interest and enthusiasm of their user groups.
• Service design: organizations that carefully planned the service found that the technology successfully served the needs of their user communities.
• Promotion and the “wow factor” that brought attention to the organization and the service. It is not surprising that digital displays are central points on tours for dignitaries, political figures, and external guests.
Further, many commented that they “did not imagine a library could be involved in such an innovative experiment,” and others added that their digital displays have “created new conversations that did not exist before.”
• Leadership and vision at the organizational level, which secures support and resources as well as defines the scope of the service to ensure its sustainability and success: “Money is not necessarily the only barrier to doing this service, but risk taking, culture.”
• Technology solution, where “everything works” and both the organization and the users of the service are happy with the functionality, features, and performance of the chosen solution.
• Flexibility and willingness to learn new things, including being open to agile project-management methods, taking risks, and continually learning new tools, technologies, and processes as the service matures.
• Communication and collaboration, both internally among stakeholders and externally by building community partnerships, new audiences, and user participation in content creation. For example, one study participant noted that the technology “has contributed to giving the museum a new audience of primarily young people and families—a key objective held in 2010 at the commencement of the gallery refurbishments.”
• Workflow and project management for those embracing the new approaches required to bring multiple skill sets together to create engaging new exhibits. As one participant put it, “These types of approaches require testing, improvement, a new workflow and lifecycle for the projects.”
• Having the right team with appropriate skills to support the service, though this theme was rated as less significant than designing services effectively and securing institutional support for the technology service.
In other words, study participants noted that having in-house programming or design skills is not enough without a proper definition of success for digital exhibits services.

Perceptions

The institutional and user reception of digital displays as a service to pursue in learning organizations was overwhelmingly positive, with 87% of organizations noting positive feedback. For example, one study participant noted the positive attention received from the wider community for the digital display, stating, “it is our flagship and people are in general impressed by both the potential and some of the existing content.” Some participants went as far as to say that the reception among users has been “through the roof” and that they have “never had a negative feedback comment” about their display. This finding indicates a high degree of satisfaction with such technologies among organizations that pursued a digital display. Table 6 further explores the range of perceptions observed in the study.

Table 6. Perception of digital display services

Perception                        Responses    %
Positive                             20       87
Hesitation or uncertainty             7       30
Concerns about purpose                4       17
Concerns about user engagement        4       17
Concerns about costs                  3       13
Negative                              3       13

A minority (13%) noted some negative perceptions, largely related to concerns about the costs or functionality of the technology; 30% observed uncertainty and hesitation on the part of staff and users in terms of engagement, as well as questioning of the display’s purpose in the organization. For example, one study participant summarized this mixed sentiment by saying, “The perception is that it’s really neat and worthwhile for exploring new ways of teaching, but that the same features and functions could be achieved with less (which we think is a good thing!).” It is helpful to note this trend in perception, as any new service will likely bring a mixture of excitement, hesitation, and occasional opposition.
Interestingly, these reactions originated both from the staff of the organizations interviewed and from their communities of users.

DISCUSSION

The findings from this study indicate that the functions of digital displays are highly dependent on the organizational context in which the displays exist. This context, in turn, defines the nature of the services delivered through the digital display. For example, figure 6 can be useful in classifying the various ways digital displays appear in the study population, from research- and teaching-oriented lab spaces to public spaces with passive messaging or active, immersive, game-like digital experiences.

Figure 6. Types of digital displays in the study population.

As such, visualization walls might belong in the “lab spaces” category, which typically appears in academic libraries or research units and does not require content planning and scheduling. What we might call “digital interactive exhibits” tend to appear in museums and galleries with a primarily public audience and may have a permanent, seasonal, or monthly rotation schedule. However, despite the range of approaches taken to providing content and using these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing the costs of interactive projects. Despite these common concerns, digital-exhibits services were perceived as overwhelmingly satisfactory in all types of organizations included in this study because they brought new audiences to the organization and were often seen as “showpieces” in the broader community. The data gathered in the environmental scan demonstrate that there is currently little consistency among digital displays in learning environments.
This lack of consistency is seen in study participants’ content-development methods, programming, content management, technology solutions, and even the naming of the display (and, by extension, the display service). For example, this study revealed that no evident “open platform” for managing content at the application or the middleware level currently exists. A small number of software tools are used by organizations to support digital displays, but their use is in no way standardized, as compared to nearly every other area of library services. There is some indication that digital-display services may become more standardized in the coming years, with more tools, solutions, vendors, and communities of practice becoming available. For example, many signage CMSs are currently on the market, and the number of companies offering game-like immersive experiences is growing, suggesting an extension of these services to libraries in the coming years. Only a few software tools exist for creating exhibits, such as IntuiFace and TouchDesigner, and no free, open-source versions of exhibit software are currently available. As well, the growing number of digital-exhibits and interactive-media companies currently focuses on turnkey solutions rather than software-as-a-service or platform solutions. In contrast, some consistency exists in the staffing needs and skills required to support digital-exhibits services. A majority of the organizations interviewed agreed that design, software-development, systems-administration, and project-management skills are needed to ensure that digital-exhibits services run sustainably in a learning organization. In addition, the lack of public library representation in this study makes it challenging to draw parallels to the library context.
Adapting museum practices is also not necessarily reliable, as museums rarely have a mandate to engage communities and partner on content creation, as libraries do. For example, only the El Paso (Texas) Museum of History engages the local community to source and organize content. These findings suggest that digital displays are a growing domain and that more solutions are likely to emerge in the coming years. The Cube, compared to the rest of the study population, is a unique service model because it successfully brings together most of the elements examined in the environmental scan. For example, to ensure continual engagement with the digital display, The Cube schedules exhibits on a regular basis and employs user interface designers, systems administrators, software engineers, and project managers. It also extends the content through community engagement, public tours, and STEM programming. It has created an in-house middleware solution to simplify exhibit delivery and has chosen Unity3D as its platform for exhibit development.

LIMITATIONS

Only organizations from English-speaking countries were interviewed as part of the environmental scan. It is therefore unclear whether access to organizations from non–English-speaking countries would have produced new themes and significantly different results. In addition, as with all environmental scans, the data are limited by the degree of understanding, knowledge, and willingness to share information of the individuals being interviewed. In particular, the individuals with whom the author spoke may or may not have been the technology or service leads for the digital display at their respective institutions. Thus, the study participants had a range of understanding of the hardware specifications, functionality, and service-design components associated with digital displays.
For example, having access to technology leads would likely have provided more nuanced responses about the middleware solutions and the underlying technical infrastructure required to support this service. A small number of vendors were also interviewed as part of the environmental scan, even though vendors did not necessarily have digital displays or service models parallel to those of libraries or museums. They are included in appendix B. Nevertheless, gathering data from this group was deemed relevant to the study, as creative agencies have formalized staffing models and clearly identified the skill sets necessary to support services of this nature. In addition, this group possesses knowledge of best practices, workflows, and project-management processes related to exhibit development. Finally, this environmental scan did not capture any interaction with direct users of digital displays, whose experiences and perceptions of these technologies may or may not support the findings gathered from the organizations interviewed. These limitations were addressed by increasing the sample size of the study within the time and resource constraints of the research project.

CONCLUSION

The findings of this study show that the functions of digital-display technologies and their related services are highly dependent on the organizational context in which they exist. However, despite the range of approaches taken to providing content and using these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing the costs of interactive projects. Despite these common concerns, digital displays were perceived overwhelmingly positively in all types of organizations interviewed in this study, as they brought new audiences to the organization and were often seen as “showpieces” in the broader community.
The successes and lessons learned from the study population are meant to provide a broader perspective on this maturing domain as well as to help inform planning processes for future digital exhibits in learning organizations.

APPENDIX A. ENVIRONMENTAL SCAN QUESTIONS

Digital Exhibits Environmental Scan Interview Questions—Museums, Libraries, Public Organizations
1. What are the technical specifications of the digital interactive technology at your institution?
2. Who are the primary users of this technology (those interacting with the platform)? Is there anyone you thought would use it and isn’t?
3. What are the primary uses for the technology (events, presentations, analysis, workshops)?
4. What types of content are supported by the technology (video, images, audio, maps, text, games, 3D, all of the above)?
5. Where is content created and how is this content managed?
6. What is the schedule for the content and how is it prioritized?
7. Can you estimate the FTE (full-time equivalent) of staff members involved in supporting this technology/service, both directly and indirectly? What does indirect support for this technology entail?
8. In your experience, what kinds of skills are necessary in order to support this service?
9. Have partnerships with other organizations producing content to be exhibited been established or explored?
10. What challenges have you encountered in providing this service?
11. What have been some keys to the successes in supporting this service?
12. What has been the biggest success of this service and what has been the biggest disappointment?
13. What is the perception of this technology in the institution more broadly?
14. Are there any other institutions you suggest we contact to learn more about similar technologies?

Digital Exhibits Environmental Scan Interview Questions: Vendors
1.
What is the relationship between the creative studio and hardware/fabrication? Do you do everything, or do you work with AV integrators to put together touch interactives?
2. Who have been the primary users of the interactive exhibits and projects you have completed?
3. Who writes the use cases when creating a digital interactive exhibit?
4. What types of content are supported by the technology (video, images, audio, maps, text, games, 3D, all of the above)? Do you see a rise in interest in 3D and game-like environments, and do you have internal expertise to support it?
5. Where is content created for the exhibits and how is this content managed? Who curates?
6. What timespan or lifecycle do you design for?
7. How big is your team? How long do projects typically take to create?
8. What types of expertise do you have in house? What might a project team look like?
9. To what extent is there a goal of sharing knowledge back with the company from clients or users?
10. What challenges have you encountered in providing this service?
11. What have been some keys to the successes in supporting this service?
APPENDIX B: STUDY POPULATION IN ENVIRONMENTAL SCAN

Organization | Location | Date Interviewed
All Saints Anglican School | Merrimac, Australia | July 25, 2016
Anode | Nashville, TN | July 22, 2016
Belle & Wissell | Seattle, WA | July 26, 2016
Bradman Museum | Bowral, Australia | July 10, 2016
Brown University Library | Providence, RI | June 3, 2016
University of Calgary Library and Cultural Resources | Calgary, AB | June 2, 2016
Deakin University Library | Geelong, Australia | June 14, 2016
University of Colorado Denver Library | Denver, CO | June 24, 2016
Duke University Library | Durham, NC | August 17, 2016
El Paso Museum of History | El Paso, TX | June 24, 2016
Georgia State University Library | Atlanta, GA | June 10, 2016
Gibson Group | Wellington, New Zealand | July 16, 2016
Henrico County Public Library | Henrico, VA | August 9, 2016
Ideum | Corrales, NM | July 26, 2016
Indiana University Bloomington Library | Bloomington, IN | May 31, 2016
Interactive Mechanics | Philadelphia, PA | August 2, 2016
Johns Hopkins University Library | Baltimore, MD | June 20, 2016
Nashville Public Library | Nashville, TN | July 22, 2016
North Carolina State University Library | Raleigh, NC | June 8, 2016
University of North Carolina at Chapel Hill Library | Chapel Hill, NC | June 2, 2016
University of Nebraska Omaha | Omaha, NE | June 16, 2016
Omaha Do Space | Omaha, NE | July 11, 2016
University of Oregon Alumni Center | Eugene, OR | June 7, 2016
Philadelphia Museum of Art | Philadelphia, PA | August 10, 2016
Queensland University of Technology | Brisbane, Australia | June 30, July 29, and August 16, 2016
Société des Arts Technologiques | Montreal, QC | August 8, 2016
Second Story | Portland, OR | July 28, 2016
St. Louis University | St. Louis, MO | July 4, 2016
Stanford University Library | Stanford, CA | July 22, 2016
University of Illinois at Chicago | Chicago, IL | June 22, 2016
University of Mary Washington | Fredericksburg, VA | July 7, 2016
Visibull | Waterloo, ON | August 12, 2016
University of Waterloo Stratford Campus | Stratford, ON | June 22, 2016
Yale University Center for Science and Social Science Information | New Haven, CT | July 13, 2016

APPENDIX C: DIGITAL CONTENT PUBLISHING GUIDELINES

Organization Name | Guidelines Website
Deakin University Library | http://www.deakin.edu.au/library/projects/sparking-true-imagination
Duke University | https://wiki.duke.edu/display/LMW/LMW+Home
Griffith University | https://intranet.secure.griffith.edu.au/work/digital-signage/seemore
North Carolina State University Library | http://www.lib.ncsu.edu/videowalls
University of Colorado Denver | http://library.auraria.edu/discoverywall
University of Calgary Library and Cultural Resources | http://lcr.ucalgary.ca/media-walls
University of Waterloo Stratford Campus | https://uwaterloo.ca/stratford-campus/research/christie-microtiles-wall

REFERENCES

1 Flora Salim and Usman Haque, “Urban Computing in the Wild: A Survey on Large Scale Participation and Citizen Engagement with Ubiquitous Computing, Cyber Physical Systems, and Internet of Things,” International
Journal of Human-Computer Studies 81 (September 2015): 31–48, https://doi.org/10.1016/j.ijhcs.2015.03.003.
2 Peter Peltonen et al., “It’s Mine, Don't Touch! Interactions at a Large Multi-touch Display in a City Center,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, April 5–10, 2008, 1285–94, https://doi.org/10.1145/1357054.1357255.
3 Shawna Sadler, Mike Nutt, and Renee Reaume, “Managing Public Video Walls in Academic Library” (presentation, CNI Spring 2015 Meeting, Seattle, Washington, April 13–14, 2015), http://dro.deakin.edu.au/eserv/DU:30073322/sadler-managing-2015.pdf.
4 Peltonen et al., “It’s Mine, Don't Touch!”
5 John Brosz, E. Patrick Rashleigh, and Josh Boyer, “Experiences with High Resolution Display Walls in Academic Libraries” (presentation, CNI Fall 2015 Meeting, Washington, DC, December 13–14, 2015), https://www.cni.org/wp-content/uploads/2015/12/cni_experiences_brosz.pdf; Bryan Sinclair, Jill Sexton, and Joseph Hurley, “Visualization on the Big Screen: Hands-On Immersive Environments Designed for Student and Faculty Collaboration” (presentation, CNI Spring 2015 Meeting, Seattle, Washington, April 13–14, 2015), https://scholarworks.gsu.edu/univ_lib_facpres/29/.
6 Niels Wouters et al., “Uncovering the Honeypot Effect: How Audiences Engage with Public Interactive Systems,” DIS ’16: Proceedings of the 2016 ACM Conference on Designing Interactive Systems, Brisbane, Australia, June 4–8, 2016, 5–16, https://doi.org/10.1145/2901790.2901796.
7 Gonzalo Parra, Joris Klerkx, and Erik Duval, “Understanding Engagement with Interactive Public Displays: An Awareness Campaign in the Wild,” Proceedings of the International Symposium on Pervasive Displays, Copenhagen, Denmark, June 3–4, 2014, 180–85, https://doi.org/10.1145/2611009.2611020; Ekaterina Kurdyukova, Mohammad Obaid, and Elisabeth Andre, “Direct, Bodily or Mobile Interaction?,” Proceedings of the 11th International Conference on Mobile and Ubiquitous Multimedia, Ulm, Germany, December 4–6, 2012, https://doi.org/10.1145/2406367.2406421; Tongyan Ning et al., “No Need to Stop: Menu Techniques for Passing by Public Displays,” Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, Vancouver, British Columbia, https://www.gillesbailly.fr/publis/BAILLY_CHI11.pdf.
8 Jung Soo Lee et al., “A Study on Digital Signage Interaction Using Mobile Device,” International Journal of Information and Electronics Engineering 5, no. 5 (2015): 394–97, https://doi.org/10.7763/IJIEE.2015.V5.566.
9 Parra et al., “Understanding Engagement,” 181.
10 Parra et al., “Understanding Engagement,” 181; Robert Walter, Gilles Bailly, and Jörg Müller, “StrikeAPose: Revealing Mid-air Gestures on Public Displays,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, April 27–May 2, 2013, 841–50, https://doi.org/10.1145/2470654.2470774.
11 Philipp Panhey et al., “What People Really Remember: Understanding Cognitive Effects When Interacting with Large Displays,” Proceedings of the 2015 International Conference on Interactive Tabletops & Surfaces, Madeira, Portugal, November 15–18, 2015, 103–6, https://doi.org/10.1145/2817721.2817732.
12 Christopher Ackad et al., “An In-the-Wild Study of Learning Mid-air Gestures to Browse Hierarchical Information at a Large Interactive Public Display,” Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, September 7–11, 2015, 1227–38, https://doi.org/10.1145/2750858.2807532.
13 Parra et al., “Understanding Engagement,” 181; Kurdyukova, Obaid, and Andre, “Direct, Bodily or Mobile Interaction?,” n.p.
14 Jouni Vepsäläinen et al., “Web-Based Public-Screen Gaming: Insights from Deployments,” IEEE Pervasive Computing 15, no. 3 (2016): 40–46, https://ieeexplore.ieee.org/document/7508836/.
15 Uta Hinrichs, Holly Schmidt, and Sheelagh Carpendale, “EMDialog: Bringing Information Visualization into the Museum,” IEEE Transactions on Visualization and Computer Graphics 14, no. 6 (November 2008): 1181–88, https://doi.org/10.1109/TVCG.2008.127.
16 Hinrichs, Schmidt, and Carpendale, “EMDialog.”
17 Sarah Clinch et al., “Reflections on the Long-term Use of an Experimental Digital Signage System,” Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, September 17–21, 2011, 133–42, https://doi.org/10.1145/2030112.2030132.
18 Elaine M. Huang, Anna Koster, and Jan Borchers, “Overcoming Assumptions and Uncovering Practices: When Does the Public Really Look at Public Displays?,” Proceedings of the 6th International Conference on Pervasive Computing, Sydney, Australia, May 19–22, 2008, 228–43, https://doi.org/10.1007/978-3-540-79576-6_14; Jörg Müller et al., “Looking Glass: A Field Study on Noticing Interactivity of a Shop Window,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Austin, Texas, May 5–10, 2012, 297–306, https://doi.org/10.1145/2207676.2207718.
19 Salim and Haque, “Urban Computing in the Wild,” 35.
20 Mettina Veenstra et al., “Should Public Displays Be Interactive? Evaluating the Impact of Interactivity on Audience Engagement,” Proceedings of the 4th International Symposium on Pervasive Displays, Saarbruecken, Germany, June 10–12, 2015, 15–21, https://doi.org/10.1145/2757710.2757727.
21 Clinch et al., “Reflections.”
22 Robert Ravnik and Franc Solina, “Audience Measurement of Digital Signage: Qualitative Study in Real-World Environment Using Computer Vision,” Interacting with Computers 25, no. 3 (2013), https://doi.org/10.1093/iwc/iws023.
23 Neal Buerger, “Types of Public Interactive Display Technologies and How to Motivate Users to Interact,” in Media Informatics Advanced Seminar on Ubiquitous Computing, ed. Doris Hausen et al. (Munich: University of Munich, Department of Computer Science, Media Informatics Group, 2011), https://pdfs.semanticscholar.org/533a/4ef7780403e8072346d574cf288e89fc442d.pdf.
24 C. G. Screven, “Information Design in Informal Settings: Museums and Other Public Spaces,” in Information Design, ed. Robert E. Jacobson (Cambridge, MA: MIT Press, 2000), 131–92.
25 Parra et al., “Understanding Engagement,” 181.
26 Uta Hinrichs and Sheelagh Carpendale, “Gestures in the Wild: Studying Multi-touch Gesture Sequences on Interactive Tabletop Exhibits,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vancouver, British Columbia, May 7–12, 2011, 3023–32, https://doi.org/10.1145/1978942.1979391.
27 Harry Brignull and Yvonne Rogers, “Enticing People to Interact with Large Public Displays in Public Spaces,” in Proceedings of INTERACT ’03: International Conference on Human-Computer Interaction, Zurich, Switzerland, September 1–5, 2003, ed. Matthias Rauterberg, Marino Menozzi, and Janet Wesson (Tokyo: IOS Press, 2003), 17–24, http://www.idemployee.id.tue.nl/g.w.m.rauterberg/conferences/interact2003/INTERACT2003-p17.pdf.
28 Peltonen et al., “It’s Mine, Don't Touch!”
29 Peltonen et al., “It’s Mine, Don't Touch!”
30 Anne Horn, Bernadette Lingham, and Sue Owen, “Library Learning Spaces in the Digital Age,” Proceedings of the 35th Annual International Association of Scientific and Technological University Libraries Conference, Espoo, Finland, June 2–5, 2014, http://docs.lib.purdue.edu/iatul/2014/libraryspace/2.