tags and style markup), but it also meant that the changes coming from the new stylesheet were not being applied universally, as any properties assigned on a page overrode the global CSS. There was also the issue of paragraph text that was sometimes styled as fake headings (made larger or bolder to look like headings, but not using the proper heading tags), which needed to be corrected for consistency and accessibility purposes.

Replanting and Sprucing Up

With an overwhelming majority of the guides (and their associated assets) deleted, it was finally time to rework the remaining guides into clear, easy-to-use resources that would benefit our students. At this point the guides fell into three categories:

• Guides that just needed to be pruned and updated.
• Guides that should be combined into a single subject area guide.
• Guides that should be created to fill an unmet need.

INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 TENDING TO AN OVERGROWN GARDEN | HYAMS

Pruning and updating tasks were generally the least arduous, as many of the guides included content that was also housed on discrete guides (citations, resource evaluation, etc.). Instead of duplicating, for example, citation formats on every guide, those pages were replaced with navigation-level links out to the existing citation guide. This was also the point at which we could do more extensive quality control, such as switching to a single content column, which further emphasized the extraneous information on many of our guides. Infographics, videos, and long blocks of links or text were scrutinized to determine whether they were helping to enhance students' understanding of the core content or merely providing clutter that would make it more difficult to understand the important information.9 In some cases, by going from guide to guide, it became apparent that there were guides for multiple courses in a subject area where the resources were basically identical.
This was most noticeable in the criminal justice and health education subject areas. In these cases, it made little sense to keep separate course guides when the content was largely the same across them. To remedy this duplication, one of the course guides for each subject was transformed into the subject area guide, and resources were added to ensure it covered the same materials that the separate course guides may have covered. The remaining course guides were then marked for future deletion as they were no longer needed. Lastly, subject areas without guides were identified so that work could be done later to create them. As we had discussed moving towards using the "automagic" integration of guide content into our Blackboard Learning Management System (LMS), this step will be key in ensuring that all subject areas have at least some resources students can use. However, as of this time we have yet to finish creating these additional guides, and several subject areas (including computer science, nursing, and gender studies) have no guides at all.

NEXT STEPS

Now that all of the work to clean and update our LibGuides is done, the most important next step is coming up with a workflow to ensure that the guides stay relevant and useful. The web and systems librarian mostly left the guides alone for the Fall 2019 semester to allow their colleagues time to use them and report back any issues. To the web and systems librarian's surprise, there were few issues reported, but that does not mean there is no room for future improvement. As a department, it is clear that we need a formal plan for maintaining the guides, including update frequency, content review, and guidelines for when guides should be added or deleted. Additionally, immediately following the conclusion of this cleanup project, the library's website was forced into a server migration and full rebuild for reasons outside the scope of this article.
However, as a result, changes were made to the look and feel of pages on the library's site that will need to be carried through into our guides and associated Springshare platforms. While most of this work is relatively simple, mimicking changes developed in WordPress to work properly on external services will take time and effort.

CONCLUSION

Overall, while this project was a massive undertaking (done almost entirely by a single person), the end result, at least on the surface, has made our guides much easier to use and understand. There were obviously several things that, if the project were to be done over, should have been done differently, mostly involving the cleaning of the asset library. However, it is now much easier to refer students to guides for their courses, and the feelings about the guides amongst the library faculty have become much more positive.

ENDNOTES

1 "LibGuides: The Next Generation!," Springshare Blog (blog), June 26, 2013, https://blog.springshare.com/2013/06/26/libguides-the-next-generation/.

2 The guide can be viewed at: https://bmcc.libguides.com/guidecleanup.

3 Though the author only learned of the project undertaken at UNC a few years ago, after they had already finished this project, a similar project was outlined here: Sarah Joy Arnold, "Out with the Old, in with the New: Migrating to LibGuides A-Z Database List," Journal of Electronic Resources Librarianship 29, no. 2 (April 2017): 117–20, https://doi.org/10.1080/1941126X.2017.1304769.

4 Because there was no way to view the documents before a bulk deletion, documents were manually reviewed and deleted as needed.

5 It was only long after this process that Springshare promoted that they could do this on the backend by request.
6 However, it turned out that, due to the differences in URL structure between classic Primo and Primo VE, this change was completely unnecessary, as the URLs actually needed to be changed again post-migration. At least they were consistent, which meant a systemwide find-and-replace could take care of most of the links.

7 Several studies have been done since the rollout of LibGuides v2, including: Sarah Thorngate and Allison Hoden, "Exploratory Usability Testing of User Interface Options in LibGuides 2," College and Research Libraries 78, no. 6 (2017): 844–61, https://doi.org/10.5860/crl.78.6.844; Kate Conerton and Cheryl Goldenstein, "Making LibGuides Work: Student Interviews and Usability Tests," Internet Reference Services Quarterly 22, no. 1 (January 2017): 43–54, https://doi.org/10.1080/10875301.2017.1290002.

8 Of the many guides the author consulted, the following were the most informative: Stephanie Jacobs, "Best Practices for LibGuides at USF," https://guides.lib.usf.edu/c.php?g=388525&p=2635904; Jesse Martinez, "LibGuides Standards and Best Practices," https://libguides.bc.edu/guidestandards/getting-started; Carrie Williams, "Best Practices for Building Guides & Accessibility Tips," https://training.springshare.com/libguides/best-practices-accessibility/video.

9 There is a very detailed discussion of cognitive overload in LibGuides in Jennifer J. Little, "Cognitive Load Theory and Library Research Guides," Internet Reference Services Quarterly 15, no. 1 (March 1, 2010): 53–63, https://doi.org/10.1080/10875300903530199.
ARTICLES

Making Disciplinary Research Audible: The Academic Library as Podcaster

Drew Smith, Meghan L. Cook, and Matt Torrence

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12191

Drew Smith (dsmith@usf.edu) is Associate Librarian, University of South Florida. Meghan L. Cook (mlcook3@usf.edu) is Coordinator of Library Operations, University of South Florida. Matt Torrence (torrence@usf.edu) is Associate Librarian, University of South Florida. © 2020.

ABSTRACT

Academic libraries have long consulted with faculty and graduate students on ways to measure the impact of their published research, which now include altmetrics. Podcasting is becoming a more viable method of publicizing academic research to a broad audience. Because individual academic departments may lack the ability to produce podcasts, the library can serve as the most appropriate academic unit to undertake podcast production on behalf of researchers. This article identifies what library staff and equipment are required, describes the process needed to produce and market the published episodes, and offers preliminary assessments of the podcast's impact.

INTRODUCTION

The academic library has always had an essential role in the research activities of university faculty and graduate students, but until the last several years, that role has primarily focused on assisting university researchers with obtaining access to all relevant published research in their fields, making it possible for those researchers to complete a thorough literature review.
More recently, that role has evolved to encompass assisting with other aspects of research and publication, including consulting on copyright-related issues, advising researchers on the most appropriate places to publish, preserving publications and data in institutional repositories, helping tenure-track faculty evaluate their research impact as part of the tenure and promotion process, and hosting open-access journals.

Meanwhile, libraries of all types have experimented over the last ten to fifteen years with using social media to promote library collections, services, and events. Many libraries have taken advantage of Facebook, Twitter, and YouTube as part of these efforts. Increasingly, libraries have incorporated makerspaces so that library patrons can create and edit video and audio files, meaning that this same equipment and software is now available to librarians and other library staff for their own purposes. This has resulted in libraries producing promotional videos and podcasts.

The dramatic increase in ownership and usage of mobile technology (smartphones and tablets) over the last decade has resulted in an increase in the consumption of podcasts wherever the listener happens to be when their ears are not otherwise fully occupied, such as while commuting, exercising, or doing household chores. As a result, academic libraries now find themselves in an excellent position to use podcasting for instructional and promotional purposes in an effort to reach a broad audience.

What happens when the university library combines its inherent interest in supporting the promotion of faculty and graduate student research with its ability to create podcasts to quickly and inexpensively reach an international audience?
This paper documents the efforts of an academic library at a high-level research university to partner with one of the university's academic departments to use podcasting to promote the research done by that department's faculty and doctoral candidates. We will describe which library staff were involved, how the podcast was planned, the execution of the podcasting process, the issues that were encountered throughout the process, and how the impact of the podcast was assessed. Calling: Earth, the podcast produced by the University of South Florida (USF) Libraries, can be found at http://callingearth.lib.usf.edu/.

LITERATURE REVIEW

Podcasting as a means of promoting scholarly communication is a relatively new and uncommon idea in a library setting, and the extant literature on the subject is therefore scarce. Most contemporary articles on the topic focus on the use of podcasts to satisfy a wide array of student learning needs. While knowledge of pedagogical best practices is useful, the current literature is not an exact match for the concept of promoting scholarly communication, which offers subject specificity, faculty and graduate interaction, marketing of libraries, and research visibility as aggregate goals. What follows in this literature review is a summary of a slice of the literature related to podcasting, academia, and/or libraries.

The researchers chose as a starting point to look at the general use of podcasting, as well as social media, in various academic and library environments.
In a recent article on the use of social media and altmetrics, for example, the increased use of these tools is outlined, but with numerous caveats regarding the initial non-probabilistic methods of gathering information on the how and why of their adoption.1 To further emphasize the use of podcasts and, in a related way, social marketing, an article on Association of Research Libraries (ARL) efforts in this vein was examined. A comprehensive study of ARL member libraries published in 2011, with little published on this topic since that date, demonstrated in figure 1 of its research that five of the 37 respondents' podcasts contained recorded interviews and only one included scholarly publishing content.2 This ten-year vacuum in further research was unexpected but indicates an opportunity for a new type of podcast focusing on academic production.

Scholars in academic libraries have long examined student preferences for new technologies and types of information transfer, including the use of podcasts. A study from Sam Houston State University found that 36 percent of users in 2011 were using podcasts for recreational purposes, as opposed to much lower use for academic and scholarly communication benefits.3 In the future, academic creation and utilization of podcasts for scholarly communication is ripe for a hearty statistical and qualitative analysis. Specific to this inquiry, the application of podcasts to scholarly communication within a subject discipline appears to be lacking in the literature. Furthermore, this literature review emphasizes the dearth of research related to promoting the research efforts of geosciences faculty and graduate students.

In terms of recent literature, there are also a number of publications available that deal with the history and evolution of podcasting in education and, specifically, higher education.
One such current work provides an excellent outline of this growth in use, as well as of several major types, or genres, of podcasting in these environments. Following a strong and succinct overview of the technology and its use in college and university settings, the author effectively defines, with examples, the three main genres they have identified: the "Quick Burst," the "Narrative," and the "Chat Show."4 The model that best represents USF's Calling: Earth program is the "Narrative," as this includes a subcategory of "Storytelling." This work is truly beneficial for any group or individual developing, or improving, an educational podcast effort.

In 2011, Peoples and Tilley outlined the emergence of podcasts to disseminate information in academic libraries. One of the excellent questions that arises from this work deals with the access, advancement, and archiving of the content: is this content to be archived, or cataloged, as more permanent material, or is it electronic ephemera?5 This is a question for the USF Calling: Earth podcast group going forward as the level and quality of content and, ideally, use are expanded. Additionally, educators are studying the limitations of podcasts, not to rule them out as academic tools, but to inspire and enhance the best possible outcomes. One excellent warning to be heeded by any library hoping to utilize podcasts for education and dissemination of research is summed up well in this quote: "If students do not utilize or do not realize the benefits of the self-pacing multimedia characteristics of podcasting, then the resource becomes a more likely contributor to cognitive overload."6

There have been a small number of studies of the quantitative elements of podcast use in academic libraries.
An article in VINE: The Journal of Information & Knowledge Management Systems outlined, via content analysis and other methods, various unique and shared characteristics of existing academic podcasts, while also furthering the concept of podcasting as a "library service."7 This may not have been the first publication to make this assertion, but it is a view that is also held by these authors, and it shapes the development and advancement of the USF Libraries podcasting efforts. Librarians of all types must be wary, however, as numerous articles focus on better understanding student learning preferences. As a recent article on the success of satellite and distance learners showed, these tools often match the delivery preferences of these types of students.8

Switching gears to a bit more topic specificity, a number of news and academic articles were identified on the use of podcasts in areas of the geosciences. One such effort is the Geology Flannelcast. The development and implementation of this combination of education and entertainment, which is also a goal of these authors, is outlined in the creators' poster presentation at a recent Geological Society of America conference. With a focus on the increasing ease of podcasting technology, the reduced cost of equipment, and the use of a "conversational atmosphere" within a pedagogical framework, this model stood out as one worth studying.9 Furthermore, the geosciences are, or can be, interesting and exciting. A recent podcast on communicating geosciences with fun and flair was just the encouragement this research group needed to go all-in on this project.
And that the geosciences are far from boring!10

As evidenced by an examination of current and historical literature on this topic, there are multiple opportunities for further exploration and library efforts, especially as one of the main points of this work is to emphasize faculty and graduate research efforts, scholarly communication, and original content creation. In addition to the focus on these publication and presentation efforts, the results will be measured by initial assessment projects, including download and utilization data and, hopefully, positive feedback from participants and library administration. Further measurement is expected to demonstrate increased citation counts and downloads of the publications of the faculty and graduate student interviewees. It will be correlation and not causation, of course, but the team hopes to have positive feedback for participants and the library.

STAFFING

As with any successful project, a project to produce a podcast focused on academic research had to begin with individuals who had either the interest or the expertise, ideally both, to initiate the work. One was an associate librarian with more than 13 years of experience in producing regular podcasts, while the other was a library staff member and doctoral candidate serving on the USF Libraries Research Platform Team (RPT) for the USF School of Geosciences. The RPT was already tasked with assisting the Geosciences faculty and graduate students in maximizing the impact of their work and had been using various means to accomplish this, such as an institutional repository for research output and tools to measure the impact of previously published work.
During a conversation in late 2018, the librarian suggested to the RPT staff person that podcasting could be used to promote research to a variety of audiences, including USF faculty and students, faculty and students at other universities, K-12 science teachers, and members of the general public (both local and beyond). The librarian offered to initiate the podcast and train the RPT staff on how to continue it after a number of episodes had been produced. The librarian brought to the project the needed expertise in launching and maintaining a podcast, while the RPT doctoral candidate was already familiar with the Geosciences faculty and other doctoral candidates and could identify those who would make good candidates for being interviewed about their research.

PLANNING

The initial planning for the podcast began approximately two months before the first episode release. The original project managers and podcast creators met a number of times to discuss logistics, equipment, and staffing needs, and to agree upon a podcast name (Calling: Earth). Since the notion of podcasting for researcher promotion was unexplored territory, support from higher administration was cautious. However, after production of the first episodes, traction behind the podcast grew and additional support for future endeavors was received.

The podcasters acquired handheld recording equipment, a Tascam DR-05 Linear PCM Recorder, from the USF Libraries Digital Media Commons and tested it in multiple environments (for instance, a quiet office versus a recording studio) to find the optimal location to record the interviews. We found the handheld recorder worked well in a quiet office and allowed for travel to the researcher's office if they requested. The podcast creation team also discussed how to add intro and outro music that would fit the mood of the podcast without violating any copyright restrictions.
The RPT staff person knew of a local Tampa-based band, The Growlers, as a potential source for music because the bass guitarist was an adjunct professor and alumnus of the USF School of Geosciences. The alumnus gave permission to use a portion of the band's recorded music for the podcast.

A hosting service was needed to host and publish the podcast. The librarian suggested using Libsyn because of their 13 years of previous experience with the platform, Libsyn's inexpensive hosting plan, and the ability to acquire statistics, including the geographic locations (countries and states) where the podcast was being downloaded.

EXECUTION

Potential interviewees were contacted via email and invited to be interviewed. Once a potential interviewee agreed, a time and place to conduct the interview were arranged. The RPT staff person determined what the most recent research was for each interviewee and then provided that content to the librarian host for review. The host then prepared interview questions based on the research content. The host went over the questions with the interviewee before the interview began to clear the content and to make sure everything the interviewee wished to cover would be covered. The interviews took approximately 30 minutes to an hour.

Editing of the podcast was done using GarageBand, allowing for the addition of the music at the beginning and end, as well as the host introducing both the general podcast and the specific episode, identifying the academic units involved in the podcast, indicating how listeners might provide feedback, and thanking the music group for allowing the use of their music. In a few rare cases, small interview segments were removed, usually due to the interviewee feeling that they did not represent them well.
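Splicing of this kind (intro music, then the interview, then the outro) can also be sketched in code. Below is a minimal illustration using only Python's standard-library wave module, assuming all source files share the same sample rate, channel count, and sample width; the function name and file layout are our own, not part of any podcasting tool:

```python
import wave

def assemble_episode(intro_path, interview_path, outro_path, out_path):
    """Concatenate intro music, the interview, and outro music into one
    WAV file. All inputs must share identical audio parameters (sample
    rate, channels, sample width); no mixing or fades are applied."""
    segments = []
    params = None
    for path in (intro_path, interview_path, outro_path):
        with wave.open(path, "rb") as src:
            if params is None:
                params = src.getparams()  # reuse the first file's settings
            segments.append(src.readframes(src.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for frames in segments:
            out.writeframes(frames)
```

An editor such as GarageBand additionally handles fades, level matching, and compressed formats such as MP3; this sketch only joins uncompressed WAV segments end to end.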
CHALLENGES

As with any new endeavor, challenges were faced at all stages in the process of getting the podcast to production and beyond.

Buy-in from Library Administration

An early challenge was to gain buy-in from the library administration. This began with requesting that the library fund the hosting service; the feeling of the administrator was that it was a worthwhile experiment, at least in the short term. Once a number of episodes had been produced, the library administration had a better sense of the quality of the production and how it would serve the interests of the library in its academic support role.

Lack of Budget

With no budget for this project (beyond the administration's monthly payment for the hosting service), the podcasters were at the mercy of the quality of the recorders available for library checkout. If the recorders did not produce a high-quality recording, the podcast would possibly lack the sophistication needed for production. Also, high-quality graphics work was needed, which required us to look to other library units for help with creating a logo.

Getting the Podcast into Apple Podcasts

Once content was being produced and published, it was time to submit the podcast to Apple Podcasts. Apple initially rejected the submission because the first logo looked very similar to an iPhone. It should be noted that Apple did not supply a specific explanation of what copyright was being infringed, so the podcasters were faced with making a best guess as to what the problem was. Based on our assumption, we changed the logo and resubmitted the podcast. A further problem arose when Apple required that the new submission use a different RSS feed than the original submission. Eventually the podcasters sought assistance from Libsyn, who explained how to make a minor change to the URL of the RSS feed so that the podcast could be successfully resubmitted.
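The RSS feed at the center of this challenge is an XML document that the hosting service generates automatically; its URL is what Apple Podcasts keys a submission to. As a rough illustration of what that document contains, here is a sketch that builds a stripped-down podcast feed in Python (a real submission also requires the iTunes-specific extension tags; the function and example values are hypothetical):

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, description, episodes):
    """Build a minimal podcast RSS 2.0 feed as a string.
    `episodes` is a list of (episode_title, mp3_url) tuples; each becomes
    an <item> whose <enclosure> points at the audio file."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for ep_title, mp3_url in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep_title
        # The enclosure is what podcast apps actually download.
        ET.SubElement(item, "enclosure", url=mp3_url, type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```

Directories poll this document for new items, which is why changing the feed URL (as Apple required here) amounts to resubmitting the show.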
New Logo Creation

The first logo continued to be used for the entire first season, but before the second season was released, the library's new communications and marketing coordinator assisted with the creation of a new logo that looked more sophisticated and more in line with other podcast logos. Having an in-house graphics designer was extremely helpful in rolling out a new logo (see figures 1 and 2).

Figure 1. Season 1 Logo

Figure 2. Current Logo

Setting Up Interviews

Identifying potential interviewees, requesting interviews, and setting good times and locations for the interviews brought on another batch of challenges. The USF School of Geosciences is composed of geologists, geographers, and environmental scientists, so when planning out the schedule of potential interviewees, an effort was made to involve a wide range of researchers. Some potential interviewees declined the request altogether, while others were not available during the needed time period. Given that the podcast was released every two weeks, there was a little wiggle room for scheduling hiccups, but once or twice a last-minute request to a new potential interviewee was made to ensure production stayed on schedule. Settling on where and when an interview would be held required a lot of back-and-forth emails between the RPT staff person and the interviewee. Preference on time and location was given to the interviewee, but it was requested that, if they did not want to come to the library to be interviewed, their own office or lab space be used only if it was a sufficiently quiet environment for recording purposes.

Comfort of the Interviewee

Once an interview began, the challenge of engagement from the host and the comfort of the interviewee became apparent.
The host had to engage the researcher at a level appropriate for a general audience, which was challenging given that the research done by the USF School of Geosciences often involves a high level of critical thinking and problem-solving. Adding to the complexity of the research being explained, the comfort level of the interviewee had the potential to dampen the interview. One researcher was so uncomfortable speaking in an interview that they typed up in advance what they wanted to say.

ASSESSMENT

Libsyn Statistics

According to Libsyn statistics (as of July 17, 2020), there were a total of 3,593 unique downloads, from 48 different countries, of the 35 published episodes of Calling: Earth. In table 1, the 48 countries where Calling: Earth has been downloaded are shown, as well as how many times the podcast has been downloaded in each country. It is worth noting that 105 downloads do not have a location specified, so the total of the downloads in table 1 does not equal the total number of downloads reported by Libsyn.

Table 1.
Downloads by Country

United States: 2,729
United Kingdom: 103
India: 98
Australia: 88
France: 62
Ireland: 50
Bangladesh: 43
Spain: 37
Russian Federation: 36
Norway: 30
Portugal: 30
Germany: 20
Japan: 19
Mexico: 18
Italy: 14
Netherlands: 12
New Zealand: 11
Brazil: 9
Korea, Republic of: 9
Czech Republic: 7
Ukraine: 7
China: 6
Hong Kong: 5
Sweden: 4
Canada: 3
Chile: 3
Denmark: 3
Romania: 3
South Africa: 3
Yemen: 3
Argentina: 2
Ecuador: 2
Poland: 2
Taiwan: 2
Turkey: 2
Belgium: 1
Bulgaria: 1
Colombia: 1
Costa Rica: 1
Estonia: 1
Greece: 1
Latvia: 1
Macedonia: 1
Nigeria: 1
Pakistan: 1
Saudi Arabia: 1
United Arab Emirates: 1
Vietnam: 1
Without a location: 105

Preliminary Survey and Scholarly Impact

A survey was sent out to the interviewees to gauge their impressions of the podcast and to see if they had noticed any impact on their citations or document downloads. Our goal for the survey was to find out if the podcast was accomplishing its original intention, which was to increase researcher impact through research dissemination, as well as to inform the podcast processes and procedures. The questions asked were:

1. In what ways do you view the Calling: Earth podcast as a way to positively affect your research impact?
2. What evidence do you have, if any, to suggest your research has been positively impacted because of being an interviewee on the Calling: Earth podcast?
3. What would you have liked to be different about your interview process for the Calling: Earth podcast?
4. What suggestions do you have for the future seasons of the Calling: Earth podcast? For example, should the format change, the focus be different, the length of the interview change, etc.
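Per-country tallies like those in table 1 can also be derived from a raw download log rather than read off the hosting dashboard. A short sketch, assuming a hypothetical CSV export with one row per download and a country column (Libsyn's actual export format may differ):

```python
import csv
from collections import Counter

def downloads_by_country(csv_file):
    """Tally downloads per country from a per-download CSV export.
    Assumes (hypothetically) a 'country' column; rows with a blank
    country are grouped under 'Without a location'. Returns a list of
    (country, count) pairs sorted by count, highest first."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[row["country"].strip() or "Without a location"] += 1
    return counts.most_common()
```

Running this over a full export reproduces a ranking like table 1, with the unlocated downloads surfaced explicitly instead of silently dropped from the total.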
Furthermore, each interviewee was asked to contribute their scholarship to the library's institutional repository, Scholar Commons, to allow for the archiving of their research publications and to serve as a means of tracking scholarly impact as a result of the podcast. Once an interviewee's scholarship was placed in Scholar Commons, a Selected Works profile was created so that a direct link to the scholar's work could be disseminated through the podcast notes.

Impact on faculty has also been noteworthy. The download totals for faculty interview participants (when comparing roughly the same amount of time just prior to and following their published interview) showed an average increase of 30 percent, suggesting a strong correlative link between the podcast and researcher impact. Furthermore, anecdotal evidence from interviewees, such as "puts my name out there to a wider audience," "enhances the visibility of my work," and "allow others to hear about [my research] in a more passive way," indicates the potential impact a researcher can see from being a part of the podcast.

A second survey was sent to the faculty, students, and staff of the entire School of Geosciences to determine who was listening to the podcast, who was not, and their reasons for listening or not listening. The survey contained five questions in total, but depending on how a participant answered, not all were presented (figure 3). The first question asked their status in the School of Geosciences (faculty, staff, undergraduate, graduate, or other). The second question asked if they had heard of the podcast and whether they had listened to it. If a participant chose the option that they had never heard of the podcast, then the survey ended for them.
If a participant chose the option that they had heard of the podcast, but had not listened to it, then the survey directed them to a question that asked them to provide reasons they had not listened to the podcast. If a participant chose the option that they had heard of the podcast and had listened to at least one episode, the survey directed them to a question that asked how many episodes the participant had listened to and why they were listening to the podcast. This data was collected to inform the future direction of the podcast.

Figure 3. Flow Chart for the Entire School of Geosciences Survey

CHECKLIST FOR PODCAST PLANNING/EXECUTION

Based on our experiences in the production of the Calling: Earth podcast, we recommend that academic librarians and library staff use the following list to help with planning and executing the production of their own podcasts:
• Get general buy-in from library staff and administration, and update as the planning progresses and budgeting is needed.
• Decide on goals, audience, content, format, frequency of production, and methods of assessment.
• Work with media staff to design marketing, including podcast title (avoiding duplication with other podcasts) and logo development.
• Choose a podcast hosting service.
• Identify relevant staff for hosting, recording, editing, and publishing and train as needed.
• Evaluate existing hardware and software and make additional purchases as needed.
• Contact potential interviewees and create a schedule.
• Prepare customized interview questions and share as appropriate with interviewees.
• Record interviews.
• Edit and publish episodes.
• Submit podcast to Apple Podcasts, Spotify, and other popular podcast directories.
• Monitor statistics.
• Continue to engage in marketing and assessment activities.
(Figure 3 shows the survey's branching logic: respondents first identified their status in the USF School of Geosciences (faculty, staff, graduate student, undergraduate student, or other) and whether they had heard of or listened to the podcast. Those who had heard of it but had not listened chose a reason: not knowing what a podcast is, lack of interest, or lack of time. Those who had listened reported how many episodes they had heard and why they listened: for enjoyable content, for awareness of current research in the USF School of Geosciences, for instructional purposes, for ways to find collaborators, or other.)

CONCLUSIONS AND FUTURE DIRECTIONS

Enthusiasm and anecdotal positive feedback are enough fuel for current activities, and the future of podcasting in libraries also appears open and exciting. At the USF Libraries, Calling: Earth is currently in its third season, and with each new episode, new ideas and increased archival content become a permanent part of the library's legacy and collections. This is another area ripe for future exploration, as this type of original content is archived, cataloged, and disseminated, becoming another part of regular academic impact measurement. In this vein, the USF Libraries podcasting group plans to further codify cyclical assessment tools, including the receipt of IRB clearance for future surveys and data collection.
In addition to cleaning up and refining these assessment practices, this will also provide the opportunity to publish and present publicly on more specific data. Ideally, the group will be able to correlate the show's presence with positive citation or metrics levels for show participants. The USF Libraries Geosciences RPT is currently collecting baseline aggregate information, which could then be compared following further maturation and dissemination of the podcast. Causality may never be within reach, but any positive impacts will be exciting and beneficial. It is also the hope of those involved with Calling: Earth that it might provide a model or template for other RPT or library podcasts or media efforts. One of the current benefits is the strong and effective support from the Development and Communication directors at the USF Libraries, and their partnerships in the future will certainly be key to the success of this and any other potential projects of this type. In closing, the academic library podcasting landscape is wide open for further exploration and examination, and the USF Libraries plans to lead and learn.

ENDNOTES

1 Cassidy R. Sugimoto et al., "Scholarly Use of Social Media and Altmetrics: A Review of the Literature," Journal of the Association for Information Science and Technology 68, no. 9 (2017): 2037–62.
2 James Bierman and Maura L. Valentino, "Podcasting Initiatives in American Research Libraries," Library Hi Tech 29, no. 2 (May 2011): 349, https://doi.org/10.1108/07378831111138215.
3 Erin Dorris Cassidy et al., "Higher Education and Emerging Technologies: Student Usage, Preferences, and Lessons for Library Services," Reference & User Services Quarterly 50, no. 4 (2011): 380–91, https://doi.org/10.5860/rusq.50n4.380.
4 Christopher Drew, "Educational Podcasts: A Genre Analysis," E-Learning and Digital Media 14, no. 4 (2017): 201–11, https://doi.org/10.1177/2042753017736177.
5 Brock Peoples and Carol Tilley, "Podcasts as an Emerging Information Resource," College & Undergraduate Libraries 18, no. 1 (January 2011): 44, https://doi.org/10.1080/10691316.2010.550529.
6 Stephen M. Walls et al., "Podcasting in Education: Are Students as Ready and Eager as We Think They Are?," Computers & Education 54, no. 2 (January 2010): 372, https://doi.org/10.1016/j.compedu.2009.08.018.
7 Tanmay De Sarkar, "Introducing Podcast in Library Service: An Analytical Study," Vine 42, no. 2 (2012): 191–213, https://doi.org/10.1108/03055721211227237.
8 Lizah Ismail, "Removing the Road Block to Students' Success: In-Person or Online? Library Instructional Delivery Preferences of Satellite Students," Journal of Library & Information Services in Distance Learning 10, no. 3–4 (2016): 286–311, https://doi.org/10.1080/1533290X.2016.1219206.
9 Jesse Thornburg, "Podcasting to Educate a Diverse Audience: Introducing the Geology Flannelcast," in Innovative and Multidisciplinary Approaches to Geoscience Education (Posters) (Boulder, CO: Geological Society of America, 2015).
10 Catherine Pennington, "PODCAST: Geology Is Boring, Right? What?! NO! Why Scientists Should Communicate Geoscience...," n.d., https://britgeopeople.blogspot.com/2018/10/PODCAST-geology-is-boring-right.html.
COMMUNICATIONS

Using the Harvesting Method to Submit ETDs into ProQuest: A Case Study of a Lesser-Known Approach

Marielle Veve

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12197

Marielle Veve (m.veve@unf.edu) is Metadata Librarian, University of North Florida. © 2020.

ABSTRACT

The following case study describes an academic library's recent experience implementing the harvesting method to submit electronic theses and dissertations (ETDs) into the ProQuest Dissertations & Theses Global database (PQDT). In this lesser-known approach, ETDs are deposited first in the institutional repository (IR), where they get processed, to be later harvested for free by ProQuest through the IR's Open Archives Initiative (OAI) feed. The method provides a series of advantages over some of the alternative methods, including students' choice to opt in or out from ProQuest, better control over the embargo restrictions, and more customization power without having to rely on overly complicated workflows. Institutions interested in adopting a simple, automated, post-IR method to submit ETDs into ProQuest, while keeping the local workflow, should benefit from this method.

INTRODUCTION

The University of North Florida (UNF) is a midsize public institution established in 1972, with the first theses and dissertations (TDs) submitted in 1974. Since then, copies have been deposited in the library, where bibliographic records are created and entered in the library catalog and the Online Computer Library Center (OCLC). During the period of 1999 to 2012, some TDs were also deposited in ProQuest by the graduate school on behalf of students who chose to do so. This practice, however, was discontinued in the summer of 2012, when the institutional repository, Digital Commons, was established and submission to it became mandatory.
Five years later, in the summer of 2017, interest in getting UNF TDs hosted in ProQuest resurfaced. This renewed interest grew out of a desire of some faculty and graduate students to see the institution's electronic theses and dissertations (ETDs) posted there, in addition to a recent library subscription to the ProQuest Dissertations & Theses Global database (PQDT). A month later, conversations between the library and graduate school began on the possibility of resuming hosting UNF ETDs in ProQuest. Consensus was reached that the PQDT database would be a good exposure point for our ETDs, in addition to the institutional repository (IR), yet some concerns were raised. One of the concerns was the cost of the service and who would be paying for it. Neither the library nor the graduate school had allocated funds for this. The next concern was the possibility of ProQuest imposing restrictions that could prevent students, or the university, from posting ETDs in other places. It was important to make sure there were no such restrictions. Another concern was expressed over students entering embargo dates in ProQuest that do not match the embargo dates selected for the IR. This is a common problem encountered by other libraries.1 For that reason, we wanted to keep the local workflow. The last concern expressed during the conversations was preserving students' right to opt in or out from distributing their theses in ProQuest. This is something both the graduate school and library have been adamant about.
In higher education, requiring students to submit to ProQuest is a controversial issue that has raised ethical concerns and has been highly debated over the years.2 Once conversations between the library and graduate school were held and concerns were gathered, the library moved ahead to investigate the available options to submit ETDs into ProQuest.

LITERATURE REVIEW

Currently, there are three options to submit ETDs into ProQuest: (1) submission through the ProQuest ETD Administrator tool, (2) submission via File Transfer Protocol (FTP), and (3) submission through harvests performed by ProQuest.3

ProQuest ETD Administrator Submission Option

In this option, a proprietary submission tool called ProQuest ETD Administrator is used by students, or assigned administrators, to upload ETDs into ProQuest. Inside the tool, a fixed metadata form is completed with information on the degree, subject terms are selected from a proprietary list, and keywords are provided. The whole administrative and review process gets done inside the tool. Afterwards, zip packages with the ETDs and ProQuest's Extensible Markup Language (XML) files are sent to the institution via FTP transfers, or through direct deposits to the IR using the Simple Web-service Offering Repository Deposit (SWORD) protocol. The ETD Administrator submission method presents several shortcomings. First, the ProQuest XML metadata that is returned to the institutions must be transformed into IR metadata for ingest in the IR, a process that can be long and labor-intensive.4 Second, the subject terms supplied in the returned files come from a proprietary list of categories maintained by ProQuest, which does not match the Library of Congress Subject Headings (LCSH) used by libraries.5 Third, control over the metadata provided is lost because the metadata form cannot be altered, plus customizations to other parts of the system can be difficult to integrate.6 Fourth, there have been issues with students indicating different embargo periods in the ProQuest and IR publishing options, with instances of students choosing to embargo ETDs in the IR, while not in ProQuest.7 Lastly, this method does not allow students' choice, unless the ETDs are submitted separately in two systems in a process that can be burdensome. Ultimately, for these reasons, we found the ETD Administrator unsuitable for our institution.

FTP Submission Option

In this option, an administrator sends zip packages with the institution's ETD files and ProQuest XML metadata to ProQuest via FTP.8 At the time of this investigation, there was a $25 charge per ETD submitted through this method.9 We did not want to pursue this option because of the charge and the tedious metadata transformations that would be needed between IR and ProQuest XML schemas. Another way to go around this would have been to submit the ETDs through the VIREO application. VIREO is an open-source ETD management system used by libraries to freely submit ETDs into ProQuest via FTP.10 This alternative, however, was not an option for us as our IR, Digital Commons, does not support the VIREO application.

Harvesting Submission Option

This is the latest method available to submit ETDs into ProQuest. In this option, ETDs are submitted first into an IR, or other internal system, where they get processed to be later harvested by ProQuest through the IR's existing Open Archives Initiative (OAI) feed.11 At the time of this writing, we were not able to find a single study that documents the use of this method. This option looked appealing and worth pursuing as it met most of our desired criteria. First, with this option, students' choice would not be compromised as ETDs would be submitted to ProQuest after being posted in the IR.
Second, because the ETD Administrator would not be used, issues with conflicting embargo dates and unalterable metadata forms would be avoided. In addition, the local workflow would be retained, thus eliminating the need for tedious metadata transformations between ProQuest and IR schemas. From the available options, this one seemed the most feasible solution for our institution.

IMPLEMENTATION OF THE HARVESTING METHOD AT UNF

After research on the different submittal options was performed, the library approached ProQuest to express interest in depositing our future ETDs into their system by using a post-IR option. In the first communications, ProQuest suggested we use the ETD Administrator to submit ETDs because it is the most commonly used method. When we expressed interest in the harvesting option, they said "we have not been harvesting from BePress sites" (BePress being the company that makes Digital Commons) and suggested we use the FTP option instead.12 Ten months later, they clarified that the harvests could be performed from BePress sites and that the option is free, with the only requirement being a non-exclusive agreement between the university and ProQuest. The news eased both the library's and the graduate school's previous concerns, as we would be able to adopt a free method that would not compromise on students' choice nor restrict students from posting in other places, while keeping the local workflow. After agreement on the submittal method was established, planning and testing of the harvesting method began. The library worked with ProQuest and BePress to customize the harvesting process while the university's Office of the General Counsel worked with ProQuest on the negotiation process.

Negotiation Process

Before ProQuest could harvest UNF ETDs, two legal documents needed to be in place. The first document was the Theses and Dissertations Distribution Agreement, which specifies the conditions under which ETDs can be obtained, reproduced, and disseminated by ProQuest.
The document had to be signed by UNF's Board of Trustees and ProQuest. The agreement stipulated the following conditions:
• The agreement must be non-exclusive.
• The university must make the full-text Uniform Resource Locators (URLs) and abstracts of ETDs available to ProQuest.
• ProQuest must harvest the ETDs from the university's IR.
• The university and students have the option to elect not to submit individual works or to withdraw them.
• No fees are due from the university or students for the service.
• ProQuest must include the ETDs in the PQDT database.
The second document that needed to be in place was the Theses and Dissertations Availability Agreement, which grants the university the non-exclusive right to reproduce and distribute the ETDs. This agreement between students and UNF specifies the places where ETDs can be hosted and the embargo restrictions, if any. UNF already has been using this document as part of its ETD workflow, but the document needed to be modified to include the additional option to submit ETDs into ProQuest. Beginning with the spring 2019 semester, the revised version of the agreement provided students with two hosting alternatives: posting in the IR only or in the IR and ProQuest.

Local Steps Performed Before the Harvesting

The workflow begins when students upload their ETDs and supplemental files (Certificate of Approval and Availability Agreements) directly into the Digital Commons IR. There, students complete a metadata template with information on the degree and provide keywords related to the thesis. After this, the graduate school reviews the submitted ETDs and approves them inside the IR platform. Next, the Library Digital Projects' staff downloads the native PDF files of ETDs, processes them, and creates public and archival versions for each ETD.
Availability Agreements are reviewed to determine which students chose to embargo their ETDs and which ones chose to host them in ProQuest, in addition to the IR. If students choose to embargo their ETDs, the embargo dates are entered in the metadata template. If students choose to publish their ETDs in ProQuest, a "ProQuest: Yes" option is checked in their metadata template, while students who choose not to host in ProQuest get a "ProQuest: No" in their template. (The ProQuest field is a new administrative field that was added to the ETD metadata template, starting with the spring 2019 semester, to assist with the harvesting process. It was designed to alert ProQuest of the ETDs that were authorized for harvesting. More detail on its functionality will be provided in the next section.) The reason library staff enters the ProQuest and embargo fields on behalf of students is to avoid having students enter incorrect data on the template. Following this review, the Metadata Librarian assigns Library of Congress Subject Headings to each ETD and creates authority files for the authors. These are also entered in the metadata template. Afterwards, the ETDs get posted in the Digital Commons' public display, with the full-text PDF files available only for the non-embargoed ETDs. Information that appears in the public display of Digital Commons will also appear immediately in the OAI feed for harvesting. At this point, two separate processes take place:
1. The Metadata Librarian harvests the ETDs' metadata from the OAI feed and converts it into MARC records that are sent to OCLC, with the IR's URL attached. The workflow is described at https://journal.code4lib.org/articles/11676.
2. On the seventh of each month, ProQuest harvests the full-text PDF files, with some metadata, of the non-embargoed ETDs that were authorized for harvesting from the OAI feed.
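Both of these steps rely on standard OAI-PMH requests against the IR's feed. As a minimal illustration (not UNF's or ProQuest's actual script; the base URL and set name below are hypothetical placeholders), a date-limited ListRecords request can be assembled like this:

```python
# Sketch of a date-limited OAI-PMH ListRecords request, using only the
# Python standard library. The repository URL and set name are invented
# for illustration; a real Digital Commons feed would supply its own.
from urllib.parse import urlencode

def build_listrecords_url(base_url, set_spec, metadata_prefix, from_date, until_date):
    """Combine the four pieces a harvester needs (base URL, publication
    set, metadata prefix, and date range) into one OAI-PMH request URL."""
    params = {
        "verb": "ListRecords",
        "set": set_spec,
        "metadataPrefix": metadata_prefix,
        "from": from_date,
        "until": until_date,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical example: everything published or edited in the ETD set in May 2019.
url = build_listrecords_url(
    "https://digitalcommons.example.edu/do/oai/",  # hypothetical base URL
    "publication:etd",                             # hypothetical set name
    "qdc",                                         # Qualified Dublin Core
    "2019-05-01",
    "2019-05-31",
)
print(url)
```

Subsequent pages of a large result set would be fetched with the `resumptionToken` the feed returns, per the OAI-PMH specification.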
Harvesting Process (Customized for Our Institution)

To perform the harvests, ProQuest creates a customized robot for each institution that crawls OAI-PMH-compliant repositories to harvest metadata and full-text PDF files of ETDs.13 The robot performs a date-limited OAI request to pull everything that has been published or edited in an IR's publication set during a specific timeframe. Information to formulate the date-limited request is provided to ProQuest by the institution for the first harvest only; subsequently, the process gets done automatically by the robot. The request contains the following elements:
• Base URL of the OAI repository
• Publication set
• Metadata prefix or type of metadata
• Date range of titles to be harvested
In the particular case of our institution, we needed to customize the robot to limit the harvests to authorized ETDs only. To achieve this, we worked with BePress to add a new, hidden field at the bottom of our Digital Commons' ETD metadata template. The field, called ProQuest, consisted of a dropdown menu with two alternatives: "ProQuest Yes" or "ProQuest No" (see figure 1). The field was mapped to an element in the OAI feed that displays the value of "ProQuest: Yes" or "ProQuest: No," thus alerting the robot of the ETDs that were authorized for harvesting and the ones that were not. The element used to map the ProQuest field is a Qualified Dublin Core (QDC) element (figure 2). For that reason, the robot needs to perform the harvests from the QDC OAI feed in order to see this field.

Figure 1. Display of the ProQuest Field's Dropdown Menu in the Metadata Template

Figure 2.
Display of the ProQuest Field in the QDC OAI Feed

After the ETDs authorized for harvesting have been identified with help from the "ProQuest: Yes" field, the robot narrows down the ones that can be harvested at the present moment by using an availability-date element, which provides the date when the full-text file of an ETD becomes available. It also displays in the QDC OAI feed (see figure 3). If the date is on or before the monthly harvest day, the ETD is currently available for harvesting. If the date is in the future, the robot identifies that ETD as embargoed and adds its title to a log of embargoed ETDs with some basic metadata (including the ETD's author and the last time it was checked). The log of embargoed ETDs is then pulled out in the future to identify the ETDs that come out of embargo so the robot can retrieve them.

Figure 3. Display of the Element in the QDC OAI Feed

After the ETDs that are currently available for harvesting have been identified (because they have the "ProQuest: Yes" field and a present or past availability date), the robot harvests their full-text PDF files by using a third element, which displays at the bottom of records in the OAI feed (figure 4). This third element contains a URL with direct access to the complete PDF file of ETDs that are currently not embargoed. ETDs that are currently on embargo contain a URL that redirects the user to a webpage with the message: "The full-text of this ETD is currently under embargo. It will be available for download on [future date]" (see figure 5).

Figure 4. Display of the Third Element at the Bottom of Records in the QDC OAI Feed

Figure 5.
Message that Displays in the URL of Embargoed ETDs

Once the metadata and full-text PDF files of authorized, non-embargoed ETDs have been obtained by the robot, they get queued for processing by the ProQuest editorial team, who then assigns them International Standard Book Numbers (ISBNs) and ProQuest's proprietary terms. It takes an average of four to nine weeks for the ETDs to display in the PQDT database after being harvested. Records in the PQDT come with the institutional repository's original cover page and a copyright statement that leaves copyright to the author. Afterwards, the process gets repeated once a month. This frequency can be set to quarterly or semi-annually if desired.

ADDITIONAL POINTS ON THE HARVESTING METHOD

Handling of ETDs that come out of embargo. When the embargo period of an ETD expires, its full-text PDF becomes automatically available in the IR's webpage, and consequently, in the third element that displays in the OAI record. Each month, when the robot prepares to crawl the OAI feed, it will first check the titles in the log of embargoed ETDs to determine if any of them have become fully available through the third element. The ones that become available are then pulled by the robot through this element.

Handling of metadata edits performed after the ETDs have been harvested and published in PQDT. Edits performed to the metadata of ETDs will trigger a change of date in the corresponding element in the OAI records. This change of date will alert the robot of an update that took place in a record, which is then manually edited or re-harvested, depending on the type of update that took place.

Sending MARC records to OCLC. As part of the harvesting process, ProQuest provides free MARC records for the ETDs hosted in their PQDT database. These can be delivered to OCLC on behalf of the institution on an irregular basis.
Records are machine-generated "K" level and come with URLs that link to the PQDT database and with ProQuest's proprietary subject terms. We requested to be excluded from these deliveries and continue our local practice of sending MARC records to OCLC with LCSH, authority file headings, and the IR's URLs.

Notifications of harvests performed by ProQuest and imports to the PQDT database. When harvests or imports to the PQDT have been performed by ProQuest, institutions do not get automatically notified. Still, they can request to receive scheduled monthly reports of the titles that have been added to the PQDT. UNF requested to receive these monthly reports.

Usage statistics of ETDs hosted in PQDT. Usage statistics of an institution's ETDs hosted in the PQDT can be retrieved from a tool called Dissertation Dashboard. This tool is available to the institution's ETD administrators and provides the number of times some aspect of an ETD (e.g., citations, abstract viewings, page previews, and downloads) has been accessed through the PQDT database.

Royalty payments to authors. Students who submit ETDs through this method are also eligible to receive royalties from ProQuest.

OBSTACLES FACED

During the planning phase, we encountered some obstacles that hindered progress on the implementation. These were:
• Amount of time it took to get the ball rolling. Initially, we were told we would not be able to use the harvesting method to submit ETDs into ProQuest because we were BePress users, but that ended up not being the case. Ten months later, we were notified by the same source that the harvesting option for BePress sites would be possible and doable by ProQuest. Those ten months delayed the implementation process.
• Amount of time it took to get the paperwork finalized and signed before the harvesting.
From the moment first contact was initiated with ProQuest, to the moment the last agreement was finalized and signed by both parties, 21 months went by. There was a lot of back and forth in the negotiation process and paperwork between the university and ProQuest.
• Inconsistent lines of communication. There were multiple parties involved in the communication process, and some of the email threads began with one person only to be later transferred to someone else. This lack of consistency in the communication lines made it difficult to determine who was in charge of particular tasks at certain stages of the process.

CONCLUSION AND RECOMMENDATIONS

Although problems were encountered at the beginning, implementation of the harvesting process at UNF was a complete success. Once the process started, it ran smoothly without complications. Harvests were performed on schedule, and no issues with unauthorized content being pulled from the OAI were encountered. The fields used to alert the robot in the OAI of the ETDs authorized for harvesting worked as planned, and so did the embargo log used to identify and pull the out-of-embargo ETDs. It should be noted that Digital Commons users who want to exclude embargoed ETDs from displaying in the OAI can do so by setting up an optional yes/no button in their submission form. This button prevents metadata of particular records from displaying in the OAI feed. We did not pursue this option because we have been using the ETD metadata that displays in the OAI to generate the MARC records we send to OCLC. In addition, we took the necessary precautions to avoid exposing the full content of the embargoed ETDs in the OAI feed. Institutions planning to use this method should be very careful with the content they display in the OAI to prevent embargoed ETDs from being mistakenly pulled by ProQuest.
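The per-record decision at the heart of this workflow (honor the opt-in flag, harvest only records whose availability date has passed, and log future-dated ones for a later pass) can be sketched as follows. This is an illustrative reconstruction, not ProQuest's actual code; the function name and field values are assumptions based on the process described above:

```python
# Hedged sketch of the harvesting robot's triage of one OAI record:
# skip opted-out ETDs, harvest open ones, and log embargoed ones so
# they can be retrieved once the embargo expires.
from datetime import date

def triage_record(proquest_flag, available_date, harvest_day):
    """Return 'harvest', 'log-embargo', or 'skip' for one OAI record."""
    if proquest_flag != "ProQuest: Yes":
        return "skip"           # student opted out; never harvest
    if available_date <= harvest_day:
        return "harvest"        # full text is already available
    return "log-embargo"        # revisit after the embargo expires

harvest_day = date(2019, 6, 7)  # e.g., the monthly harvest on the 7th
print(triage_record("ProQuest: Yes", date(2019, 5, 10), harvest_day))  # harvest
print(triage_record("ProQuest: Yes", date(2020, 1, 1), harvest_day))   # log-embargo
print(triage_record("ProQuest: No", date(2019, 5, 10), harvest_day))   # skip
```

The same comparison of availability date against harvest day is what lets the robot detect, on a later pass, that a logged ETD has come out of embargo.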
Access restrictions can be set by either suppressing the metadata of embargoed ETDs from displaying in the OAI or by suppressing the URLs with full access to the embargoed ETDs. The same precaution should be taken if planning to provide students with the choice to opt in or out from ProQuest. Altogether, the harvesting option proved to be a reliable solution to submit ETDs into ProQuest without having to compromise on students' choice or rely on complicated workflows with metadata transformations between IR and ProQuest schemas. Institutions interested in adopting a simple, automated, post-IR method, while keeping the local workflow, should benefit from this method.

ENDNOTES

1 Dan Tam Do and Laura Gewissler, "Managing ETDs: The Good, the Bad, and the Ugly," in What's Past Is Prologue: Charleston Conference Proceedings, eds. Beth R. Bernhardt et al. (West Lafayette, IN: Purdue University Press, 2017), 200–04, https://doi.org/10.5703/1288284316661; Emily Symonds Stenberg, September 7, 2016, reply to Wendy Robertson, "Anything to watch out for with etd embargoes?," Digital Commons Google Users Group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7Csort:date/digitalcommons/RNInGtRarNY/6byzT9apAQAJ.
2 Gail P. Clement, "American ETD Dissemination in the Age of Open Access: ProQuest, NoQuest, or Allowing Student Choice," College & Research Libraries News 74, no. 11 (December 2013): 562–66, https://doi.org/10.5860/crln.74.11.9039; FUSE, 2012–2013, Graduate Students Re-FUSE!, https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/Graduate%20Students%20Re-FUSE.pdf?sequence=25&isAllowed=y.
3 "PQDT Submissions Options for Universities," ProQuest, http://contentz.mkt5049.com/lp/43888/382619/PQDTsubmissionsguide_0.pdf.
4 Meghan Banach Bergin and Charlotte Roh, "Systematically Populating an IR With ETDs: Launching a Retrospective Digitization Project and Collecting Current ETDs," in Making Institutional Repositories Work, eds. Burton B. Callicott, David Scherer, and Andrew Wesolek (West Lafayette, IN: Purdue University Press, 2016), 127-37, https://docs.lib.purdue.edu/purduepress_ebooks/41/.

5 Cedar C. Middleton, Jason W. Dean, and Mary A. Gilbertson, "A Process for the Original Cataloging of Theses and Dissertations," Cataloging and Classification Quarterly 53, no. 2 (February 2015): 234-46, https://doi.org/10.1080/01639374.2014.971997.

6 Wendy Robertson and Rebecca Routh, "Light on ETD's: Out from the Shadows" (presentation, Annual Meeting for the ILA/ACRL Spring Conference, Cedar Rapids, IA, April 23, 2010), http://ir.uiowa.edu/lib_pubs/52/; Yuan Li, Sarah H. Theimer, and Suzanne M. Preate, "Campus Partnerships Advance both ETD Implementation and IR Development: A Win-win Strategy at Syracuse University," Library Management 35, no. 4/5 (2014): 398-404, https://doi.org/10.1108/LM-09-2013-0093.

7 Do and Gewissler, "Managing ETDs," 202; Banach Bergin and Roh, "Systematically Populating," 134; Donna O'Malley, June 27, 2017, reply to Andrew Wesolek, "ETD Embargoes through ProQuest," Digital Commons Google Users Group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7Csort:date/digitalcommons/Gadwi8INfgA/sg7de7SdCAAJ.

8 Gail P. Clement and Fred Rascoe, "ETD Management & Publishing in the ProQuest System and the University Repository: A Comparative Analysis," Journal of Librarianship and Scholarly Communication 1, no. 4 (August 2013): 8, http://doi.org/10.7710/2162-3309.1074.

9 "U.S. Dissertations Publishing Services: 2017-2018 Fee Schedule," ProQuest.
10 "Support: ProQuest Export Documentation," Vireo Users Group, https://vireoetd.org/vireo/support/ProQuest-export-documentation/.

11 "PQDT Global Submission Options, Institutional Repository + Harvesting," ProQuest, https://media2.proquest.com/documents/dissertations-submissionsguide.pdf.

12 Marlene Coles, email message to author, January 19, 2018.

13 "ProQuest Dissertations & Theses Global Harvesting Process," ProQuest.
PUBLIC LIBRARIES LEADING THE WAY

Intro to Coding Using Python at the Worcester Public Library

Melody Friedenthal

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020
https://doi.org/10.6017/ital.v39i2.12207

Melody Friedenthal (mfriedenthal@mywpl.org) is a Public Services Librarian, Worcester Public Library.

ABSTRACT

The Worcester Public Library (WPL) offers several Digital Learning courses to our adult patrons, and among them is "Intro to Coding Using Python". This 6-session class teaches basic programming concepts and the vocabulary of software development. It prepares students to take more intensive, college-level classes. The Bureau of Labor Statistics predicts a bright future for software developers, web developers, and software engineers. WPL is committed to helping patrons increase their "hireability," and we believe our Python class will help patrons break into these lucrative and gratifying professions… or just have fun.

HISTORY AND DETAILS OF OUR CLASS

I came to librarianship from a long career in software development, so when I joined the Worcester Public Library in January 2018 as a Public Services Librarian, my manager proposed that I teach a class in programming. She asked me to research what language would be best. Python got high marks for ease of use, flexibility, growing popularity, and a very active online community.

Once I selected a language, I had to choose an environment to teach it in (or so I thought). I had absolutely no experience in front of a classroom, and few pedagogical skills, so I sought out an online Python course within which to teach. I decided to use the Code Academy (CA) website as our programming environment. CA has self-guided classes in a number of subjects, and the free Beginning Python course seemed to be just what we needed. I went through the whole class myself before using it as courseware.
My intent was to help students register for CA and then, each day, teach them the concepts in that day's CA lesson. They would then be prepared to complete the online lesson and assignments.

We first offered Python in June 2018. Problems with CA came up right from the start: students registered for the wrong class (despite the handout explicitly naming the correct class), and CA frequently tried to upsell a paid Python class. Since CA's classes are MOOCs (Massive Open Online Courses), the developers built in an automated way of correcting student code: embedded behind each web page of the course is code that examines the student's code and decides whether it is acceptable. Good in theory, not so good in practice. CA's "code-behind" is flawed and sometimes prevented students from advancing to the next lesson.

Moreover, some of the CA tasks were inane. For example, one lesson incorporated a kind of Mad Libs game, where the instructions ask, for example, for 13 nouns and 11 adjectives, and these are combined with set sentences to generate a silly story. This assignment turned out to be too long and difficult to complete, preventing students from advancing. Although I used CA the first few times I offered the class, I subsequently abandoned it and wrote my own classroom material.

After determining that CA wasn't appropriate, I chose an online IDE where the students could code independently. This platform worked well when I tested it ahead of time, but when the whole class tried to log on at once, we received denial-of-service error messages. Hurriedly moving on to Plan C, I chose Thonny, a free Python IDE which we downloaded to each PC in the Lab (see https://thonny.org/).

Each student receives a free manual (see figure 1), which I wrote.
Every time I've offered this class I've edited the manual, clarifying those topics the students had a hard time with. I've also added new material, including commands students have shown me. It is now 90 pages long, written in Microsoft Word, and printed in color. We use soft binders with metal fasteners.

Figure 1. Intro to Coding Using Python manual developed for the course.

The manual consists of the following sections:
• Cover: course name, dates we meet, time class starts and ends, location, instructor's name, manual version number, and a place for the student to write their own name.
• Syllabus: goals for each of the six sessions. This is aspirational.
• Basic information about programming, including an online alternative to Thonny, for students who don't have a computer at home and wish to use our public computers for homework.
• Lessons 1-17: "Hello World" and beyond.
• Lesson 18: Object Oriented Design, which I consider to be advanced, optional material. Skipped if time is pressing or the class isn't ready for it.
• Lesson 19: Wrap-up:
  o How to write good code.
  o How to debug.
  o List of suggested topics for further study.
  o Online resources for Python forums and community.
• List of WPL's print resources on Python and programming.
• Relevant comic strips and cartoons.

In March 2019, my manager asked me to start assigning homework. If a student attends all six sessions and makes a decent attempt at each assignment, at the sixth session they receive a Certificate of Completion. The certificate has the WPL name & logo, the student's name, and my signature. Typically three or four students earn a certificate. Homework is emailed to me as an attachment. This class meets on Tuesday evenings, and I tell students to send me their homework as soon as possible.
Inevitably, several students don't email me until the following Monday. While I don't give out grades, I do spend considerable time reviewing homework, line by line, and I email back detailed feedback.

When the January 2020 course started, I found that between October's class and January, Outlook had implemented a security protocol which removes attachments with certain file extensions from incoming email. And (you can see where this is going) the .py Python extension was one of them. I told students to rename their Python code files from xxxx.py to xxxx.py.doc, where "xxxx" is their program name. This fools Outlook into thinking the file is a Microsoft Word document, and the email is delivered to me intact. When it arrives, I remove the .doc extension from the attachment and save it to a student-specific file. Then I open the file in Thonny and review it.

Physically, our Computer Lab contains an instructor's computer and twelve student computers (see figure 2). It also has a projector which projects the active window from the instructor's computer onto a screen: usually the class manual. I use dry erase markers in a variety of colors to illustrate concepts on a whiteboard. There is also a supply of pencils on hand for student note-taking.

The class is offered once per season. Although the classroom can accommodate twelve students, we set our maximum registration to fourteen, which allows us to maximize attendance even if patrons cancel or don't show up. And if all fourteen do attend the first class, we have two laptops I can bring into the Lab. We also maintain a small waitlist, usually of five spots. We've offered this class seven times, and the registration and waitlists have been full every time. Sometimes we have to turn students away.

Figure 2. Classroom at Worcester Public Library.
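The .py-to-.doc renaming workaround described above can be automated on the receiving end. A minimal sketch using Python's pathlib; the function and file names here are mine, for illustration only, and not part of the class materials:

```python
from pathlib import Path

def restored_name(attachment: str) -> str:
    """Return the original .py name for a homework file that was
    disguised as xxxx.py.doc to get past Outlook's extension filter;
    any other file name is returned unchanged."""
    p = Path(attachment)
    # Only strip the trailing .doc when a .py name is hiding underneath.
    if p.suffix == ".doc" and p.stem.endswith(".py"):
        return str(p.with_suffix(""))  # drops the final .doc
    return attachment

print(restored_name("guessing_game.py.doc"))  # guessing_game.py
print(restored_name("notes.docx"))            # notes.docx
```

A real Word document named report.doc is left alone, since its stem does not end in .py.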
However, we had a problem with registered patrons not showing up, so last spring we implemented a process where, about a week before class starts, I email each student, asking them to confirm their continued interest in the class. I tell them that if they are no longer interested, or don't respond, I will give the seat we reserved for them to another interested patron from the waitlist. In this email I also outline how the course is structured and note that they can each earn a Certificate of Completion. I tell them class starts promptly at 5:30 and to please plan accordingly. Some students don't check their email. Some patrons show up without ever registering; they are told registration is required and to try again in a few months. I keep track of attendance on an Excel spreadsheet. Here in Worcester, MA, weather is definitely a factor for our winter sessions.

Over time I've made the class more dynamic. I have a student read a paragraph in the manual aloud. I've switched around the order of some lessons in response to student questions. I have them play a game to teach Boolean logic: "If you live in Worcester And you love pizza, stand up!"… then: "If you live in Worcester Or you love pizza, stand up!"

Students range from experienced programmers (of other languages), to people with no experience but great aptitude, to people who just never seem to "get it". This material is technical, and I try hard to communicate the concepts, but I lose a few students every time.

We ask our patrons for feedback on all of our programs. Our Python students have written:
• "… the classes were formatted in an organized manner that was beginner friendly"
• "The manual is a big help. I'm thankful that the program is free."
• "… coding is fun and I learned a new skill."
• "This made me think critically and helped me understand where my errors in the programs were."

WPL is proud to offer classes that make a difference in our patrons' lives.
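The Boolean "stand up" game described above maps directly onto Python's `and` and `or` operators. A minimal sketch; the variable names are mine, not from the class manual:

```python
# The classroom "stand up" game, translated into Python.
lives_in_worcester = True
loves_pizza = False

# "If you live in Worcester And you love pizza, stand up!"
print(lives_in_worcester and loves_pizza)  # False: both conditions must hold

# "If you live in Worcester Or you love pizza, stand up!"
print(lives_in_worcester or loves_pizza)   # True: one condition is enough
```

Physically standing (or not) gives students immediate, visible feedback on how the two operators differ.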
ARTICLES

Applying Gamification to the Library Orientation
A Study of Interactive User Experience and Engagement Preferences

Karen Nourse Reed and A. Miller

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020
https://doi.org/10.6017/ital.v39i3.12209

Karen Nourse Reed (karen.reed@mtsu.edu) is Associate Professor, Middle Tennessee State University. A. Miller (a.miller@mtsu.edu) is Associate Professor, Middle Tennessee State University. © 2020.

ABSTRACT

By providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. The benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. One academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. The library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. Results demonstrated a preference for the gamified experience among undergraduate students as compared to other surveyed groups.

INTRODUCTION

Background

Newcomers to the academic campus can be a bit overwhelmed by their unfamiliar environment: there are faces to learn, services and processes to navigate, and an unexplored landscape of academic buildings to traverse. Whether one is an incoming student or a recently hired employee of the university, all need to become quickly oriented to their surroundings to ensure productivity. In the midst of this transition, the academic library may or may not be on the list of immediate inquiries; however, the library is an important place to start.
Newcomers would be wise to familiarize themselves with the building and its services so that they can make optimal use of its offerings. Two studies found that students who used the library received better grades and had higher retention rates.1 Another study regarding university employees revealed that untenured faculty made less use of the library than tenured faculty, a problem attributed to lack of familiarity with the library.2 Researchers have also found that faculty will often express interest in different library services without realizing that these services are in fact available.3 It is safe to say that libraries cannot always rely on newcomers to discover the physical and electronic services on their own; they need to be shown these items in order to mitigate the risk of unawareness.

In consideration of these issues, the Walker Library at Middle Tennessee State University (MTSU) recognized that more could be done to welcome its new arrivals to campus. The public university enrolls approximately 21,000 students, the majority of whom are undergraduates. However, with a Carnegie classification of doctoral/professional and over one hundred graduate degree programs, there was a strong need for specialized research among the university's graduate students and faculty. Other groups needed to use the library too: non-faculty employees on campus as well as community users who frequently used Walker Library for its specialized and general collections. The authors realized that when new members of these different groups arrived on campus, few opportunities were available for acclimation to the library's services or building layout.
Limited orientation experiences were conducted within library instruction classes, but these sessions primarily taught research skills and targeted freshman general-education classes as well as select upper-division and graduate classes. In short, it appeared that students, employees, and visitors to the university would largely have to discover the library's services on their own through a search on the library website or an exploration of the physical library. It was very likely that, in doing so, the newcomers might miss out on valuable services and information.

As MTSU librarians, the authors felt strongly that library orientations were important to everyone at the university so that they might make optimal use of the library's offerings. The authors based this opinion on their knowledge of relevant scholarly literature as well as their own anecdotal experiences with students and faculty.4 The authors defined the library orientation differently from library instruction: in their view, an orientation should acquaint users with the services and physical spaces of the library, as compared to instruction that would teach users how to use the library's electronic resources such as databases. The desired new approach would structure orientations in response to the different needs of the library's users. For example, the authors found that undergraduates typically had distinct library interests compared to faculty. It was recognized that library orientations were time-consuming for everyone: library patrons at MTSU often did not want to take the time for a physical tour, nor did the library have the staffing to accommodate large-scale requests. The authors turned to the gamification trend, and specifically interactive storytelling, as a solution.
Interactive storytelling has previous applications in librarianship as a means of creating an immersive and self-guided user experience.5 However, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. To overcome this gap, the authors developed an online, interactive, game-like experience via storytelling software to orient four different groups of users to the library's services. These groups were undergraduate students, graduate students, faculty members (which included both faculty and staff at the university), and community members (i.e., visitors to the university or alumni); see figure 1 for an illustration of each group's game avatars. These groups were invited to participate in the gamified experience called LibGO (short for library game orientation). After playing LibGO, participants gave feedback through an online survey. This paper will give a brief explanation of the creation of the game, as well as describe the results of research conducted to understand the impact of the gamified experience across the four user groups.

Figure 1. LibGO players were allowed to self-select their user group upon entering the game. Each of the four user groups was assigned an avatar and followed a logic path specified for that group.

LITERATURE REVIEW

Traditional Orientation

Searches for literature on library orientation yield very broad and yet limited details about users of the traditional library orientation method.
It is important to note that the terms "library tour" and "library orientation" can be somewhat vague, because this terminology is not interchangeable, yet it is frequently treated as such in the literature.6 These terms are often included among library instruction materials, which predominantly influence undergraduate students.7 Kylie Bailin, Benjamin Jahre, and Sarah Morris define orientation as "any attempt to reduce library anxiety by introducing students to what a college/university library is, what it contains, and where to find information while also showing how helpful librarians can be."8 Their book is a compilation of case studies of academic library orientation in various forms worldwide, where the common theme across most chapters is the need to assess, revise, and change library orientation models as needed, especially in response to feedback, staff demands, and the evolving trend of libraries and technology.9 Furthermore, the majority of these studies are undergraduate-focused, and often freshman-focused, while only a few studies are geared towards graduate students. Other traditional orientation problems discussed in the literature include students lacking intrinsic motivation to attend library orientation, library staff time required to execute the orientation, and lack of attendance.10 Additionally, among librarians there seems to be consensus that traditional library tours are the least effective means of orientation, yet they are the most highly used, with attention predominantly focused on the undergraduate population alone.11

In 1997, Pixey Anne Mosely described the traditional guided library tour as ineffective and documented the trend of libraries discontinuing it in favor of more active learning options.12 Her study surveyed 44 students who took a redesigned library tour, all of whom were undergraduates (with freshmen as the target population). Although Mosely's study only addressed one group of library users, it did attempt to answer a question on library perception: 93 percent of surveyed students indicated feeling more comfortable using the library after the more active learning approach.13 A comparison study by Marcus and Beck looked at traditional versus treasure-hunt orientations, and ultimately discovered that perception of the traditional method is limited by the selective user population and lack of effective measurements. They cited the need for continued study of alternative approaches to academic library orientation.14

A study by Kenneth Burhanna, Tammy Eschedor Voelker, and Julie Gedeon looked at the traditional library tour from the physical and virtual perspectives. Confronted with a lack of access to the physical library, these researchers at Kent State University decided to add an online option for the required traditional freshman library tour.15 Their study compared the efficacy of learning and affective outcomes between face-to-face library tours and online library tours. Of the 3,610 students who took the required library tour assignment, 3,567 chose the online tour method and 63 opted or were required to take the in-person, librarian-led tour. Surveys were later sent to a random list of 250 students who did not take the in-person tour and to the 63 students who did take the in-person tour.
Of the 46 usable responses, all but one were from undergraduates, and 39 (85 percent) of them were freshmen.16 This is a small sample size, with a ratio of slightly greater than 2:1 for online versus in-person tour participation. Although results showed that an instructor's recommendation on format selection was the strongest influencing factor, convenience was also significant for those who selected the online option (81.5 percent). In contrast, only 18.5 percent of the students who took the face-to-face tour rated it as convenient. The authors found that regardless of tour type, students were more comfortable using the library (85 percent) and more likely to use library resources (80 percent) after having taken a library tour. Interestingly, students who took the online tour seemed slightly more likely to visit the physical library than those who took the in-person tour. Ultimately the analysis of both tours showed that this method of library orientation encourages library resource use, and the "online tour seems to perform as well, if not slightly better than the in-person tour."17

Gamification Use in Libraries

An alternative to the traditional format is gamification. Gamification has become a familiar trend within academic libraries in recent years, and most often refers to the use of a technology-based game delivery within an instructional setting. Some users find gamified library instruction more enjoyable than traditional methods. For these people, gamification can potentially increase student engagement as well as retention of information.18 The goal of gamification is to create a simplified reality with a defined user experience.
Kyle Felker and Eric Phetteplace emphasized the importance of user interaction over "specific mechanics or technologies" in thinking about the gamification design process.19 Proponents of gamifying library instructional content indicate that it connects to the broader mission of library discovery and exploration, as exemplified through collaboration and the stimulation of learning.20 Additional benefits of gamification are its teaching, outreach, and engagement functions.21

Many researchers have documented specific applications of online gaming as a means of imparting library instruction. Mary J. Broussard and Jessica Urick Oberlin described the work of librarians at Lycoming College in developing an online game as one approach to teaching about plagiarism.22 Melissa Mallon offered summaries of nine games produced for higher education, several of which were specifically created for use by academic libraries.23 Many of the online library games reviewed used Flash or required players to download the game before playing. By contrast, J. Long detailed an initiative at Miami University to integrate gamification into library instruction, a project which utilized Twine.24 Twine is an in-browser method and therefore avoids the problem of requiring users to download additional software prior to playing the game.

Other libraries have used online gamification specifically as a tool for library orientations. Although researchers have demonstrated that the library orientation is an important practice in establishing positive first impressions of the library and counteracting library anxiety among new users, the differences between in-person and online delivery formats are unclear.25 Several successful instances have been documented in which the orientation was moved to an online game format.
Nancy O'Hanlon, Karen Diaz, and Fred Roecker described a collaboration at Ohio State University Libraries between librarians and the Office of First Year Experience; for this project, they created a game to orient all new students to the library prior to arrival on campus.26 The game was called "Head Hunt" and was cited among the games listed in the article by Mallon.27 Anna-Lise Smith and Lesli Baker reported on the "Get a Clue" game at Utah Valley University, which oriented new students over two semesters.28 Another orientation game, developed at California State University-Fresno, was noteworthy for its placement in the university's learning management system (LMS).29

In reviewing the literature regarding online library gamification efforts, there appear to be several best practices. Several studies cite initial student assessment to understand student knowledge and/or perceptions of the content, followed by an iterative design process with a team of librarians and computer programmers.30 Felker and Phetteplace reinforced the need for this iterative process of prototyping, testing, deployment, and assessment as one key to success; however, they also stated that the most prevalent reason for failure is that the games are not fun for users.31 Librarians are information experts and are not necessarily trained in fun game design. Some libraries have solved this problem by partnering with or hiring professional designers; however, for many under-resourced libraries this is not an option.32 Taking advantage of open-source tools, as well as the documented trial-and-error practices of others, can be helpful to newcomers who wish to break into new library engagement methods utilizing gamification.

As the literature has shown, a traditional library tour may have a place in the list of library services, but for whom and at what cost are questions with limited answers in studies done to date.
Gamification has offered an alternative perspective, but accounts of its success in the online storytelling format, or with users outside of the heavily studied freshman group, are narrow. Across the literature of library orientation studies, there is little reference to other library user populations such as faculty, staff, community users, distance students, or students not formally part of a class that requires library orientation.

DEVELOPMENT OF THE LIBRARY GAME ORIENTATION (LIBGO)

LibGO was developed by the authors with not only a consideration for the Walker Library user experience, but also a specific attention to the differing needs of the multiple user groups served by the library. This user-focused concern led to exploring creative methodologies such as user experience research and human-centered design thinking, a process of overlapping phases that produces a creative and meaningful solution in a non-linear way. The three pillars of design thinking are inspiration, ideation, and iteration.33 Defining the problem and empathizing with the users (inspiration) led into the ideation phase, whereby the authors created low- and high-fidelity prototypes. The prototypes were tested and improved (iteration) through beta testing, in which playtesters interacted with the gamified orientation. The authors were novice developers of the gamified orientation, and this entailed a learning curve for not only the design thinking mindset but also the technical achievability. The development started with design thinking conversations and quickly turned to low-fidelity prototypes designed on paper. The development soon advanced to the actual coding so that the authors could get early designs tested before launching the final version.
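A branching prototype of this kind can be sketched in Twee, the passage notation behind Twine. The passage names, text, and Harlowe-style (set:) macros below are illustrative assumptions for this article, not the actual LibGO source:

```
:: Start
Welcome to LibGO! Choose your persona:
(set: $score to 0)
[[Undergraduate student->Study Rooms]]
[[Graduate student->Study Rooms]]

:: Study Rooms
Study rooms can be reserved online from the library home page.
(set: $score to it + 1)
[[Finish the tour->Final Score]]

:: Final Score
You explored $score point(s) of interest. Thanks for playing!
```

Each `::` line begins a passage, and the double-bracket links create the branching "choose your own adventure" paths; Twine compiles the passages into a single HTML file that can be placed on a web server, as the authors ultimately did.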
Prior to deployment on the library’s website, LibGO underwent several rounds of playtesting by library faculty, staff, and student employees. This testing was invaluable and led to improvements such as streamlined navigation and less ambiguous text.

LibGO was developed with the open-source Twine software (https://twinery.org), which is primarily used for telling interactive, non-linear stories with HTML. Twine was an excellent application for this project, as it allowed the creation of an online, interactive, “choose your own adventure” style library orientation game in which users could explore the library based upon their selection of one of multiple available plot directions. With a modest learning curve, and as open-source software, Twine is highly accessible for those who are not accustomed to coding. For those who know HTML, CSS, JavaScript, variables, and conditional logic, Twine’s capabilities can be extended.

The library’s interactive orientation adventure requires users to select one of four available personas: undergraduate student, graduate student, faculty, or community member. Users subsequently follow that persona through a non-linear series of places, resources, and points of interest built from the HTML output of Twee (Twine’s scripting language). See figure 2 for an example point-of-interest page and figure 3 for an example of a user’s final score after completing the gamified experience. Once the Twine story went through several iterations of design and testing, the HTML file was placed on the library’s website so the gamified orientation could be implemented with actual users.

Figure 2. This instructional page within LibGO explains how to reserve different library spaces online.
Upon reading this content, the user progresses by clicking one of the hypertext lines in blue font at the bottom.

Figure 3. This LibGO page, with its displayed avatar, represents a graduate student’s completion of LibGO. The page indicates the player’s final score and gives additional options to return to the home page or complete the survey.

Purpose of Study

LibGO utilized the common “choose your own adventure” format, whereby players progress through a storyline based upon their selection of one of multiple available plot directions. Although the literature suggests that other technology-based methods are an engaging and instructive mode of content delivery, little prior research exists regarding this specific approach to library outreach. Furthermore, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library user groups. The researchers wanted to understand the potential of interactive storytelling as a means to educate a range of users on library services, as well as to make the library more approachable from a user perspective. The study was designed to understand the user experience of each of the four groups. The researchers hoped to discern which users, if any, found the gamified experience to be a helpful method of orientation to the library’s physical and electronic services. Another area of inquiry was whether this might be an effective delivery method by which to target certain segments of the campus for outreach. Finally, the study intended to determine whether this method of orientation might incline participants toward future use of the library.
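The “choose your own adventure” structure described above maps naturally onto Twee, Twine’s plain-text source format, in which each passage is declared with a `::` header and each branch is an `[[arrow link]]` to another passage. The sketch below is illustrative only: the passage names, text, and `$score` variable are hypothetical assumptions (written for the Harlowe story format), not LibGO’s actual source.

```
:: Start
Welcome to the Walker Library orientation. Choose your persona:

[[Undergraduate student->UG Welcome]]
[[Graduate student->Grad Welcome]]
[[Faculty->Faculty Welcome]]
[[Community member->Community Welcome]]

:: UG Welcome
(set: $score to 0)
You arrive at the library's main entrance. Where to first?

[[Reserve a study space->Study Spaces]]
[[Ask at the service desk->Service Desk]]

:: Study Spaces
(set: $score to it + 1)
Study rooms can be reserved online from the library home page.

[[Keep exploring->UG Welcome]]
[[Finish->Final Score]]

:: Final Score
You finished with a score of $score. Thanks for playing!
```

(The remaining persona and location passages are omitted for brevity.) Because Twine compiles a story like this into a single self-contained HTML file, deployment amounts to uploading that one file to the library’s web server, which is consistent with how the authors placed LibGO on their site.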
METHODOLOGY

Overview

The authors selected an embedded mixed-methods design in which quantitative and qualitative data were collected concurrently through the same assessment instrument.34 The survey instrument primarily collected quantitative data; however, a qualitative open-response question was embedded at the end of the survey, gathering additional data by which to answer the research questions. Each data set (one quantitative and one qualitative) was analyzed separately for each participant group, and then the groups were compared to develop a richer understanding of participant behavior.

Research Questions

The data collection and subsequent analysis attempted to answer the following questions:

1. Which group(s) of library users prefer to be oriented to library services and resources through the interactive storytelling format, as compared to other formats?
2. Which group(s) of library users are more likely to use library services and resources after participating in the interactive storytelling format of orientation?
3. What are user impressions of LibGO, and are there any differences in impression based on the characteristics of the unique user group?

Participants

Participants for the study were recruited in person and via the library website. In-person recruitment entailed the distribution of flyers and the use of signage to recruit participants to play LibGO in a library computer lab during a one-day event. Online recruitment lasted approximately ten weeks and simply involved the placement of a link to LibGO on the home page of the library’s website. A total of 167 responses were gathered through both methods, and participants were distributed as shown in table 1.

Table 1.
Composition of Study’s Participants

Group  Affiliation               Responses
1      Undergraduate students        55
2      Graduate students             62
3      Faculty                       13
4      Staff                         28
5      Community members              9
       TOTAL                        167

For the purposes of statistical data analysis, groups 3 and 4 were combined to produce a single group of 41 university employee respondents; group 5’s data was not included in the statistical analysis due to the low number of participants. Qualitative data for all groups, however, was included in the non-statistical analysis.

Survey Instrument

A survey with twelve questions was developed for this study and administered online through Qualtrics. After playing LibGO, participants were asked to voluntarily complete the survey; if they agreed, they were redirected to the survey’s website. Before answering any survey questions, participants received an informed consent statement. All aspects of the research, including the survey instrument, were approved through the university’s Institutional Review Board (protocol number 18-1293).

The first part of the survey (see appendix A) consisted of ten questions, each with a ten-point Likert-scale response. The first five questions were designed to measure a Preference construct, and the next five questions measured a Likelihood construct. The Preference construct referred to the participant’s preference for a library orientation: did they prefer LibGO’s online interactive storytelling format, or another format such as in-person talks? The Likelihood construct referred to the participant’s self-perceived likelihood of more readily engaging with the library in the future (both in person and online) after playing LibGO. The second part of the survey gathered the participant’s self-reported affiliation (see table 1 for the list of possible group affiliations) and offered an open-ended response area for optional qualitative feedback.
Data Collection

The study’s data was collected in two stages. In stage one, LibGO was unveiled to library visitors during a special campus-wide week of student programming events. On the library’s designated event day, the researchers held a drop-in event at one of the library’s computer labs (see figure 4 for an example of event advertisement). Library visitors were offered a prize bag and snacks if they agreed to play LibGO and complete the survey. During the three-hour drop-in session, 58 individual responses were collected: the vast majority came from undergraduate students (51 responses), with additional responses from graduate students (n = 4), university staff employees (n = 2), and one community member. Community members were defined as anyone not currently directly affiliated with the university; this group may have included prospective students or alumni.

Stage two began the day after the library drop-in event and simply involved the placement of a link to LibGO on the home page of the library’s website. Any visitor to the library’s website could click on the advertisement to be taken to LibGO. This link remained active on the library website for ten weeks, at which point the final data was gathered. A total of 167 responses were gathered across both stages, and participants were distributed as previously shown in table 1.

Figure 4. Example of Student LibGO Event Advertisement

RESULTS

Quantitative Findings

Statistical analysis of each of the ten quantitative questions used a one-way ANOVA in SPSS. A post hoc test (Hochberg’s GT2) was run in each instance to account for the different sample sizes. For all statistical analysis, only the data from undergraduates, graduate students, and university employees (a group combining both faculty and staff results) were utilized.
A listing of mean comparisons by group, for each of the ten survey questions, may be found in table 2. The one-way ANOVAs yielded statistically significant results for three of the ten individual questions in the first part of the survey: questions 2, 3, and 6 (see table 3).

Table 2. Descriptive Statistics for Survey Results (10-point scale, with 10 as most likely)

1. In considering the different ways to learn about Walker Library, do you find this library orientation game to be more or less preferable as compared to other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?
   Undergraduate students 7.02; graduate students 6.39; university employees 6.02
2. In your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources?
   Undergraduate students 8.13; graduate students 6.94; university employees 7.12
3. If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?
   Undergraduate students 7.38; graduate students 5.94; university employees 5.98
4. Please indicate your level of agreement with the following statement: “As compared to playing the game, I would have preferred to learn about the library’s resources and services by my own exploration of the library website.”
   Undergraduate students 6.11; graduate students 6.50; university employees 5.88
5. Please indicate your level of agreement with the following statement: “As compared to playing the game, I would have preferred to learn about the library’s resources and services through an in-person orientation tour.”
   Undergraduate students 6.11; graduate students 5.08; university employees 5.76
6. After playing this orientation game, are you more or less likely to visit Walker Library in person?
   Undergraduate students 8.27; graduate students 6.94; university employees 6.90
7. After playing this library orientation game, are you more or less likely to use the Walker Library website to find out about the library (such as hours of operation, where to go to get different materials/services, etc.)?
   Undergraduate students 7.82; graduate students 6.97; university employees 7.20
8. After playing this library orientation game, are you more or less likely to seek help from a librarian at Walker Library?
   Undergraduate students 6.95; graduate students 6.58; university employees 6.63
9. After playing this library orientation game, are you more or less likely to use the library’s online resources (such as databases, journals, e-books)?
   Undergraduate students 7.67; graduate students 7.15; university employees 6.90
10. After playing this library orientation game, are you more or less likely to attend a library workshop, training, or event?
   Undergraduate students 6.96; graduate students 6.73; university employees 6.24

Table 3. Overall Statistically Significant Group Differences

            df   F      p     ω²
Question 2   2   3.714  .027  .03
Question 3   2   4.508  .012  .04
Question 6   2   7.178  .001  .07

Question 2 asked, “In your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources?” The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 3.714, p = .027, ω² = .03). The post hoc comparison using Hochberg’s GT2 test revealed that undergraduates rated LibGO as statistically significantly more useful (M = 8.13, SD = 1.94, p = .031) than graduate students did (M = 6.94, SD = 2.72). There was no statistically significant difference between undergraduates and university employees (p = .145).
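As a consistency check on table 3, each reported effect size can be recomputed from its F statistic. For a one-way ANOVA with between-groups degrees of freedom $df_b$ and total analyzed sample size $N$ (here $N = 158$, since each test is reported with error degrees of freedom of 155 across three groups), omega squared can be estimated as:

```latex
\hat{\omega}^2 = \frac{df_b\,(F - 1)}{df_b\,(F - 1) + N}
```

For question 2, for example, $\hat{\omega}^2 = 2(3.714 - 1)/[2(3.714 - 1) + 158] \approx .03$, and the same formula reproduces the .04 and .07 values reported for questions 3 and 6.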
According to criteria suggested by Roger Kirk, the effect size of .03 indicates a small effect in perceived usefulness of LibGO as an introduction among undergraduates.35

Question 3 asked, “If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?” The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 4.508, p = .012, ω² = .04). The post hoc comparison using Hochberg’s GT2 test found that undergraduates were statistically significantly more likely to prefer LibGO over other orientation options (M = 7.38, SD = 2.49, p = .021) than graduate students (M = 5.94, SD = 3.06). There was no statistically significant difference between undergraduates and university employees (p = .053). The effect size of .04 indicates a small effect regarding undergraduate preference for LibGO versus other orientation options.

Question 6 asked, “After playing this library orientation game, are you more or less likely to visit Walker Library in person?” The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 7.178, p = .001, ω² = .07). The post hoc comparison using Hochberg’s GT2 test revealed that undergraduates were statistically significantly more likely to visit the library after playing LibGO (M = 8.27, SD = 2.09, p = .003) than graduate students (M = 6.94, SD = 2.20). Additionally, the test found that undergraduates were statistically significantly more likely to visit the library after playing LibGO (p = .007) than university employees (M = 6.90, SD = 2.08). According to criteria suggested by Kirk, the effect size of .07 indicates a medium effect regarding undergraduate potential to visit the library in person after playing LibGO.36

In addition to testing each individual survey question, tests were run to understand possible group differences by construct (Preference and Likelihood). The Preference construct was an aggregate of survey questions 1–5, and the Likelihood construct was an aggregate of survey questions 6–10. For both constructs, the one-way ANOVA found no statistically significant results. In all, the quantitative findings indicated three areas in which the experience of playing LibGO was more helpful for the surveyed undergraduates than for the other surveyed groups (i.e., graduate students and university employees). At this point, the analysis turned to the qualitative data to better understand participant views of LibGO.

Qualitative Findings

Analysis of the qualitative results was limited to the data collected in the survey’s final question. Question 12 was an open-response area and was intentionally prefaced with a broad prompt: “Do you have any final thoughts for the library (suggestions, additions, modification, comments, criticisms, praise, etc.)?” Of the 167 total survey responses, 67 individuals chose to answer this question. Preliminary analysis showed that the feedback covered a spectrum of topics, ranging from remarks on the LibGO experience itself to broader concerns regarding other library services.

Open coding strategies were utilized to interpret the content of participant responses. Under this methodology, the responses were evaluated for general themes and then coded and grouped under a constant comparative approach.37 NVivo 12 software was used to code all 67 participant responses. Initial coding yielded eight open codes, but these were later consolidated into six final codes (see table 4). One code (LibGO Improvement Tip) was rather nuanced and yielded five axial codes (see table 5).
Axial codes denoted secondary concerns that fell under a larger category of interest. Although some participants gave longer feedback addressing multiple concerns, care was taken to assign each distinct concern to a specific code. It is therefore important to note that, because some comments addressed multiple concerns, the total number of concerns (n = 76) is greater than the total number of individuals responding to the prompt (n = 67).

Table 4. Distribution of Qualitative Codes by User Group

Code                        Undergraduate  Graduate  Faculty  Staff  Community  Total
Positive feedback                 7            7        1       4        2        21
Negative feedback                 1            2        0       3        0         6
In-person tour preference         2            3        0       1        0         6
LibGO improvement tip             5           11        1       3        3        23
Library services feedback         2            4        3       0        0         9
Library building feedback         1            7        1       2        0        11
Total                            18           34        6      13        5        76

Discussion of Qualitative Themes

Positive Feedback (21 separate concerns). Affirmative comments regarding LibGO came primarily from undergraduate and graduate students, with a small number of comments from the other groups. Although all groups stated that the game was helpful, one undergraduate wrote, “I wish I would’ve received this orientation at the very beginning of the year!” A graduate student declared, “This was a creative way to engage students, and I think it should be included on the website for fun.” Both commenting community members noted the utility of LibGO in providing an orientation without having to physically come to the library; for example, “Interactive without having to actually attend the library in person which I liked.” Additionally, a community member pointed out the instructional capability of LibGO, writing, “I think I learned more from the game than walking around in the library.”

Negative Feedback (6 separate concerns). Unfavorable comments regarding LibGO primarily challenged the orientation’s characterization as a “game,” given its perceived lack of fun.
One graduate student wrote a comment representative of this concern, stating, “The game didn’t really seem like a game at all.” A particularly searing comment came from a university staff member who wrote, “Calling this collection of web pages an ‘interactive game’ is a stretch, which is a generous way of stating it.”

In-person Tour Preference (6 separate concerns). A small number of concerns indicated a preference for in-person orientations over online ones. One undergraduate cited the ability to ask questions during an in-person tour as an advantage of that delivery medium. A graduate student mentioned a desire for kinesthetic learning over an online approach, writing, “I prefer hands-on exploration of the library.”

LibGO Improvement Tip (23 separate concerns). Suggested improvements to LibGO were the largest area of qualitative feedback and produced five axial themes (subthemes); see table 5 for a breakdown of the five axial themes by group.

1. Design issues were the largest cited area of improvement, and the most commonly mentioned design problem was the inability of the user to go back to previously seen content. Although this functionality did in fact exist, it was apparently not intuitive to users; design modifications in future iterations are therefore critical. Other users made suggestions regarding the color scheme and the ability to magnify image sizes.

2. User experience was another area of feedback and primarily included suggestions on how to make LibGO a more fun experience. One graduate student offered a role-playing game alternative.
Another graduate student expressed interest in a game with side missions, in addition to the overall goals, where tokens could be earned for completed missions; the student justified these changes by stating, “I feel that incorporating these types of idea will make the game more enjoyable.” In suggesting similar improvements, one undergraduate stated that LibGO “felt more like a quiz than a game.”

3. Technology issues primarily addressed two related problems: images not loading and broken links. Images not loading could depend on many factors, including the user’s browser settings, internet traffic delaying load time, or broken image links, among others. Broken links could be the root issue, since the images used in LibGO were taken from other areas of the library website. This method of gathering content exposed a design vulnerability: relying on existing image locations (controlled by non-LibGO developers) rather than on images hosted exclusively for LibGO.

4. Content issues were raised exclusively by graduate students. One student felt that LibGO emphasized physical spaces in the library and did not give a deep enough treatment to library services. Another graduate student asked for “an interactive map to click on so that we physically see the areas” of the library, thus making the interaction more user-friendly with a visual.

5. Didn’t understand purpose is a subtheme based on two comments made by university staff members. One wrote that “An online tour would have been better and just as informative,” although LibGO was designed to be not only an online tour of the library but also an orientation to the library’s services. The other staff member wrote, “I read the rules but it was still unclear what the objective was.” In all, it is clear that LibGO’s purpose was confusing for some.
Table 5. LibGO Improvement Tip Axial Codes by User Group

Axial code                  Undergraduate  Graduate  Faculty  Staff  Community  Total
Design                            4            3        0       0        1         8
User experience                   1            2        1       0        1         5
Tech issue                        0            1        0       1        0         2
Content                           0            5        0       0        1         6
Didn’t understand purpose         0            0        0       2        0         2
Total                             5           11        1       3        3        23

Library Services Feedback (9 separate concerns). Several participants took the opportunity to provide feedback on general library services rather than on LibGO itself. Undergraduates gave general positive feedback about the value of the library, while many graduate students gave recommendations regarding specific electronic resource improvements. Additionally, one graduate student wrote, “I think it is critical to meet with new graduate students before they start their program,” something the library used to do but had not pursued in recent years. Although these comments did not directly pertain to LibGO, the authors accepted all of them as valuable feedback to the library.

Library Building Feedback (11 separate concerns). This was another theme in which graduate students dominated the comments. Feedback ranged from requests for microwave access and additional study tables to better temperature control in the building. Several participants asked for greater enforcement of quiet zones. As with the library services feedback, the authors took these comments as helpful to the overall library rather than to LibGO.

DISCUSSION

The results of this study indicated that some groups of library visitors received the gamified library orientation experience better than other groups. Undergraduate students indicated the greatest appreciation for a library orientation via LibGO.
Specifically, they demonstrated a statistically significant difference over the other groups in supporting LibGO’s usefulness as an orientation tool, a preference for LibGO over other orientation formats, and a likelihood of future use of the physical library after playing LibGO. These encouraging results provide evidence for the efficacy of alternative means of library orientation.

The qualitative results provided additional helpful insight regarding user impressions from each of the five surveyed groups. This feedback demonstrated that a variety of groups benefited from the experience of playing LibGO, including some community members who appreciated LibGO as a means of becoming acclimated to the library without having to enter the building. A virtual orientation format was not ideal for a few players, who indicated a preference for a face-to-face orientation due to the ability to ask questions.

Many people identified areas of improvement for LibGO. Graduate students in particular offered a disproportionate number of suggestions as compared to the other groups. While they provided a great deal of helpful feedback, it is possible that graduate students were so distracted by the perceived problems that they could not fully take in the experience or gain value from LibGO’s orientation purpose. It is also very likely that LibGO simply was not very fun for these players: several noted that it did not feel like a game but rather a collection of content. The review of literature indicated that this amusement issue is a common pitfall of educational games. Although the authors tried to design an enjoyable orientation experience, more work may be needed to satisfy user expectations.

The mixed-methods design of this study was instrumental in providing a richer understanding of user perceptions.
While the statistical analysis of participant survey responses was very helpful in identifying clear trends between groups, the qualitative analysis helped the authors draw valuable conclusions. Specifically, the open-response data demonstrated that additional groups, such as graduate students and community members, appreciated the experience of playing LibGO; this information was not readily apparent through the statistical analysis. Additionally, the qualitative analysis demonstrated that many groups had concerns regarding areas of improvement that may have impaired their user experience. These findings could help guide future directions of the research. In all, the authors concluded this phase of the research satisfied that LibGO showed great promise for library orientation delivery but could benefit from continued development and future user assessment. Although undergraduate students seemed most receptive overall to a virtual orientation experience, other groups appeared to have benefited from the resource.

STUDY LIMITATIONS

A primary limitation of this study was its small sample size. Although the entire university campus was targeted for participation in the study, the number of respondents was far too small to generalize the results. Despite this limitation, however, the study’s population reflected many different groups of library patrons on campus. The findings are therefore valuable as a means of stimulating future discussion regarding the value of alternative library orientation methods utilizing gamification.

Another limitation is that the authors did not pre-assess the targeted groups for their prior knowledge of Walker Library services and building layout, nor for their interest in learning about these topics. It is possible that various groups did not see the value in learning about the library for a variety of reasons.
Faculty members, in particular, may have considered their prior knowledge adequate for navigating the electronic holdings or building layout without recognizing the value of the many other services offered physically and electronically by the library. All groups may have experienced a level of “library anxiety” that prevented them from being motivated to learn more about the library.38 It is difficult to understand the range of covariate factors without a pre-assessment.

Finally, there was qualitative evidence supporting the limitation that LibGO did not properly convey its stated purpose of orientation, as opposed to instruction in research skills. Without understanding LibGO’s focus on library orientation, users could have been confused or disappointed by the experience. Although care was taken to make this purpose explicit, some users indicated their confusion in the qualitative data. This observed problem points to a design flaw that undoubtedly had some bearing on the study’s results.

CONCLUSION & FUTURE RESEARCH

Convinced of the importance of the library orientation, the authors sought to move this traditional in-person experience to a virtual one. The quantitative results indicated that the gamified orientation experience was useful to undergraduate students in its intended purpose of acclimating users to the library, as well as in encouraging their future use of the physical library. At a time in which physical traffic to the library has shown a marked decline, new outreach strategies should be considered.39 The results were also helpful in showing that this particular iteration of the gamified orientation was preferred over other delivery methods by undergraduate students, as compared to other groups, to a statistically significant level.
This is an important finding, as it demonstrates that a diversified outreach strategy is necessary: different groups of library patrons want their orientation information in different formats. The next logical question, however, is: Why did the other groups examined through the statistical data analysis (graduate students and university employees) not appreciate the gamified orientation to the same level as undergraduates? The answers to this question are complicated and may be explained in part by the qualitative analysis. Based upon those findings, it is possible that the game did not appeal to these groups on the basis of fun or enjoyment; this concern was specifically mentioned by graduate students. Faculty and staff members provided less qualitative feedback; it is therefore difficult to speculate as to their exact reasons for disengagement with LibGO.

With this concern in mind, the authors would like to concentrate their next iteration of research on the specific library orientation needs of graduate students and faculty. Both groups present different, but critical, needs for outreach. Graduate students were the largest group of survey respondents, presumably indicating a high level of interest in learning more about the library. Many graduate programs at MTSU are delivered partially or entirely online; as a result, these students may be less likely to come to campus. Due to graduate students’ relatively infrequent visits to campus, a virtual library orientation could be even more meaningful in meeting their need for library services information. Faculty are another important group to target because, if they lack a full understanding of the library’s offerings, they are unlikely to design assignments that fully utilize the library’s services. Although it is possible that faculty prefer an in-person orientation, many new faculty have indicated limited availability for such events. A virtual orientation seems conducive to busy schedules.
However, it is possible that the issue is simply a matter of marketing: faculty may not know that a virtual option is available, nor do they necessarily understand all that the library has to offer. In all, future research should begin with a survey to understand what both groups already know about the library, as well as the library services they desire. Another necessary step in future research would be the expansion of the development team to include computer programmers. Although the authors feel that LibGO holds great promise as a virtual orientation tool, more needs to be done to enhance the user’s enjoyment of the experience. Twine is user-friendly software that other librarians could pick up without being computer programmers; however, programmers (professional or student) could bring design expertise to the project. Future iterations of this project should incorporate the skills of multiple groups, including expertise in libraries, user research, visual design, interaction design, programming, and marketing, as well as testers from each type of intended audience. Collectively, this group would have the greatest impact on improving the user experience and ultimately the usefulness of a gamified orientation experience. This experience with gamification, and specifically interactive storytelling, was valuable for Walker Library. These results should encourage other libraries seeking an alternate delivery method for orientations. The authors hope to build upon the lessons learned from this mixed methods research study of LibGO to find the correct outreach medium for their range of library users. 

ACKNOWLEDGMENTS 

Special thanks to our beta playtesters and student assistants who worked the LibGO Event, which was funded, in part, by MT Engage and Walker Library at Middle Tennessee State University. 
INFORMATION TECHNOLOGY AND LIBRARIES SEPTEMBER 2020 APPLYING GAMIFICATION TO THE LIBRARY ORIENTATION | REED AND MILLER 

APPENDIX A: SURVEY INSTRUMENT 

[Survey instrument images not reproduced.] 

ENDNOTES 

1 Sandra Calemme McCarthy, “At Issue: Exploring Library Usage by Online Learners with Student Success,” Community College Enterprise 23, no. 2 (January 2017): 27–31; Angie Thorpe et al., “The Impact of the Academic Library on Student Success: Connecting the Dots,” Portal: Libraries and the Academy 16, no. 2 (2016): 373–92, https://doi.org/10.1353/pla.2016.0027. 

2 Steven Ovadia, “How Does Tenure Status Impact Library Usage: A Study of LaGuardia Community College,” Journal of Academic Librarianship 35, no. 4 (January 2009): 332–40, https://doi.org/10.1016/j.acalib.2009.04.022. 

3 Chris Leeder and Steven Lonn, “Faculty Usage of Library Tools in a Learning Management System,” College & Research Libraries 75, no. 5 (September 2014): 641–63, https://doi.org/10.5860/crl.75.5.641. 

4 Kyle Felker and Eric Phetteplace, “Gamification in Libraries: The State of the Art,” Reference and User Services Quarterly 54, no. 
2 (2014): 19–23, https://doi.org/10.5860/rusq.54n2.19; Nancy O’Hanlon, Karen Diaz, and Fred Roecker, “A Game-Based Multimedia Approach to Library Orientation” (paper, 35th National LOEX Library Instruction Conference, San Diego, May 2007), https://commons.emich.edu/loexconf2007/19/; Leila June Rod-Welch, “Let’s Get Oriented: Getting Intimate with the Library, Small Group Sessions for Library Orientation” (paper, Association of College and Research Libraries Conference, Baltimore, March 2017), http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/LetsGetOriented.pdf. 

5 Kelly Czarnecki, “Chapter 4: Digital Storytelling in Different Library Settings,” Library Technology Reports, no. 7 (2009): 20–30; Rebecca J. Morris, “Creating, Viewing, and Assessing: Fluid Roles of the Student Self in Digital Storytelling,” School Libraries Worldwide, no. 2 (2013): 54–68. 

6 Sandra Marcus and Sheila Beck, “A Library Adventure: Comparing a Treasure Hunt with a Traditional Freshman Orientation Tour,” College & Research Libraries 64, no. 1 (January 2003): 23–44, https://doi.org/10.5860/crl.64.1.23. 

7 Lori Oling and Michelle Mach, “Tour Trends in Academic ARL Libraries,” College & Research Libraries 63, no. 1 (January 2002): 13–23, https://doi.org/10.5860/crl.63.1.13. 

8 Kylie Bailin, Benjamin Jahre, and Sarah Morriss, “Planning Academic Library Orientations: Case Studies from Around the World” (Oxford, UK: Chandos Publishing, 2018): xvi. 

9 Bailin, Jahre, and Morriss, “Planning Academic Library Orientations.” 

10 Marcus and Beck, “A Library Adventure”; A. Carolyn Miller, “The Round Robin Library Tour,” Journal of Academic Librarianship 6, no. 4 (1980): 215–18; Michael Simmons, “Evaluation of Library Tours,” EDRS, ED 331513 (1990): 1–24. 
11 Marcus and Beck, “A Library Adventure”; Oling and Mach, “Tour Trends”; Rod-Welch, “Let’s Get Oriented.” 

12 Pixey Anne Mosley, “Assessing the Comfort Level Impact and Perceptual Value of Library Tours,” Research Strategies 15, no. 4 (1997): 261–70, https://doi.org/10.1016/S0734-3310(97)90013-6. 

13 Mosley, “Assessing the Comfort Level Impact and Perceptual Value of Library Tours.” 

14 Marcus and Beck, “A Library Adventure,” 27. 

15 Kenneth J. Burhanna, Tammy J. Eschedor Voelker, and Jule A. Gedeon, “Virtually the Same: Comparing the Effectiveness of Online Versus In-Person Library Tours,” Public Services Quarterly 4, no. 4 (2008): 317–38, https://doi.org/10.1080/15228950802461616. 

16 Burhanna, Voelker, and Gedeon, “Virtually the Same,” 326. 

17 Burhanna, Voelker, and Gedeon, “Virtually the Same,” 329. 

18 Felker and Phetteplace, “Gamification in Libraries.” 

19 Felker and Phetteplace, “Gamification in Libraries,” 20. 

20 Felker and Phetteplace, “Gamification in Libraries.” 

21 Felker and Phetteplace, “Gamification in Libraries”; O’Hanlon et al., “A Game-Based Multimedia Approach.” 

22 Mary J. Broussard and Jessica Urick Oberlin, “Using Online Games to Fight Plagiarism: A Spoonful of Sugar Helps the Medicine Go Down,” Indiana Libraries 30, no. 1 (January 2011): 28–39. 

23 Melissa Mallon, “Gaming and Gamification,” Public Services Quarterly 9, no. 
3 (2013): 210–21, https://doi.org/10.1080/15228959.2013.815502. 

24 J. Long, “Chapter 21: Gaming Library Instruction: Using Interactive Play to Promote Research as a Process,” Distributed Learning (January 1, 2017), 385–401, https://doi.org/10.1016/B978-0-08-100598-9.00021-0. 

25 Rod-Welch, “Let’s Get Oriented.” 

26 O’Hanlon et al., “A Game-Based Multimedia Approach.” 

27 Mallon, “Gaming and Gamification.” 

28 Anna-Lise Smith and Lesli Baker, “Getting a Clue: Creating Student Detectives and Dragon Slayers in Your Library,” Reference Services Review 39, no. 4 (November 2011): 628–42, https://doi.org/10.1108/00907321111186659. 

29 Monica Fusich et al., “HML-IQ: Fresno State’s Online Library Orientation Game,” College & Research Libraries News 72, no. 11 (December 2011): 626–30, https://doi.org/10.5860/crln.72.11.8667. 

30 Broussard and Oberlin, “Using Online Games”; Fusich et al., “HML-IQ”; O’Hanlon et al., “A Game-Based Multimedia Approach.” 

31 Felker and Phetteplace, “Gamification in Libraries.” 

32 Felker and Phetteplace, “Gamification in Libraries”; Fusich et al., “HML-IQ.” 

33 “Design Thinking for Libraries: A Toolkit for Patron-Centered Design,” IDEO (2015), http://designthinkingforlibraries.com. 

34 John W. Creswell and Vicki L. Plano Clark, Designing and Conducting Mixed Methods Research (Thousand Oaks, CA: Sage Publications, 2007). 

35 Roger Kirk, “Practical Significance: A Concept Whose Time Has Come,” Educational and Psychological Measurement, no. 5 (1996). 
36 Kirk, “Practical Significance.” 

37 Sandra Mathison, “Encyclopedia of Evaluation,” SAGE, 2005, https://doi.org/10.4135/9781412950558. 

38 Rod-Welch, “Let’s Get Oriented.” 

39 Felker and Phetteplace, “Gamification in Libraries.” 
ARTICLES 

Likes, Comments, Views: A Content Analysis of Academic Library Instagram Posts 

Jylisa Doney, Olivia Wikle, and Jessica Martinez 

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 
https://doi.org/10.6017/ital.v39i3.12211 

Jylisa Doney (jylisadoney@uidaho.edu) is Social Sciences Librarian, University of Idaho. Olivia Wikle (omwikle@uidaho.edu) is Digital Initiatives Librarian, University of Idaho. Jessica Martinez (jessicamartinez@uidaho.edu) is Science Librarian, University of Idaho. © 2020. 

ABSTRACT 

This article presents a content analysis of academic library Instagram accounts at eleven land-grant universities. Previous research has examined personal, corporate, and university use of Instagram, but fewer studies have used this methodology to examine how academic libraries share content on this platform and the engagement generated by different categories of posts. Findings indicate that showcasing posts (highlighting library or campus resources) accounted for more than 50 percent of posts shared, while a much smaller percentage of posts reflected humanizing content (emphasizing warmth or humor) or crowdsourcing content (encouraging user feedback). Crowdsourcing posts generated the most likes on average, followed closely by orienting posts (situating the library within the campus community), while a larger proportion of crowdsourcing posts, compared to other post categories, included comments. The results of this study indicate that libraries should seek to create Instagram posts that include various types of content while also ensuring that the content shared reflects their unique campus contexts. 
By sharing a framework for analyzing library Instagram content, this article will provide libraries with the tools they need to more effectively identify the types of content their users respond to and enjoy, as well as make their social media marketing on Instagram more impactful. 

INTRODUCTION 

Library use of social media has steadily increased over time; in 2013, 86 percent of libraries reported using social media to connect with their patron communities.1 The ways in which libraries use social media tend to vary, but common themes include marketing services, content, and spaces to patrons, as well as creating a sense of community.2 Even with this wealth of research, fewer studies have examined how libraries use Instagram, and those that do often utilize a formal or informal case study methodology.3 This research seeks to fill that gap by examining the types of content shared most frequently by a subset of academic library Instagram accounts. Although this research focused on academic libraries, its methods and findings could be leveraged by educational institutions and non-profits in their own investigations of Instagram usage and impact. 

LITERATURE REVIEW 

Since its inception in 2010, Instagram’s number of account holders has been steadily increasing. 
By 2019, more than one billion user accounts were active each month, making it the third most popular social media network in the world, and the Pew Research Center has reported that Instagram is the second most used social media platform among people ages 18-29 in the United States, after Facebook.4 Instagram has estimated that 90 percent of user accounts follow at least one business account.5 Previous research has also shown that individuals who use Instagram to follow specific brands have the highest rates of engagement with, and commitment to, those brands when compared to users of other social media platforms.6 Though businesses are fundamentally different in the products or services they are trying to market, academic libraries share a desire to provide information to, and engage with, their followers. As such, in the past decade, libraries have begun to adopt Instagram as a way to market their libraries and interact with patrons.7 However, methods and parameters for libraries’ use of Instagram vary across types of libraries and even within specific library types.8 Research has demonstrated that academic libraries’ use of social media, including Instagram, is often for the purpose of increasing the sense of community among librarians and patrons by marketing the library’s services and encouraging student feedback and interaction.9 Similarly, Harrison et al. 
discovered that academic library social media posts reflected three main themes: “community connections, inviting environment, and provision of content.”10 Chatten and Roughley have also reported that libraries’ use of social media ranges from providing customer service to promoting the library and building a community of users.11 Indeed, when comparing modern social networking systems, such as Instagram, to older platforms, such as Myspace, Fernandez posited that today’s popular social media sites encourage networking and are especially suited to creating community.12 Ideally, community engagement in the virtual social media environment would encourage more patrons to enter the library and thus engage in more face-to-face encounters.13 Libraries’ methods for measuring the success of their social media engagement are as varied as the ways in which they use social media. Assessment of libraries’ social media efficacy is tricky, and highly variable from institution to institution. Hastings has cautioned that librarians should recognize that patrons both actively and passively interact with social media content.14 For this reason, while a large number of comments or likes may be identified as positive markers for active engagement, passive forms of engagement, such as the number of times a post appeared in users’ Instagram feeds, may also be relevant.15 Therefore, when librarians measure the success of an Instagram post by examining only the number of likes and comments, they should be aware that they are measuring a very specific type of engagement: one which, on its own, may not determine a post’s full reach or effectiveness. Other ways to measure engagement include monitoring how the number of people subscribed to an account changes over time, evaluating reach and impressions,16 or analyzing the content of comments (a type of qualitative measure that may indicate the type of community developing around the library’s social media). 
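The distinction drawn above between active engagement (likes, comments) and passive engagement (how often a post is seen) can be made concrete with a small calculation. The sketch below is illustrative only: the post data, follower count, and field names are hypothetical, and reach and impression figures are normally visible only to the account owner.

```python
# Hypothetical sketch of active vs. passive engagement metrics for
# Instagram posts. All values and field names are invented for
# illustration; reach/impressions come from owner-only analytics.

posts = [
    {"likes": 54, "comments": 3, "impressions": 410, "reach": 350},
    {"likes": 28, "comments": 0, "impressions": 530, "reach": 470},
]

followers = 1200  # current follower count for the account

for i, post in enumerate(posts, start=1):
    # Active engagement: deliberate responses relative to audience size.
    active = (post["likes"] + post["comments"]) / followers
    # Passive engagement: unique viewers relative to audience size.
    passive = post["reach"] / followers
    print(f"post {i}: active {active:.1%}, passive {passive:.1%}")
```

A post with few likes may still have reached a large share of followers, which is why likes and comments alone understate a post's effectiveness.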
Despite, or perhaps because of, the general excitement surrounding the possibilities that libraries’ engagement with social media can produce, very little has been written about how different types of libraries (such as academic libraries, law libraries, public libraries, etc.), or libraries in general, use these platforms.17 Additionally, many librarians may lack expertise in marketing, including those who are managing social media accounts.18 As social media culture continues to evolve, librarians should move toward a more targeted and pragmatic approach to their Instagram practices. This refinement in social media practices may enable libraries to develop more structure, so that they may create and share the type of content that would achieve their desired result at a given time. However, in order to develop this kind of measured approach, it is necessary for researchers to first analyze libraries’ current Instagram practices to determine how posts are being used and the outcomes they generate. One effective method of analyzing Instagram content centers on coding and classifying images. While many such schemas have been developed for analyzing images posted by Instagram users and businesses, transferring these schemas to academic contexts has been difficult.19 To address this gap, Stuart et al. adapted a schema that had been used to examine how “news media [and] non-profits,” as well as businesses, used Instagram.20 This new schema allowed Stuart et al. 
to classify Instagram posts produced by academic institutions in the UK and measure the effect of these universities’ attempts to engage with students via Instagram.21 Stuart et al.’s schema, which classified Instagram images into six categories (orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing), was the basis for the present study.22 

METHODS 

Research Questions 

The impetus for this study was to learn more about how academic libraries use Instagram to connect with their campus communities and promote their services and events. The authors of the present study adapted the research questions posed by Stuart et al. to reflect academic library contexts:23 

• RQ1: Which type of post category is used most frequently by libraries on Instagram? 
• RQ2: Is the number of likes or the existence of comments related to the post category? 

Identifying a Sample Population 

This study investigated a small subset of academic institutions: the University of Idaho’s sixteen peer institutions. These peers have similar “student profiles, enrollment characteristics, research expenditures, [or] academic disciplines and degrees”; each is designated as a land-grant institution; and the University of Idaho considers three to be “aspirational peers.”24 After selecting this population, the authors investigated the library websites of each of the sixteen peer institutions to determine whether or not they had a library-specific Instagram account. When a link was not available on the library websites, the authors conducted a search within Instagram as well as a general Google search in an attempt to identify these Instagram accounts. Of the University of Idaho’s sixteen peer institutions, eleven had active, library-specific Instagram accounts. 

Data Collection 

The authors undertook manual data collection between November and December 2018 for these eleven library Instagram accounts. 
Initial information about each Instagram account was gathered prior to the study on October 23, 2018: the date of the first post, the total number of posts shared by the account, the total number of followers, and the total number of accounts followed. For each account, the authors identified posts shared from January 1, 2018, to June 30, 2018. The “print to PDF” function available in the Chrome browser was used to preserve a record of the content, in case the accounts were later discontinued while research was underway. If a post included more than one image, only the first image was captured in the PDF and analyzed. To organize the 377 Instagram posts shared within this timeframe, the authors assigned each institution a unique, five-digit identifier; file names included this identifier as well as the date of the post (e.g., 00004_IGpost_20180423). This file naming convention ensured that posts were separated based on institution and that future studies could use the same file naming convention, even if the sample size increased significantly. The authors added the file names of all 377 Instagram posts to a shared Google Sheet, and for each post they reported the kind of post (photo or video), the number of likes, and whether comments existed. 

Research Data Analysis 

Content Analysis 

This project adapted the coding schema Stuart et al. employed to investigate the ways in which UK universities used Instagram.25 Expanding on research by McNely, Stuart et al. employed six Instagram post categories: orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing.26 For the purposes of the present study, the authors used the same category names when coding library Instagram posts. However, they updated and adapted the descriptions of each category over the course of two rounds of coding to better reflect academic library contexts (see table 1). 
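Agreement among multiple coders applying a categorical schema like this one is commonly quantified with Fleiss' kappa, which compares observed agreement among raters with the agreement expected by chance. The sketch below is a minimal, from-scratch implementation; the rating matrix is hypothetical (not the study's data), with one row per coded post, one column per category, and each cell counting how many of three coders chose that category.

```python
# Minimal Fleiss' kappa sketch. Rows = coded items (posts); columns =
# categories (e.g., the six post categories); each cell counts how many
# raters assigned that category. The ratings below are hypothetical.

def fleiss_kappa(matrix):
    n_items = len(matrix)
    n_raters = sum(matrix[0])  # each row must sum to the rater count
    total = n_items * n_raters
    # Proportion of all assignments falling in each category.
    p_cat = [sum(row[j] for row in matrix) / total
             for j in range(len(matrix[0]))]
    # Per-item agreement: fraction of rater pairs that agree.
    p_item = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in matrix]
    p_bar = sum(p_item) / n_items      # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)

ratings = [
    [3, 0, 0, 0, 0, 0],  # all three coders chose category 1
    [0, 3, 0, 0, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [2, 1, 0, 0, 0, 0],  # one coder disagreed on this post
]
print(round(fleiss_kappa(ratings), 3))
```

A value of 1.0 indicates perfect agreement and 0 indicates agreement no better than chance; each row must sum to the number of raters.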
Within this coding schema, the authors elected to apply only a single category name (i.e., a code) to each library Instagram post. 

Interrater Reliability 

During the first round of coding, the authors selected two or three institutions every month, independently coded the posts based on the initial adapted schema, met to discuss discrepancies, and identified the final code based on consensus.27 However, during these discussions, it became evident that there was substantial disagreement concerning how specific categories were interpreted. To examine the impact of this disagreement, the authors calculated Fleiss’ kappa, which can be used to assess interrater reliability when two or more coders categorically evaluate data.28 Although this project’s Fleiss’ kappa (0.683554901) was relatively close to a score of 1.0, demonstrating moderate agreement among the three coders, the authors recognized that additional fine-tuning of the adapted coding schema would allow for a more accurate representation of the types of content shared by academic libraries. After updating the schema (table 1), a small sample of collected Instagram posts (20 percent, or 76 posts) was randomly selected for independent recoding by each of the authors. Again, after coding this random sample individually, the authors met to seek consensus. Anecdotal feedback from the coders, as well as an increase in the project’s Fleiss’ kappa (0.795494117), demonstrated that the updated coding schema was more robust and representative. Based on this evidence, the authors randomly distributed the remaining 301 posts amongst themselves; each post was coded by one author. 

Table 1. Coding Schema for Library Instagram Posts [Adapted from: Emma Stuart, David Stuart, and Mike Thelwall, “An Investigation of the Online Presence of UK Universities on Instagram,” Online Information Review 41, no. 
5 (2017): 588, https://doi.org/10.1108/OIR-02-2016-0057.] In the published table, each category is illustrated with a sample image from the University of Idaho Library’s Instagram account. 

Crowdsourcing: Posts that were created with the intention of generating feedback within the platform. If the content of the post itself fits within a different classification category, but the image is accompanied by text that explicitly asks for viewer feedback, then the post should be classified as crowdsourcing. Includes requests for followers to like, comment on, or tag others in a particular post. 

Humanizing: Posts that aim to emphasize human character or elements of warmth, humor, or amusement. This includes historic/archival photos used to convey these sentiments. This code is only used if both the text and the photo or video can be categorized as humanizing because many library posts contain a “humanizing” element. 

Interacting: Posts with candid photographs or videos at library and library-associated events. Includes events within or outside the library. 

Orienting: Posts that situate the library within its larger community, especially regarding locations, artifacts, or identities. Text often includes geographic information. 

Placemaking: Posts that capture the atmosphere of the library through its physical space and attributes. Includes permanent murals, statues, etc. 

Showcasing: Posts that highlight library or campus resources, services, or future events. Can include current or on-going events if people are not the focus of the image (e.g., exhibit, highlight of collection, etc.). These posts can also present information about library operations, such as hours and fundraising. 
Posts can also entice their audience to do something outside of Instagram, such as visit a specific website. 

RESULTS 

General Data about the Library Instagram Accounts 

As of October 23, 2018 (the date this initial information was gathered), the eleven academic library Instagram accounts had shared a combined 3,124 posts. Most libraries created their Instagram accounts and started posting between 2013 and 2016, but one library shared a post in 2012 and one created their account in April 2018. Since the date of their first post, each account had shared 284 posts on average, while the actual number of posts shared across accounts ranged from 62 to 520. The number of followers and accounts followed across these eleven accounts ranged from 115 to 1,390 and 65 to 2,717, respectively. Between January 1, 2018, and June 30, 2018, these eleven library Instagram accounts shared a total of 377 posts. The number of posts shared by each account during this time period ranged from four to 57, with an average of 34 posts. 

RQ1: Which Type of Post Category Is Used Most Frequently by Libraries on Instagram? 

Of the 377 posts analyzed, 359 included photos and 18 included videos. More than 50 percent of posts shared were coded as showcasing, with humanizing (18 percent) and crowdsourcing (10.1 percent) being the next most common categories (see table 2), although data demonstrated that individual libraries differed in their use of specific post categories (see table 3). When examining frequency based on category of post, the authors identified slight differences between video and photo posts. As with photos, the majority of videos (55.6 percent) were still coded as showcasing; however, the second most common post category for videos was interacting (16.7 percent). 

Table 2. 
Number and Percentage of Posts by Category for Posts with Photos or Videos 

Category        Number of Posts   Percentage of Posts 
Crowdsourcing   38                10.1% 
Humanizing      68                18.0% 
Interacting     16                4.2% 
Orienting       28                7.4% 
Placemaking     33                8.8% 
Showcasing      194               51.5% 
Total           377               100% 

Table 3. Percentage of Posts by Category and Library for Posts with Photos or Videos 

Library   Crowdsourcing   Humanizing   Interacting   Orienting   Placemaking   Showcasing 
Lib 1     7.7%            15.4%        0%            23.1%       30.8%         23.1% 
Lib 2     4.2%            50.0%        0%            4.2%        0%            41.7% 
Lib 3     56.1%           10.5%        1.8%          3.5%        7.0%          21.1% 
Lib 4     0%              4.1%         4.1%          4.1%        2.0%          85.7% 
Lib 5     0%              24.4%        2.2%          20.0%       26.7%         26.7% 
Lib 6     7.5%            18.9%        3.8%          11.3%       11.3%         47.2% 
Lib 7     0%              20.0%        0%            0%          10.0%         70.0% 
Lib 8     0%              21.6%        9.8%          5.9%        0%            62.7% 
Lib 9     0%              25.0%        25.0%         0%          0%            50.0% 
Lib 10    0%              16.1%        6.5%          0%          9.7%          67.7% 
Lib 11    0%              15.0%        5.0%          5.0%        5.0%          70.0% 

RQ2: Is the Number of Likes or the Existence of Comments Related to the Post Category? 

Number of Likes by Category 

The results of the coding process also indicated that the number of likes differed based on the category of post. When examining photo posts, the authors noted that every post received at least five likes, with most posts receiving between 20 and 39 likes (see table 4). On average, crowdsourcing photo posts generated the highest average number of likes across all categories, followed by orienting and placemaking posts (see table 5). However, it is important to recognize that crowdsourcing posts often asked visitors to participate in a post by “liking” it, often with the chance to win a library-sponsored contest, which may partially explain the higher average number of likes. 

Table 4. 
Number of Posts by Category and Range of Likes for Posts with Photos (does not include posts with videos) 

Category        5–19   20–39   40–59   60–79   80–99   100–119   120–140 
Crowdsourcing   0      11      16      6       1       1         1 
Humanizing      16     26      10      9       5       0         1 
Interacting     5      5       3       0       0       0         0 
Orienting       2      7       9       8       0       1         0 
Placemaking     3      10      12      3       2       1         1 
Showcasing      67     83      27      5       1       0         1 
Total           93     142     77      31      9       3         4 

Table 5. Average Number of Likes by Category for Posts with Photos (does not include posts with videos) 

Category        Average Number of Likes   Number of Posts 
Crowdsourcing   53.6                      36 
Humanizing      39.9                      67 
Interacting     27.8                      13 
Orienting       50.0                      27 
Placemaking     46.9                      32 
Showcasing      27.6                      184 

Existence of Comments by Category 

The authors also examined the existence of comments, another metric for engagement with Instagram posts. Data demonstrated that 78.9 percent of crowdsourcing posts included comments, while a much lower percentage of placemaking (30.3 percent), orienting (28.6 percent), and humanizing (26.5 percent) posts generated this type of engagement (see table 6). As with the data on the number of “likes,” many crowdsourcing posts encouraged visitors to comment on a particular post, at times with an incentive connected to this type of engagement. 

Table 6. Presence of Comments by Category for Posts with Photos or Videos 

Category        Posts with Comments   Posts without Comments   Total Posts   Percentage with Comments 
Crowdsourcing   30                    8                        38            78.9% 
Humanizing      18                    50                       68            26.5% 
Interacting     3                     13                       16            18.8% 
Orienting       8                     20                       28            28.6% 
Placemaking     10                    23                       33            30.3% 
Showcasing      40                    154                      194           20.6% 
Total           109                   268                      377           28.9% 

DISCUSSION 

As noted previously, the post category used most frequently by these eleven libraries on Instagram was showcasing (51.5 percent). 
The fact that libraries were more likely to share this type of content—which highlighted library resources, events, or collections—is understandable, as library promotion is one of the foundational reasons libraries spend the time and effort required to maintain social media accounts.29 This finding differs substantially from previous research with UK universities, which classified only 28.8 percent of posts as showcasing.30 When examining other post categories, it also became clear that UK universities shared humanizing posts more frequently (31 percent) than the eleven libraries (18 percent) included in this study.31 Although the results of this study demonstrated that showcasing posts were shared most often, the data also indicates that showcasing posts were neither the category with the most likes on average nor the category that received comments most often. Crowdsourcing posts were the category with the highest average number of likes (53.6) with orienting posts coming in at a close second (50), followed by placemaking (46.9) and humanizing (39.9) posts. Showcasing posts, along with interacting posts, only generated slightly more than half the number of likes on average, when compared to the other categories (27.6 and 27.8, respectively). The category with the largest proportion of comments was crowdsourcing posts, with 78.9 percent of posts in this category generating comments from visitors. However, this result is likely skewed, as one of the library Instagram accounts had exceptionally successful crowdsourcing posts, which often included a giveaway or other incentive for participation. In fact, when this institution was removed from the data set, only six crowdsourcing posts remained, two of which generated comments. 
To better determine whether crowdsourcing posts are always this effective at generating engagement, it would be necessary to code a larger sample of Instagram posts. It is clear that while showcasing posts were the most common among the Instagram accounts analyzed, they also received the lowest number of likes, on average, and generated comments less frequently than all but one post category. While this may seem disheartening, it is important to remember that the showcasing category includes informational posts that convey library hours, services, or closures; this is information that may be effectively relayed to users without necessitating an active response in the form of likes and comments. Therefore, one might use different criteria to determine the success of showcasing posts, perhaps examining Instagram data related to reach (the total number of unique visitors that view a post) and impressions (the total number of times a post is viewed).32 Data on reach and impressions are only available to Instagram account “owners.” In the current study, the authors did not quantify these types of engagement as their goal was to evaluate the content and metrics available to all Instagram users, rather than the data that was only available to the “owners” of these library Instagram accounts. In addition to answering the research questions, coding these Instagram posts prompted several new questions regarding the types of information libraries and other institutions share online. One such question is: With both universities and academic libraries working with students, why did academic libraries share a smaller percentage of interacting posts than UK universities?33 Additional research is needed to answer this question, but anecdotally, this difference may be related to the fact that universities, as a whole, have a larger number of opportunities to promote and share instances of interaction via Instagram than libraries.
For example, general university Instagram accounts often include photos of students and affiliates interacting at large-scale events such as sports games, musical performances, and other student gatherings that take place across campus. Library-specific accounts, on the other hand, have fewer opportunities to post photos that capture individuals “interacting” candidly. Further, the fact that libraries tend to be proponents of privacy rights may inhibit library staff from taking photos of their users and sharing them online without first getting permission. Therefore, differences related to the number of events and the organization type may contribute to whether or not universities and libraries share interacting posts; more research is needed to examine this hypothesis. Another issue that arose during coding was that, if not for their inclusion of a request to comment, many crowdsourcing posts could have been classified under other categories. If an account follower looked only at the photos included in many of the crowdsourcing posts without reading the captions, they may not interpret those posts as crowdsourcing. Therefore, a future research project might examine whether applying secondary categories to crowdsourcing posts, as a means of further classifying images and not just their captions, could generate a more comprehensive picture of what libraries are sharing on their Instagram accounts. The authors also discovered that a majority of the library Instagram posts included in this sample contained humanizing elements. Almost all posts attempted to convey warmth, humor, or assistance, and therefore had the potential to be classified as humanizing.
To successfully adapt Stuart et al.’s coding schema for academic library Instagram accounts, the authors specified that a post had to have both a humanizing caption as well as a humanizing photo to be coded as such.34 As with crowdsourcing posts, adding secondary categories to humanizing posts could better reflect the dual nature of this content and help future coders more accurately interpret the types of content shared by academic libraries.

LIMITATIONS AND FUTURE RESEARCH

The number of library Instagram accounts selected as well as the use of a six-month timeframe were limitations of the current study. In the future, selecting a larger sample size and a different group of academic libraries would serve to advance the discipline’s understanding of the types of content shared by academic libraries and how users interact with these Instagram posts. Additionally, collecting Instagram posts shared during an expanded timeframe could allow researchers to explore whether library Instagram accounts consistently share the same types of content at various points throughout the year. As mentioned in the Discussion section, future research could also include adding secondary categories to posts, which would allow researchers to gather more granular information about the types of content shared and the relationships between post category, comments, and likes. Lastly, to better understand the post categories that generate the greatest engagement, collaborative research between institutions could allow researchers to gather and analyze metrics that are only available to account owners, such as impressions and reach. With this type of collaboration, researchers could also investigate how social media outreach goals influence the types of content shared on library Instagram accounts.
For example, researchers could conduct interviews or surveys with libraries and ask questions such as: what does your library hope to accomplish with its Instagram account, who are you attempting to reach, how do you define a successful post, what metrics do you use to evaluate your Instagram presence, and do your social media outreach goals influence the types of content shared on Instagram? Pursuing these types of questions, in addition to examining the actual content shared, would allow researchers to gain a more complete picture of what a successful social media presence looks like for an academic library.

CONCLUSION

This research provides initial insight into the Instagram presence of a subset of academic libraries at land-grant institutions in the United States. Expanding on the research of Stuart et al., this project used an adapted coding schema to document and analyze the content and efficacy of academic libraries’ Instagram posts.35 The results of this study suggest that social media accounts, including those used by academic libraries, perform better when they reflect the community the library inhabits by highlighting content that is unique to their particular constituents, rather than simply functioning as another platform through which to share information. This study’s findings also demonstrate that academic libraries should strive to create an Instagram presence that encompasses a variety of post categories to ensure that their online information sharing meets various needs.

ENDNOTES

1 Nancy Dowd, “Social Media: Libraries are Posting, but is Anyone Listening?,” Library Journal 138, no. 10 (May 7, 2013), 12, https://www.libraryjournal.com/?detailStory=social-media-libraries-are-posting-but-is-anyone-listening.
2 Marshall Breeding, Next-Gen Library Catalogs (London: Facet Publishing, 2010); Zelda Chatten and Sarah Roughley, “Developing Social Media to Engage and Connect at the University of Liverpool Library,” New Review of Academic Librarianship 22, no. 2/3 (2016), https://doi.org/10.1080/13614533.2016.1152985; Amanda Harrison et al., “Social Media Use in Academic Libraries: A Phenomenological Study,” The Journal of Academic Librarianship 43, no. 3 (2017), https://doi.org/10.1016/j.acalib.2017.02.014; Nicole Tekulve and Katy Kelly, “Worth 1,000 Words: Using Instagram to Engage Library Users,” Brick and Click Libraries Symposium, Maryville, MO (2013), https://ecommons.udayton.edu/roesch_fac/20; Evgenia Vassilakaki and Emmanouel Garoufallou, “The Impact of Twitter on Libraries: A Critical Review of the Literature,” The Electronic Library 33, no. 4 (2015), https://doi.org/10.1108/EL-03-2014-0051.

3 Yeni Budi Rachman, Hana Mutiarani, and Dinda Ayunindia Putri, “Content Analysis of Indonesian Academic Libraries’ Use of Instagram,” Webology 15, no. 2 (2018), http://www.webology.org/2018/v15n2/a170.pdf; Catherine Fonseca, “The Insta-Story: A New Frontier for Marketing and Engagement at the Sonoma State University Library,” Reference & User Services Quarterly 58, no. 4 (2019), https://www.journals.ala.org/index.php/rusq/article/view/7148; Kjersten L. Hild, “Outreach and Engagement through Instagram: Experiences with the Herman B Wells Library Account,” Indiana Libraries 33, no. 2 (2014), https://journals.iupui.edu/index.php/IndianaLibraries/article/view/16633; Julie Lê, “#Fashionlibrarianship: A Case Study on the Use of Instagram in a Specialized Museum Library Collection,” Art Documentation: Bulletin of the Art Libraries Society of North America 38, no. 2 (2019), https://doi.org/10.1086/705737; Danielle Salomon, “Moving on from Facebook: Using Instagram to Connect with Undergraduates and Engage in Teaching and Learning,” College & Research Libraries News 74, no.
8 (2013), https://doi.org/10.5860/crln.74.8.8991.

4 “Our Story,” Instagram, https://business.instagram.com/; Chloe West, “17 Instagram Stats Marketers Need to Know for 2019,” Sprout Blog, April 22, 2019, https://web.archive.org/web/20191219192653/https://sproutsocial.com/insights/instagram-stats/; Pew Research Center, “Social Media Fact Sheet,” last modified June 12, 2019, http://www.pewinternet.org/fact-sheet/social-media/.

5 “Our Story,” Instagram.

6 Joe Phua, Seunga Venus Jin, and Jihoon Jay Kim, “Gratifications of Using Facebook, Twitter, Instagram, or Snapchat to Follow Brands: The Moderating Effect of Social Comparison, Trust, Tie Strength, and Network Homophily on Brand Identification, Brand Engagement, Brand Commitment, and Membership Intention,” Telematics and Informatics 34, no. 1 (2017), https://doi.org/10.1016/j.tele.2016.06.004.

7 Fonseca, “The Insta-Story;” Hild, “Outreach and
Engagement;” Lê, “#Fashionlibrarianship;” Rachman, Mutiarani, and Putri, “Content Analysis;” Salomon, “Moving on from Facebook;” Tekulve and Kelly, “Worth 1,000 Words.”

8 Vassilakaki and Garoufallou, “The Impact of Twitter.”

9 Breeding, Next-Gen Library Catalogs; Hild, “Outreach and Engagement;” Rachman, Mutiarani, and Putri, “Content Analysis;” Vassilakaki and Garoufallou, “The Impact of Twitter.”

10 Harrison, Burress, Velasquez, and Schreiner, “Social Media Use,” 253.

11 Chatten and Roughley, “Developing Social Media.”

12 Peter Fernandez, “‘Through the Looking Glass: Envisioning New Library Technologies’ Social Media Trends that Inform Emerging Technologies,” Library Hi Tech News 33, no. 2 (2016), https://doi.org/10.1108/LHTN-01-2016-0004.

13 Robin M. Hastings, Microblogging and Lifestreaming in Libraries (New York: Neal-Schuman Publishers, 2010).

14 Hastings, Microblogging.

15 Robert David Jenkins, “How Are U.S. Startups Using Instagram? An Application of Taylor's Six-Segment Message Strategy Wheel and Analysis of Image Features, Functions, and Appeals” (MA thesis, Brigham Young University, 2018), https://scholarsarchive.byu.edu/etd/6721.

16 Lucy Hitz, “Instagram Impressions, Reach, and Other Metrics you Might be Confused About,” Sprout Blog, January 22, 2020, https://sproutsocial.com/insights/instagram-impressions/.

17 Vassilakaki and Garoufallou, “The Impact of Twitter.”

18 Mark Aaron Polger and Karen Okamoto, “Who’s Spinning the Library? Responsibilities of Academic Librarians who Promote,” Library Management 34, no. 3 (2013), https://doi.org/10.1108/01435121311310914.

19 Yuhen Hu, Lydia Manikonda, and Subbarao Kambhampati, “What We Instagram: A First Analysis of Instagram Photo Content and User Types,” Eighth International AAAI Conference on Weblogs and Social Media (2014), https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/viewPaper/8118; Jenkins, “How Are U.S. Startups Using Instagram?;” Brian J.
McNely, “Shaping Organizational Image-Power Through Images: Case Histories of Instagram,” Proceedings of the 2012 IEEE International Professional Communication Conference, Piscataway, NJ (2012), https://doi.org/10.1109/IPCC.2012.6408624; Emma Stuart, David Stuart, and Mike Thelwall, “An Investigation of the Online Presence of UK Universities on Instagram,” Online Information Review 41, no. 5 (2017): 584, https://doi.org/10.1108/OIR-02-2016-0057.

20 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence;” McNely, “Shaping Organizational Image-Power,” 3.

21 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence.”

22 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

23 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 585.

24 “University of Idaho’s peer institutions,” University of Idaho, accessed October 8, 2019.

25 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

26 McNely, “Shaping Organizational Image-Power,” 4; Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

27 Johnny Saldaña, The Coding Manual for Qualitative Researchers (Los Angeles: Sage Publications, 2013), 27.

28 “Fleiss’ Kappa,” Wikipedia, https://en.wikipedia.org/wiki/Fleiss%27_kappa.

29 Chatten and Roughley, “Developing Social Media.”

30 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 590.

31 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 590.
32 Hitz, “Instagram Impressions, Reach, and Other Metrics.”

33 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 590.

34 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence,” 588.

35 Stuart, Stuart, and Thelwall, “An Investigation of the Online Presence.”
Analytics and Privacy: Using Matomo in EBSCO’s Discovery Service

Denise FitzGerald Quintel and Robert Wilson

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12219

Denise FitzGerald Quintel (denise.quintel@mtsu.edu) is Discovery Services Librarian and Assistant Professor, Middle Tennessee State University. Robert Wilson (robert.wilson@mtsu.edu) is Systems Librarian and Assistant Professor, Middle Tennessee State University. © 2020.

ABSTRACT

When selecting a web analytics tool, academic libraries have traditionally turned to Google Analytics for data collection to gain insights into the usage of their web properties. As the valuable field of data analytics continues to grow, concerns about user privacy rise as well, especially when discussing a technology giant like Google. In this article, the authors explore the feasibility of using Matomo, a free and open-source software application, for web analytics in their library’s discovery layer. Matomo is a web analytics platform designed around user-privacy assurances. This article details the installation process, makes comparisons between Matomo and Google Analytics, and describes how an open-source analytics platform works within a library-specific application, EBSCO’s Discovery Service.

INTRODUCTION

In their 2016 article from The Serials Librarian, Adam Chandler and Melissa Wallace summarized concerns with Google Analytics (GA) by reinforcing how “reader privacy is one of the core tenets of librarianship.”1 For that reason alone, Chandler and Wallace worked to implement and test Piwik (now known as Matomo) on the library sites at Cornell University.
Taking a cue from Chandler and Wallace, the authors of this paper sought out an analytics solution that was robust and private, could easily work within their discovery interface, and could provide the same data as their current analytics and discovery service implementation. This paper will expand on some of the concerns from the 2016 Chandler and Wallace article, make comparisons, and provide installation details for other libraries. Libraries typically use GA to support data-informed decisions or build discussions on how users interact with library websites. The goals of this pilot project were to determine the similarities between Google Analytics and Matomo, to assess how viable Matomo might be as a Google Analytics replacement, and to bring awareness to privacy concerns in the library. Matomo could easily be installed on multiple websites. However, this project looked into a specific instance of monitoring: the library’s discovery layer, EBSCO Discovery Service (EDS).

LITERATURE REVIEW

Google Analytics

The 2005 release of Google Analytics was a massive boon to libraries who had long searched for an easy-to-implement and budget-friendly tool for analytics. Shortly after its release, academic libraries were quick to adopt the platform and install its JavaScript code into their library web pages.2 In a little over a decade, there have been nearly forty scholarly articles published that discuss the ways in which Google Analytics is used for libraries’ websites. These articles not only
introduced the service but also discussed the various ways libraries utilize the platform.3 In fact, in their survey of 279 libraries, O’Brien et al.’s 2018 research found that 88 percent of libraries surveyed had implemented Google Analytics or Google Tag Manager.4 In contrast, during that same period, the authors found Matomo, or its earlier name, Piwik, discussed in a total of five scholarly articles, with only three libraries that wrote about using it as a web analytics tool.5 In addition to measuring website use, libraries found that Google Analytics allowed for several different assessments. In using Google Analytics, libraries could provide immediate feedback for projects, indicate website design change possibilities, create key performance indicators, and determine research paths and user behaviors.6 Convenience of implementation and use, minimal cost, and a user-friendly interface were all reasons cited for the widespread and fast adoption.7 Although the early literature covers a lot of ground about the reporting possibilities and the coverage of Google Analytics, there is rarely a mention of user privacy. Early articles that mention privacy provide a cursory discussion, reiterating that the data collected by Google is anonymous and therefore protects the privacy of the user. Recently, there has been a shift in the literature, with articles that now provide more in-depth discussions about user privacy and the concerns libraries have with third parties that collect and host user data. O’Brien et al. discussed the problematic ways that libraries adopted and implemented GA, by either overlooking front-facing policies or implementing it without the consent of their users.8 In their webometrics study, O’Brien et
al. found that very few libraries (1 percent) had implemented HTTPS with the GA tracking code, only 14 percent had used IP anonymization, and not a single site utilized both features.9 The concern is not solely Google’s control of the data, but Google’s involvement with third-party trackers. Third parties, as Pekala remarks, are rarely held accountable.10 It is important to remember that Google is an advertising company: its 2019 advertising revenue of $134 billion represented 84 percent of its total revenue.11 Google's search engine monetization transformed it into one of the world's most recognizable brands. As the most visited site in the world, Google is firmly committed to security, especially when it comes to data theft. Google offers protection from unwanted access into user accounts, even providing ways for high-risk users, such as journalists or political campaigns, to purchase additional security keys for advanced protection.12 But while Google keeps data breaches and hackers at bay, the user data that Google collects and stores for advertising revenue tells a different story. Google stores user data for months on end; only after nine months is advertisement data anonymized by removing parts of IP addresses. Then, after 18 months, Google will finally delete stored cookie information.13 Recent surveys are reporting an increase in users who want to know how companies are collecting information to provide data-driven services. In a 2019 Pew Research Survey, 62 percent of respondents believe it is impossible to go through their daily lives untracked by companies.
Additionally, even with the ease that certain data-driven services bring, “81 [percent] of the public reported that the potential risks they face because of data collection by companies outweigh the benefits.”14 Cisco, in a 2019 personal data survey, found a segment of the population (32 percent) that not only cares about data privacy and wants control of their data, but has also taken steps to switch providers, or companies, based on their data policies.15 Additionally, in Pew Research Survey results published as recently as April 2020, Andrew Perrin reports that an even larger number of U.S. adults (52 percent) are now choosing not to use products or services out of concerns for their privacy and the personal information companies collect.16 With a growing population of users who make inquiries about who, or what, is in control of their data, a web analytics tool that can easily answer those questions might serve libraries, and their users, well.

COMPARISONS

Google Analytics had been the library’s only web analytics tool until the start of the pilot project. During the pilot period, the authors simultaneously ran both analytics tools. Once Matomo was installed, the authors found several similarities between the two products and discovered that nearly identical analyses could occur, given the quality and quantity of the data collected. The pilot study focused only on one analytics project: the library’s discovery layer, EBSCO’s Discovery Service. The authors worked with their dedicated EBSCO engineer to replicate the Google Analytics EDS widget and have it configured to send output to Matomo instead. In making comparisons, one of the common statements about GA and Matomo is that the numbers will never be exact matches, often with much higher counts presented in GA than in Matomo.
Several forums and blogs, even Matomo themselves, admit that there are several possible reasons why there is a noticeable difference between the two.17 Those involved in the discussion theorize that this is due to GA spam hits, bot hits, and Matomo’s ability for users to limit tracking. Beyond the counts, both products measure the same kinds of metrics for websites.18 For this project, the authors only wanted to look at specific metrics within EDS, those measurements that look more closely at the user rather than the larger aggregate data. For the sake of the analysis, it is important to note that although both products have several great features, this is a specific situation in which the researchers use certain features for analytics. The analytics we collect for EDS strive to answer specific questions:

• Are users searching for known items or performing exploratory searches? How often?
• Are users utilizing the facets and limiters? How often?

Although you can use both products to count page views or set events for your website, when looking at meaningful metrics for our discovery system, we focus more on the user level. In Google Analytics, the best way to capture these is through the User Explorer tool, which breaks up a user journey into search terms, events, and actions that occur during sessions. In the same way, Matomo provides anonymized user profiles that include search terms, events, and actions in its Visits Log report. In GA, you can export this User Explorer data in JSON format, but only one user at a time, as seen in figure 1. This restriction also means you cannot see data from multiple users, with those details, on a single page. To contrast, in Matomo’s Visits Log, you can export the same data (search terms, events, actions) from multiple users in CSV, XML, PHP, TSV, JSON, or HTML formats.
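The same per-visit details behind the Visits Log can also be pulled programmatically through Matomo's HTTP Reporting API (its `Live.getLastVisitsDetails` method). The sketch below only builds the request URL; the host name, site ID, and authentication token are placeholder values, not those of the authors' installation.

```javascript
// Sketch: building a Matomo Reporting API request for Visits Log data
// (Live.getLastVisitsDetails). Host, site ID, and token_auth are placeholders;
// the format parameter mirrors the dashboard export options (JSON, XML, CSV, TSV, ...).
function visitsLogUrl(host, idSite, tokenAuth, format) {
  var params = new URLSearchParams({
    module: 'API',
    method: 'Live.getLastVisitsDetails',
    idSite: String(idSite),
    period: 'day',
    date: 'today',
    format: format,
    token_auth: tokenAuth,
  });
  return host + '/index.php?' + params.toString();
}

// Example request URL for a hypothetical instance:
var exportUrl = visitsLogUrl('https://matomo.example.edu', 1, 'REPLACE_WITH_TOKEN', 'JSON');
```

Fetching a URL like this (with a valid token) returns the search terms, events, and actions shown in the Visits Log report for many visits at once, which is useful when the dashboard export needs to be automated.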
As seen in figure 2, Matomo offers a snapshot of this data in an easy-to-read single page, versus Google’s one-user-at-a-time option, which requires clicking through to see a user report.

Figure 1. Screenshot of the Google Analytics User Explorer Tool

Figure 2. Screenshot of the Matomo Visits Log Report

In summary, libraries using either of these analytics tools can measure usage and users with page views, visits, and unique visitors. Looking at how users navigate a site is possible with the available user paths, from the initial search, to events as seen in figures 3 and 4, and an exit page URL. Goals can be set and maintained with conversion metrics tied to referrers, visits, user location, devices, or user attributes. Like Google Analytics, Matomo can run reports on engagement and performance and share customizable, user-friendly graphs or other visual representations.

Figure 3. Peer Reviewed Limiter as Event Action in Google Analytics

Figure 4. Peer Reviewed Limiter Use as Event Name in Matomo

Comparisons on Privacy

Both Google Analytics and Matomo offer ways to protect the privacy of your users. Both offer IP anonymization, the option for data deletion after a certain time, and a Do Not Track feature for users. It is important to note the way Google offers these adjustments to the user.
For Matomo, Do Not Track is a default behavior, meaning that the tracker automatically honors a browser’s settings; this is not always the case elsewhere, as respecting the Do Not Track browser setting is voluntary for websites, not mandatory.19 Google Analytics offers the same service, as long as it is implemented by the user through a browser extension.20 IP anonymization and data deletion are features that Matomo users can adjust easily from the dashboard, whereas Google Analytics users will need to make those adjustments programmatically.21 In Matomo, you can choose to automatically delete your old visitor logs from the database, although Matomo recommends keeping detailed logs for three to six months and then deleting the older log data.22 Quite the contrast is Google Analytics, where a user makes a data deletion request to Google, which then creates a report for review before submitting the request to Google. Even after submitting a request, Google still allows seven days to reverse that decision. In terms of data retention, Google Analytics gives you the option to retain user data anywhere from 14 months to 50 months, with the option to never expire. Fourteen months is the shortest amount of time you can retain user data; nothing less.23 IP anonymization is the default for Matomo analytics but is an opt-in feature for Google Analytics. Again, like data retention, any adjustments to IP anonymization in Matomo can occur in the dashboard, with options to have two or three bytes removed from the address. Google Analytics will adjust the last octet to zero.24 Both products are similar in several ways, but the standout feature of Matomo is that the data belongs only to your institution.
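To make the opt-in versus default distinction concrete, the snippet below is a minimal sketch of how a site requests IP anonymization from Google Analytics' gtag.js tracker. `GA_MEASUREMENT_ID` is a placeholder property ID, and the dataLayer bootstrap is reduced to the part that runs outside a browser; Matomo needs no equivalent client-side call, since masking is on by default and adjusted from the dashboard.

```javascript
// Minimal sketch of the gtag.js command queue: IP anonymization must be
// requested explicitly per property. 'GA_MEASUREMENT_ID' is a placeholder.
var dataLayer = [];
function gtag() { dataLayer.push(arguments); }

gtag('js', new Date());
gtag('config', 'GA_MEASUREMENT_ID', { anonymize_ip: true });
```

The asymmetry is the point of the comparison: the Google Analytics setting lives in the page's tracking code, while the Matomo setting lives in the server's own administration interface.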
In his interview with Katherine Schwab for Fast Company, Mathieu Aubry, Matomo’s founder, states it clearly:

When [Google] released Google Analytics, [it] was obvious to me that a certain percent of the world would want the same technology, but decentralized, where it’s not provided by a centralized corporation and you’re not dependent on them… If you use it on your own server, it’s impossible for us to get any data from it.25

IMPLEMENTATION AND INSTALLATION

Originally released as Piwik in 2007, Matomo was designed as a replacement for phpMyVisites.26 It is an open-source software application licensed under GNU GPL v3.27 It is designed as a PHP/MySQL application, allowing the server operating system (OS) and web service to best match a user’s needs or institutional preferences and expertise.28 To match the organization’s preferences and expertise, this Matomo instance was set up as a Linux-Apache-MySQL/PHP (LAMP) stack server (CentOS 7 in our case) with Apache 2.4.6 and MySQL-MariaDB 5.5.60. The required configurations needed to run Matomo are well documented on the Matomo documentation site as well as the download and documentation area. Depending on the version of Matomo, the mileage a user gets with the documentation may vary. For example, on the recent upgrade to 3.11.0, the instance displayed a warning notification that PHP v7.0 had reached end of life and recommended updating to PHP v7.1 or greater to accommodate future Matomo versions. However, at the time of this writing, the minimum PHP version stated in Matomo’s documentation is 5.5.9 or greater.29 Like many PHP applications, once the prerequisite applications are installed (PHP, MySQL, and the selected web service, Apache in this case), the Matomo install is completed by browsing to the server’s URL or IP address on port 80. Browsing to the index.php path in a web browser will guide a user through the install process.
The installer will also review file directories on the server and inform a user of any permissions problems that will need to be addressed for a correct install and use. Compared to other PHP application install experiences, installing Matomo was straightforward and easier to follow than many. Within a few minutes, the admin user was created and the first website was added. The web-based administration area is also more robust and easier to use than many comparable applications. Many features that might typically require configuration file changes directly on the server, including Matomo upgrades, can be configured through the administration area. While the administration page has many options relating to paid-for premium features, there are several particularly helpful free configuration cards in the interface. Most notable is the “System Summary” card, which displays the current version of Matomo, PHP, and MySQL as well as total users, segments, goals, tracking failures, total websites configured, and a few other metrics. There is a “Tracking Failures” card that notifies of issues with websites, and a “Need Help?” card that links to the Matomo Community forums. Finally, the “System Check” card displays any warnings or errors as well as a link to the full system check report. This is extremely helpful when Matomo has been installed but the instance still needs additional configuration changes or follow-up tasks on upgrades. If there are warnings or errors, the full system report will often have recommendations of changes to make, either in the administration page or on the server in the configuration files. These administration features make maintenance a straightforward process. Since setting up the server, two upgrades have been completed. In both cases, an email notification was received indicating a new stable release was available.
On login to Matomo, this information also appeared as a banner. Simply clicking on the download update option automatically updated the service without any need to access the server directly or via SSH. The updates ran smoothly, with one exception: during one upgrade, several files were created or overwritten with the root user as the owner. As a result, Matomo indicated an issue with the files and/or path not being found. In actuality, the files did exist, but Matomo no longer had permission to read them. Resolving the problem required browsing to the directory path indicated in a warning on the server and changing ownership from the root user to the apache user to match the other files. Despite this issue, the update process is much more user-friendly than in similarly structured applications. Standalone implementation and installation of Matomo is made simple by the installation documentation readily available on the Matomo.org website, especially if one is familiar with PHP/MySQL applications. Adding one or two websites whose architectures a new Matomo user already knows well is a good way to pilot Matomo and get introduced to its overall functions without being so overwhelmed that the more granular functions are never learned. A system admin may find maintenance and updates to this service less problematic, and less disruptive to the service, than with similarly structured applications, while users may find the overall functionality of Matomo easier to use and the finer points of reporting and analytics more transparent and easier to understand than Google Analytics. Once installed, the authors first tested Matomo on a low-traffic library site. After tracking proved successful, EDS was entered as a new website in the Matomo dashboard and the JavaScript tracking tag was placed in the bottom branding of EDS. The process of adding EDS as a new site to Matomo was as easy as expected, and the data collection was almost immediate.
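Besides the JavaScript tag placed in the EDS branding, Matomo can also receive hits through its HTTP Tracking API (the matomo.php endpoint), which is useful where embedding JavaScript is impractical. The sketch below only constructs such a request URL with Python's standard library; the hostname, site ID, and page URL are hypothetical, and the parameter names are the core ones from Matomo's tracking documentation, so verify against the current docs before relying on them.

```python
# Hedged sketch of a server-side hit to Matomo's HTTP Tracking API.
# The hostname, idsite, and URLs below are hypothetical examples.
from urllib.parse import urlencode

def build_tracking_url(matomo_base: str, site_id: int,
                       page_url: str, action_name: str) -> str:
    params = {
        "idsite": site_id,   # website ID configured in the Matomo dashboard
        "rec": 1,            # required for the hit to be recorded
        "url": page_url,     # URL of the page view being tracked
        "action_name": action_name,
    }
    return f"{matomo_base}/matomo.php?{urlencode(params)}"

url = build_tracking_url("https://analytics.example.edu", 2,
                         "https://library.example.edu/eds/search",
                         "EDS search")
print(url)
```

No network call is made here; an actual deployment would send this URL as an HTTP GET to the Matomo server.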
To mirror the EDS and Google Analytics integration, the authors worked with their EBSCO Library Service Engineer to create a Matomo widget. Luckily, another engineer had previously worked on an integration when the software was still known as Piwik. Instead of building from the ground up, the engineer only needed to clean and update the Piwik widget’s code to match the Google Analytics widget, which would allow for the tracking of events and site searches. Adding a user outside of the organization to Matomo was necessary for the EBSCO engineer to fine-tune the widget. Matomo admins can set up users with specific permissions within the system, with access to only a specific site. Each Matomo user has their own email address and password (not domain-specific) and settings, and can even customize their dashboard. After testing proved successful, the new Matomo widget moved into the live profile of EDS, and data collection commenced. SECURITY Though the service is in a pilot stage with limited data collection, the authors wanted to ensure an SSL certificate was in place for login to Matomo. With EFF’s Certbot (https://certbot.eff.org/), the authors installed a Let’s Encrypt (https://letsencrypt.org/) SSL certificate. The SSL certificate is automatically renewed every three months via a cronjob on our server. Because of the power of the administration interface, caution should be used when assigning the “Super User” role to user accounts. It would also be wise to require two-factor authentication (2FA) on the service. Turning on 2FA is a very simple process, and Matomo works with multiple third-party authentication utilities including Authy, LastPass, and 1Password. While each user can choose to activate 2FA, an admin can require it for all users if desired.
CONCLUSION As the amount of research and rate of adoption testify, since 2005 GA has set the benchmark for assessment of library web asset success and has made possible a completely new understanding of the library user experience and overall assessment of library services. Matomo’s earliest iteration appeared shortly after, in 2007, and is a viable alternative to proprietary web analytics applications, with a few notable advantages over GA. From a long-term perspective, the two biggest advantages of Matomo are that it is licensed under a copyleft GPL free and open source software (FOSS) license and that it is designed with user privacy at heart. For libraries, using FOSS applications whenever possible allows them to practice what they preach. FOSS does not mean cost-free. In fact, free in the FOSS sense is more akin to freedom (freedom to download, modify, distribute, and change the code) rather than free of charge. Budgeting for a hosted subscription, support, or the costs of a library running and maintaining the application itself or through an Infrastructure as a Service (IaaS) provider like Amazon Web Services (AWS) or Microsoft’s Azure is necessary, but the freedom Matomo provides by ensuring the library is in control of its patron data, that the data is protected, and that it is not at risk of becoming a product in and of itself may well be worth the cost. Like other initiatives in the open-access movement or open educational resources, and as third-party data collection and privacy on the web become more mainstream concerns, opting to use Matomo to protect patron privacy allows libraries to be leaders on issues relating to privacy and intellectual freedom. As noted earlier, there are other feature-based advantages Matomo provides that impact the day-to-day aspects of monitoring web asset use and assessment, like export options and viewing the full log of visits.
Lastly, by focusing on EDS in this pilot, the authors were able to demonstrate and verify that Matomo rises to the challenge not just with traditional web asset analytics requirements, but also with library-specific applications like proprietary discovery layer services. ENDNOTES 1 Adam Chandler and Melissa Wallace, “Using Piwik Instead of Google Analytics at the Cornell University Library,” Serials Librarian 71, no. 3 (October 2016): 174, https://doi.org/10.1080/0361526X.2016.1245645. 2 Tabatha Farney and Nina McHale, “Introducing Google Analytics for Libraries,” Library Technology Reports 49, no. 4 (May 2013): 5, https://journals.ala.org/ltr/article/download/4269/4881. 3 Paul Betty, “Assessing Homegrown Library Collections: Using Google Analytics to Track Use of Screencasts and Flash-Based Learning Objects,” Journal of Electronic Resources Librarianship 21, no. 1 (2009): 75–92, https://doi.org/10.1080/19411260902858631; Jason D. Cooper and Alan May, “Library 2.0 at a Small Campus Library,” Technical Services Quarterly 26, no. 2 (2009): 89–95, https://doi.org/10.1080/07317130802260735; Stephan Spitzer, “Better Control of User Web Access of Electronic Resources,” Journal of Electronic Resources in Medical Libraries 6, no. 2 (2009): 91–100, https://doi.org/10.1080/15424060902931997; Julie Arendt and Cassie Wagner, “Beyond Description: Converting Web Site Usage Statistics into Concrete Site Improvement Ideas,” Journal of Web Librarianship 4, no. 1 (2010): 37–54, https://doi.org/10.1080/19322900903547414; Steven J. Turner, “Website Statistics 2.0: Using Google Analytics to Measure Library Website Effectiveness,” Technical Services Quarterly 27, no.
3 (2010): 261–78, https://doi.org/10.1080/07317131003765910; Gail Herrera, “Measuring Link-Resolver Success: Comparing 360 Link with a Local Implementation of WebBridge,” Journal of Electronic Resources Librarianship 23, no. 4 (2011): 379–88, https://doi.org/10.1080/1941126X.2011.627809; Wayne Loftus, “Demonstrating Success: Web Analytics and Continuous Improvement,” Journal of Web Librarianship 6, no. 1 (2012): 45–50, https://doi.org/10.1080/19322909.2012.651416; Tabatha A. Farney, “Click Analytics: Visualizing Website Use Data,” Information Technology & Libraries 30, no. 3 (2011): 141–48, https://doi.org/10.6017/ital.v30i3.1771. 4 Patrick O’Brien et al., “Protecting Privacy on the Web: A Study of HTTPS and Google Analytics Implementation in Academic Library Websites,” Online Information Review 42, no. 6 (2018): 734–51, https://doi.org/10.1108/OIR-02-2018-0056. 5 Junior Tidal, “Using Web Analytics for Mobile Interface Development,” Journal of Web Librarianship 7, no. 4 (2013): 451–64, http://doi.org/10.1080/19322909.2013.835218; Ramiro Federico Uviña, “Bibliotecas Y Analítica Web: Una Cuestión De Privacidad = Libraries and Web Analytics: A Privacy Matter,” Información, Cultura Y Sociedad no. 33 (2015): 105–12, http://revistascientificas.filo.uba.ar/index.php/ICS/article/view/1906; Sukumar Mandal, “Site Metrics Study of Koha OPAC through Open Web Analytics and Piwik Tools,” Library Philosophy and Practice (2019), https://digitalcommons.unl.edu/libphilprac/2835; Mohammad Azim and Nabi Hasan, “Web Analytics Tools Usage among Indian Library Professionals,” 2018 5th International Symposium on Emerging Trends and Technologies in Libraries and Information Services (2018): 31–35, https://doi.org/10.1109/ETTLIS.2018.8485212. 6 Ian Barba et al., “Web Analytics Reveal User Behavior: TTU Libraries’ Experience with Google Analytics,” Journal of Web Librarianship 7, no. 4 (2013): 389–400, https://doi.org/10.1080/19322909.2013.828991.
7 Betty, “Assessing Homegrown Library Collections.” 8 O’Brien et al., “Protecting Privacy on the Web,” 734. 9 O’Brien et al., “Protecting Privacy on the Web,” 741. 10 Shayna Pekala, “Privacy and User Experience in 21st Century Library Discovery,” Information Technology & Libraries 36, no. 2 (2017): 50, https://doi.org/10.6017/ital.v36i2.9817. 11 J. Clement, “Advertising Revenue of Google from 2001 to 2019,” Statista, February 5, 2020, https://www.statista.com/statistics/266249/advertising-revenue-of-google; Lily Hay Newman, “The Privacy Battle to Save Google From Itself,” Wired, November 1, 2018, https://www.wired.com/story/google-privacy-data/; Ben Popken, “Google Sells the Future, Powered by Your Personal Data,” NBC News, May 10, 2018, https://www.nbcnews.com/tech/tech-news/google-sells-future-powered-your-personal-data-n870501; Richard Graham, “Google and Advertising: Digital Capitalism in the Context of Post-Fordism, the Reification of Language, and the Rise of Fake News,” Palgrave Communications 3, no. 45 (2017): 2–4, https://doi.org/10.1057/s41599-017-0021-4.
12 “Google Advanced Protection Program,” Google, https://landing.google.com/advancedprotection/. 13 “Google Privacy and Terms, Advertising,” Google, https://policies.google.com/technologies/ads?hl=en-US. 14 Brooke Auxier et al., “Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information,” November 15, 2019, Pew Research, https://www.pewresearch.org/internet/wp-content/uploads/sites/9/2019/11/Pew-Research-Center_PI_2019.11.15_Privacy_FINAL.pdf. 15 “Consumer Privacy Survey,” November 2019, CISCO, https://www.cisco.com/c/dam/en/us/products/collateral/security/cybersecurity-series-2019-cps.pdf. 16 Andrew Perrin, “Half of Americans Have Decided Not to Use a Product or Service Because of Privacy Concerns,” Pew Research, April 14, 2020, https://www.pewresearch.org/fact-tank/2020/04/14/half-of-americans-have-decided-not-to-use-a-product-or-service-because-of-privacy-concerns/. 17 “Matomo vs. Google Analytics 360,” Matomo.org, https://matomo.org/matomo-vs-google-analytics comparison; Lemon, “A Comparison of Data: Piwik vs. Google Analytics,” The FPlus (blog), November 30, 2016, https://thefpl.us/wrote/about-piwik; Himanshu Sharman, “Best Google Analytics Alternatives in 2020—Matomo & Piwik Pro,” OptimizeSmart (blog), March 30, 2020, https://www.optimizesmart.com/introduction-to-piwik-best-google-analytics-alternative. 18 “Matomo vs. Google Analytics 360,” Matomo.org.
19 Ryan Singel, “Google Holds Out Against ‘Do Not Track’ Flag,” Wired, April 15, 2011, https://www.wired.com/2011/04/chrome-do-not-track; Kieren McCarthy, “Do Not Track Is Back in the US Senate,” The Register, May
20, 2019, https://www.theregister.co.uk/2019/05/20/do_not_track; “How Do I Turn on the Do Not Track Features?,” Mozilla, https://support.mozilla.org/en-US/kb/how-do-i-turn-do-not-track-feature. 20 “Google Analytics Opt-Out Browser Add-On,” Google, https://support.google.com/analytics/answer/181881. 21 “IP Anonymization,” Google, https://developers.google.com/analytics/devguides/collection/analyticsjs/ip-anonymization. 22 “Managing Your Database’s Size,” Matomo.org, https://matomo.org/docs/managing-your-databases-size/ - deleting-old-unprocessed-data. 23 “Data Retention,” Google, https://support.google.com/analytics/answer/7667196?hl=en&ref_topic=2919631. 24 “IP Anonymization,” Google. 25 Katherine Schwab, “It’s Time to Ditch Google Analytics,” Fast Company, February 1, 2019, https://www.fastcompany.com/90300072/its-time-to-ditch-google-analytics. 26 “Matomo and phpMyVisites,” Matomo.org, https://matomo.org/faq/general/faq_437. 27 “Licenses,” Matomo.org, https://matomo.org/licences. 28 “Matomo (software),” Wikipedia, https://en.wikipedia.org/wiki/Matomo_(software). 29 “Matomo Requirements,” Matomo.org, https://matomo.org/docs/requirements.
ARTICLES Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results Sam Grabus INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12235 Sam Grabus (smg383@Drexel.edu) is an Information Science PhD Candidate at Drexel University’s College of Computing and Informatics, and Research Assistant at Drexel’s Metadata Research Center. This article is the 2020 winner of the LITA/Ex Libris Student Writing Award. © 2020. ABSTRACT This research compares automatic subject metadata generation when the pre-1800s Long-S character is corrected to a standard < s >. The test environment includes entries from the third edition of the Encyclopedia Britannica, and the HIVE automatic subject indexing tool. A comparative study of metadata generated before and after correction of the Long-S demonstrated an average of 26.51 percent potentially relevant terms per entry omitted from results if the Long-S is not corrected. Results confirm that correcting the Long-S increases the availability of terms that can be used for creating quality metadata records. A relationship is also demonstrated between shorter entries and an increase in omitted terms when the Long-S is not corrected. INTRODUCTION The creation of subject metadata for individual documents is long known to support standardized resource discovery and analysis by identifying and connecting resources with similar aboutness.1 In order to address the challenges of scale, automatic or semi-automatic indexing is frequently employed for the generation of subject metadata, particularly for academic articles, where the abstract and title can be used as surrogates in place of indexing the full text.
When automatically generating subject metadata for historical humanities full texts that do not have an abstract, anachronistic typographical challenges may arise. One key challenge is that presented by the historical “Long-S” < ſ >. In order to account for these idiosyncrasies, there is a need to understand the impact that they have upon the automatic subject indexing output. Addressing this challenge will help librarians and information professionals to determine whether or not they will need to correct the Long-S when automatically generating subject metadata for full-text pre-1800s documents. The problem of the Long-S in Optical Character Recognition (OCR) for digital manuscript images has been discussed for decades.2 Many scholars have researched methods for correcting the Long-S through the use of rule-based algorithms or dictionaries.3 While the problem of the Long-S is well-known in the digital humanities community, automatic subject metadata generation for a large corpus of pre-1800s documents is rare, as is research about the application and evaluation of existing automatic subject metadata generation tools on 18th-century documents in real-world information environments. The impact of the Long-S upon automatic subject metadata generation results for pre-1800s texts has not been extensively explored. The research presented in this paper addresses this need. The paper reports results from basic statistical analysis and visualization using the Helping Interdisciplinary Vocabulary Engineering (HIVE) tool automatic subject indexing results, before and after the correction of the historical Long-S in the 3rd edition of the Encyclopedia Britannica. Background work was conducted over the Summer and Fall of 2019, and the research presented was conducted during Winter 2020.
The work was motivated by current work on the “Developing the Data Set of Nineteenth-Century Knowledge” project, a National Endowment for the Humanities collaborative project between Temple University’s Digital Scholarship Center and Drexel University’s Metadata Research Center. The grant is part of a larger project, Temple University’s “19th-Century Knowledge Project,” which is digitizing four historical editions of the Encyclopedia Britannica.4 The next section of this paper presents background covering the historical Encyclopedia Britannica data, the automatic subject metadata generation tool used for this project, a brief background of “the Long-S Problem,” and the distribution of encyclopedia entry lengths in the 3rd edition. The background section will be followed by research objectives and method supporting the analysis. Next, the results are presented, demonstrating prevalence of terms omitted from the automatic subject metadata generation results if the Long-S is not corrected to a standard small < s > character, as well as the impact of encyclopedia entry length upon these results. The results are followed by a contextual discussion, and a conclusion that highlights key findings and identifies future research. BACKGROUND Indexing for the 19th-Century Knowledge Project The 19th-Century Knowledge Project, an NEH-funded initiative at Temple University, is fully digitizing four historical editions of the Encyclopedia Britannica (the 3rd, 7th, 9th, and 11th). The long-term goal of the project is to analyze the evolving conceptualization of knowledge across the 19th century.5 The 3rd edition of the Encyclopedia Britannica (1797) is the earliest edition being digitized for this project. The 3rd edition consists of 18 volumes, with a total of 14,579 pages, and individual entries ranging from four to over 150,000 words. For each individual entry, researchers at Temple have created individual TEI-XML files from the OCR output. 
In order to enrich accessibility and analysis across this digital collection, The Knowledge Project will be adding controlled vocabulary subject headings into the TEI headers of each encyclopedia entry XML file. Considering the size of this corpus, both in terms of entry length and number of entries, automatic subject metadata generation will be required for the creation of this metadata. The Knowledge Project will employ controlled vocabularies to replace or complement naturally extracted keywords for this process. Using controlled vocabularies adheres to metadata semantic interoperability best practices, ensures representation consistency, and helps to bypass linguistic idiosyncrasies of these 18th and 19th century primary source materials.6 We selected two versions of the Library of Congress Subject Headings (LCSH) as the controlled vocabularies for this project. LCSH was selected due to its relational thesaurus structure, multidisciplinary nature, and continued prevalence in digital collections due to its expressiveness and status as the largest general indexing vocabulary.7 In addition to the headings from the 2018 edition of LCSH, headings from the 1910 LCSH are also implemented in order to provide a more multi-faceted representation, using temporally relevant terms that may have been removed from the contemporary LCSH. The tool applied for this process is HIVE, a vocabulary server and automatic indexing application.8 HIVE allows the user to upload a digital text or URL, select one or more controlled vocabularies, and perform automatic subject indexing through the mapping of naturally extracted keywords to the available controlled vocabulary terms. HIVE was initially launched as an IMLS linked open vocabulary and indexing demonstration project in 2009.
Since that time, HIVE has been further developed, with the addition of more controlled vocabularies, user interface options, and the RAKE keyword extraction algorithm. The RAKE keyword extraction algorithm was selected for this project after a comparison of topic relevance precision scores for three keyword extraction algorithms.9 The Long-S Problem Early in our metadata generation efforts, we discovered that the 3rd edition of the Encyclopedia Britannica employs the historical Long-S. Originating in early Roman cursive script, the Long-S was used in typesetting up through the 18th century, both with and without a left crossbar. By the end of the 18th century, the Long-S fell out of use with printers.10 As outlined by lexicographers of the 17th and 18th centuries, the rules for using the Long-S were frequently vague, complicated, inconsistent over time, and varied according to language (English, French, Spanish, or Italian).11 These rules specified where in a word the Long-S should be used instead of a short < s >, whether it is capitalized, where it may be used in proximity to apostrophes, hyphens, and the letters < f >, < b >, < h >, and < k >; and whether it is used as part of a compound word or abbreviation.12 This is further complicated by the inclusion of the half-crossbar, which occasionally results in two consequences: (a) the Long-S may be interpreted by OCR as an < f >, and (b) < b > and < f > may be interpreted by OCR as a Long-S. Figure 1 shows an example from the 3rd edition entry on Russia, in which the original text specifies “of” (line 1 in top figure), yet the OCR output has interpreted the character as a Long-S. The Long-S may also occasionally be interpreted by the OCR as a lowercase < l >, such as the “univerlity of Dublin” in the 3rd edition entry on Robinson (The most Rev Sir Richard).
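Where the OCR output preserves the Unicode Long-S character (U+017F) rather than misreading it, a correction rule amounts to a one-character substitution. The Python sketch below is a minimal illustration of that unambiguous case, not the Knowledge Project's actual script; the misreadings described above (ſ captured as < f > or < l >) would still require dictionary- or context-based rules.

```python
# Minimal first pass at Long-S normalization: map the Unicode Long-S
# (U+017F, ſ) to a standard 's'. This does NOT repair cases where the
# OCR already misread ſ as 'f' or 'l' (e.g., "Ruffians" for "Russians");
# those require dictionary- or context-based rules.
LONG_S = "\u017f"  # ſ

def normalize_long_s(text: str) -> str:
    return text.replace(LONG_S, "s")

print(normalize_long_s("ſugar diſſolved in yeaſt"))  # sugar dissolved in yeast
print(normalize_long_s("Ruffians"))  # unchanged: already misread by the OCR
```

In practice a corpus-specific script would layer wordlist lookups on top of this substitution to catch the f/s and l/s confusions.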
These complications and inconsistencies are challenges when developing Python rules for correcting the Long-S in an automated way, and even preexisting scripts will need to be adapted for individual use with a particular corpus. Figure 1. Example from the 3rd edition entry on Russia, comparing the original use of a letter < f > in “of” to the OCR output of the same passage, which mistakenly interprets the character as a Long-S. Despite the transition away from the Long-S towards the end of the 18th century, the 3rd edition of the Encyclopedia Britannica (published in 1797) implements the Long-S throughout, with approximately 100,594 instances of the Long-S in the OCR output. When performing metadata generation with the HIVE tool on the OCR output for an entry, the Long-S is most often interpreted by the automatic metadata generation tool as an < f >, which can result in (a) inaccurate keyword extraction (e.g., Russians → Ruffians), and (b) essential topics becoming unidentifiable when mapping extracted keywords to controlled vocabulary terms, in which case HIVE will omit them from the results because they cannot be mapped to controlled vocabulary terms. Figure 2 provides a truncated view of Long-S words in the 3rd edition entry on Rum, which are subsequently removed from the pool of automatically extracted keywords when performing the automatic subject indexing sequence in HIVE. Using keyword extraction algorithms that are largely dependent upon term frequencies, automatic subject indexing for an entry on Rum may be substantially hindered when meaningful and frequently occurring words such as sugar and yeast are removed. Figure 2. Examples of the Long-S in the 3rd edition Encyclopedia Britannica entry on Rum.
Using this example entry, the automatic subject indexing results were compared using Python to determine which terms only appear when the Long-S has been corrected to the standard < s >. The comparison showed that 16 total terms no longer appeared in the results when the Long-S was not corrected to a standard < s >: ten terms using the 2018 LCSH, and six terms using the 1910 LCSH. These omitted results included the terms sugar and yeast. The next section will discuss the encyclopedia entry word count for this corpus, and the possible impact that this may have upon automatic subject indexing between corrected and uncorrected Long-S instances. Encyclopedia Entry Lengths Consistent with other Encyclopedia Britannica editions in the 18th and 19th centuries, the encyclopedia entries in the 3rd edition vary substantially in length. A convenience sample of 3,849 3rd edition entries ranging in length from 2 to 202,848 words demonstrated an arithmetic mean of 826.60 and a median word count of 71. As shown in figure 3, this indicates a significant skew towards shorter entry lengths. For the vast majority of encyclopedia entries in this corpus, a low total word count may affect the degree of Long-S impact on automatic subject indexing results, given the importance of term availability and frequency for keyword extraction algorithms. Figure 3. Scatterplot of word count for a convenience sample of 3,849 3rd edition Encyclopedia Britannica entries. Large-scale metadata generation requires time, labor, and resources, and it becomes more costly when accounting for the complications of correcting the Long-S for a particular corpus.
Library and information professionals working with digital humanities resources will need to understand the impact of correcting or not correcting the Long-S in the corpus before allocating resources and developing a protocol for generating automatic or semi-automatic metadata for full-text resources. This includes understanding whether or not the length of each individual document will affect the degree of Long-S impact upon the results. This challenge, and the issues reviewed above, are addressed in the research presented below.

OBJECTIVES

The overriding goal of this work is to determine the prevalence of omitted terms in automatic subject indexing results when the Long-S is not corrected in the 3rd edition entries of the Encyclopedia Britannica. Research questions:

1. What is the average number of terms that are omitted from automatic subject indexing results when the Long-S is not corrected to a standard < s >?
2. How does the encyclopedia entry length affect the number of terms that are omitted when the Long-S is not corrected to a standard < s >?

This analysis approaches these goals by performing a comparative analysis of automatic subject indexing results to determine the number of terms that are omitted from the results when the Long-S is not corrected to a standard letter < s >. Basic descriptive statistics are generated to determine central tendency. The quantity of terms omitted is then compared with encyclopedia entry word counts. These objectives were shaped by collaboration between Drexel University's Metadata Research Center and Temple University's Digital Scholarship Center. The next section of this paper reports on the methods and steps taken to address these objectives.
METHODS

We approached this research by performing a comparative analysis of subject metadata generated both before and after the correction of the historical Long-S in the 3rd edition of the Encyclopedia Britannica. The HIVE tool was used to automatically generate the subject metadata. Descriptive statistics were applied, and visualizations produced from the results were also examined to identify trends.

Figure 4. The 30 Encyclopedia Britannica 3rd edition entries randomly selected for this study, sorted in ascending order by their word counts.

The protocol for performing this research involved the following steps:

1. Compile a sample for testing:
1.1. A random sample of 30 encyclopedia entries was identified from a convenience sample of entries that comprise the letter S volumes of the 3rd edition. The entries range in length from 6 to 6,114 words. The median word count for entries in this sample is 99 words.
1.2. The sample of entries selected for this study and their respective word counts are visualized in figure 4.
1.3. For each entry, the Long-S terms in the original XML file were extracted to a list.
2. Perform the automatic subject indexing sequence upon the entries to generate lists of terms:
2.1. Using the 2018 and 1910 versions of the LCSH.
2.2. With fixed maximum subject heading results set to 40: 20 maximum terms returned with the 2018 LCSH, and 20 maximum terms returned with the 1910 LCSH.
2.3. Before Long-S correction and after Long-S correction, using the Oxygen XML Editor TEI to TXT transformation.
3. Perform an outer join on Python DataFrames between the terms generated when the Long-S has been corrected and the terms generated when it has not been corrected. The resulting left outer join list displays the terms that are omitted from the automatic indexing results if the Long-S is not corrected to a standard small < s >.
The quantity of terms omitted is recorded for comparison.
4. Analysis: Descriptive statistics were generated to determine central tendency for the number and percentage of words omitted when the Long-S is not corrected. The quantity of terms omitted is also visualized in a continuous scatterplot against the corresponding word counts, to demonstrate that the quantity of terms omitted when the Long-S is not corrected seems to relate to the length of the document being automatically classified.

RESULTS

The results report the prevalence of omitted terms when the Long-S is not corrected to a standard < s >, as well as a visualization of the number of terms omitted as it relates to encyclopedia entry length. For each of the 30 sample entries automatically indexed with HIVE, a fixed maximum of 40 terms was returned: a maximum of 20 terms using the 2018 LCSH, and a maximum of 20 terms using the 1910 LCSH. As seen in table 1, central tendency is measured using the arithmetic mean and median, along with the standard deviation and range. The average number of terms omitted from an entry's results is 6.73, and the average percentage of terms omitted from an entry's results is 26.51 percent, with the 2018 and 1910 editions of LCSH performing at similar rates. The full results are displayed in appendix A.

Table 1. Measures of centrality, standard deviation, range, and percentage for the quantity of terms omitted when the Long-S is not corrected to a standard < s >, rounded to the hundredth. For each entry, a maximum of 40 terms was returned: 20 using the 2018 LCSH and 20 using the 1910 LCSH. The total number of results returned varies according to entry length. These totals are reported in appendix B. (N = 30 entries.)

For each entry in the sample, the results in appendix A display the total words omitted when the Long-S is not corrected, the number of 2018 LCSH terms omitted, the number of 1910 LCSH terms omitted, and the encyclopedia entry word count.
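The outer-join step of the protocol can be sketched with pandas. The column name, variable names, and sample terms below are assumptions for illustration, not the project's actual code; the "left_only" rows of the merge indicator are the terms omitted when the Long-S is not corrected.

```python
# Hypothetical sketch of step 3 of the protocol: an outer join between
# the corrected and uncorrected term lists; sample terms are illustrative.
import pandas as pd

corrected = pd.DataFrame({"term": ["sugar", "yeast", "distillation", "rum"]})
uncorrected = pd.DataFrame({"term": ["distillation", "rum"]})

merged = corrected.merge(uncorrected, on="term", how="outer", indicator=True)

# Rows flagged "left_only" appear only after Long-S correction, i.e.,
# they are omitted from the results when the Long-S is not corrected.
omitted = merged.loc[merged["_merge"] == "left_only", "term"].tolist()
quantity_omitted = len(omitted)
```

The `indicator=True` flag is what exposes, per row, whether a term came from the corrected list, the uncorrected list, or both.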
Figure 5 visualizes the total number of terms omitted for each entry when the Long-S is not corrected, demonstrating an increase in terms omitted for entries with lower word counts. These results are broken down by vocabulary in figure 6, demonstrating that both vocabularies used to generate these results indicate a significant increase in omitted terms for shorter entries.

Measure | Both Vocabularies | 2018 LCSH | 1910 LCSH
Average, Terms Omitted | 6.73 | 3.67 | 3.07
Median, Terms Omitted | 5 | 3 | 2
Standard Deviation | 6.53 | 3.84 | 3.17
Range, Terms Omitted | 0-24 | 0-13 | 0-11
Average Percentage, Omitted Terms | 26.51% | 27.51% | 24.28%
Median Percentage, Omitted Terms | 22.36% | 20.00% | 19.09%

Figure 5. Number of automatic subject indexing terms that are omitted when the Long-S is not corrected to a standard < s >, as compared by encyclopedia entry word count.

Figure 6. Number of automatic subject indexing terms that are omitted when the Long-S is not corrected to a standard < s >, as compared by encyclopedia entry word count, separated by controlled vocabulary version.

DISCUSSION

The analysis above presents measures of centrality for the quantity of terms omitted if the Long-S is not corrected to a standard < s > prior to automatic subject indexing using HIVE, as well as a visualization to represent the relationship between encyclopedia entry word count and number of terms omitted. Although researchers have identified challenges with the Long-S and have focused a great deal on the technologies and methods used to correct it, there is still limited work examining the results of not correcting the Long-S character when performing an automatic subject indexing sequence.
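As a concreteness check, the "Both Vocabularies" measures in table 1 can be recomputed directly from the "Total Words Omitted" column of appendix A with Python's standard library:

```python
# Recomputing the "Both Vocabularies" column of table 1 from the
# appendix A "Total Words Omitted" counts (N = 30 entries).
import statistics

terms_omitted = [24, 24, 19, 14, 13, 11, 9, 9, 8, 8, 7, 7, 6, 6, 5,
                 5, 4, 4, 4, 4, 3, 3, 2, 1, 1, 1, 0, 0, 0, 0]

mean = statistics.mean(terms_omitted)      # 6.73 when rounded
median = statistics.median(terms_omitted)  # 5
stdev = statistics.stdev(terms_omitted)    # sample standard deviation, ~6.53
value_range = (min(terms_omitted), max(terms_omitted))  # (0, 24)
```

Note that `statistics.stdev` computes the sample (n-1) standard deviation, which matches the 6.53 reported in table 1.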
This research demonstrated an average of 6.73 potentially relevant terms omitted from automatic indexing results when the Long-S is not corrected, accounting for an average of 26.51 percent of the total results, with an approximately equal distribution of omitted terms across the two controlled vocabulary versions used. When the quantity of terms omitted is visualized using a continuous scatterplot, the results also demonstrate a significant increase in omitted terms for shorter entries, with longer entries less affected. These results reflect the impact of term frequency and total word count in keyword extraction and automatic subject indexing, with longer documents having a greater pool of total terms from which to identify key terms. Considering the complexities and similarities of the typographical characters in the original manuscript, the OCR process for this corpus occasionally confuses the letters < s >, < f >, < r >, and < l >. As a result, an occasional Long-S word in this study did not originally contain an < s > (e.g., sor instead of for). Correction of these Long-S OCR errors requires the development of a dictionary-based script. An additional complication of this research is that the corrected OCR output for the encyclopedia entries still contains a few errors not related to the Long-S, which will prevent the mapping of a term to any controlled vocabulary term (e.g., in the entry on Sepulchre, the OCR output for the term Palestine was Palestinc). These results are specific to this particular corpus of 3rd edition Encyclopedia Britannica entries, but it is very likely that testing another set of pre-1800s documents containing the Long-S would also illustrate that, for best results with any algorithm or tool, the Long-S needs to be corrected. The results are also specific to the two versions of the LCSH used, the 1910 LCSH and the 2018 LCSH, both of which are available in the HIVE tool.
The 1910 version is key for the time period being studied, while the more contemporary 2018 version has supported additional analysis of the impact of the Long-S. Both of these vocabularies are important to the larger 19th-Century Knowledge Project. It should be noted that while the LCSH is updated weekly, we were limited to the versions available via the HIVE tool, and any discrepancies that may be found with the 2020 LCSH will very likely have a minimal effect upon metadata generation results. The 2020 LCSH will be incorporated into HIVE soon and can be explored in future research.

CONCLUSION AND NEXT STEPS

The objective of this research was to determine the impact of correcting the Long-S in pre-1800s documents when performing an automatic metadata generation sequence using keyword extraction and controlled vocabulary mapping. This was accomplished by performing an automatic subject indexing sequence using the HIVE tool, followed by a basic statistical analysis to determine the quantity of terms omitted from the results when the Long-S is not corrected to a standard < s >. The number of omitted terms was also compared with the encyclopedia entry word count and visualized to demonstrate a significant increase in omitted terms for shorter encyclopedia entries. The study was conclusive in confirming that the correction of the Long-S is a critical part of our workflow. The significance of this research is that it demonstrates the necessity of correcting the Long-S prior to performing automatic subject indexing on historical documents. Beyond the correction of the Long-S, the larger next steps for this project are to continue to explore automatic metadata generation for this corpus. These next steps include the comparison of results using contemporary vs.
historical vocabularies and streamlining a protocol for bulk classification procedures and integration of terms into the TEI-XML headers. The research presented here can inform other digital humanities and even science-oriented projects, where researchers may not be aware of the impact of the Long-S on automatic metadata generation not only for subjects but also for named entities, particularly when automatic approaches with controlled vocabularies are desired.

ACKNOWLEDGEMENTS

The author thanks Dr. Jane Greenberg and Dr. Peter Logan for their guidance. The author acknowledges the support of NEH grant #HAA-261228-18.

INFORMATION TECHNOLOGY AND LIBRARIES SEPTEMBER 2020 EVALUATING THE IMPACT OF THE LONG-S | GRABUS 11

APPENDIX A

Entry Term | Total Words Omitted | 2018 LCSH Terms Omitted | 1910 LCSH Terms Omitted | Encyclopedia Entry Word Count
SARDIS | 24 | 13 | 11 | 381
SUCTION | 24 | 13 | 11 | 38
STYLITES, PILLAR SAINTS | 19 | 13 | 6 | 199
SHADWELL | 14 | 10 | 4 | 211
SALICORNIA | 13 | 6 | 7 | 254
SEPULCHRE | 11 | 3 | 8 | 348
SITTA NUTHATCH | 9 | 5 | 4 | 620
SPRAT | 9 | 3 | 6 | 475
SERAPIS | 8 | 5 | 3 | 587
STRADA | 8 | 1 | 7 | 189
SHOAD | 7 | 4 | 3 | 463
SIGN | 7 | 5 | 2 | 68
SHOOTING | 6 | 3 | 3 | 6114
STRATA | 6 | 3 | 3 | 2920
STEWARTIA | 5 | 4 | 1 | 72
SUBCLAVIAN | 5 | 3 | 2 | 20
SCHWEINFURT | 4 | 2 | 2 | 84
SCROLL | 4 | 2 | 2 | 45
SPALATRO | 4 | 3 | 1 | 99
SPECIAL | 4 | 3 | 1 | 24
SAMOGITIA | 3 | 2 | 1 | 112
SHAKESPEARE | 3 | 0 | 3 | 3855
SINAPISM | 2 | 1 | 1 | 25
SECT | 1 | 1 | 0 | 20
SEVERINO | 1 | 1 | 0 | 38
SHADDOCK | 1 | 1 | 0 | 6
SCARLET | 0 | 0 | 0 | 65
SHALLOP, SHALLOOP | 0 | 0 | 0 | 42
SOLDANELLA | 0 | 0 | 0 | 56
SPOLETTO | 0 | 0 | 0 | 99

APPENDIX B

*N = 30 entries

Condition | Average Terms Returned | Median Terms Returned
Corrected | 24.77 / 40 possible | 28 / 40 possible
Uncorrected | 26.47 / 40 possible | 29 / 40 possible
2018 LCSH Corrected | 14.10 / 20 possible | 19 / 20 possible
2018 LCSH Uncorrected | 13.47 / 20 possible | 18.5 / 20 possible
1910 LCSH Corrected | 11.27 / 20 possible | 11 / 20 possible
1910 LCSH Uncorrected | 10.13 / 20 possible | 9 / 20 possible
ENDNOTES

1 Liz Woolcott, "Understanding Metadata: What is Metadata, and What is it For?," Routledge (November 17, 2017), https://doi.org/10.1080/01639374.2017.1358232; Koraljka Golub et al., "A framework for evaluating automatic indexing or classification in the context of retrieval," Journal of the Association for Information Science and Technology 67, no. 1 (2016), https://doi.org/10.1002/asi.23600; Lynne C. Howarth, "Metadata and Bibliographic Control: Soul-Mates or Two Solitudes?," Cataloging & Classification Quarterly 40, no. 3-4 (2005), https://doi.org/10.1300/J104v40n03_03.

2 A. Belaid et al., "Automatic indexing and reformulation of ancient dictionaries" (paper presented at the First International Workshop on Document Image Analysis for Libraries, Palo Alto, CA, 2004), https://doi.org/10.1109/DIAL.2004.1263264.

3 Beatrice Alex et al., "Digitised Historical Text: Does it have to be mediOCRe?" (paper presented at KONVENS 2012 (LThist 2012 workshop), Vienna, September 21, 2012); Ted Underwood, "A half-decent OCR normalizer for English texts after 1700," The Stone and the Shell, December 10, 2013, https://tedunderwood.com/2013/12/10/a-half-decent-ocr-normalizer-for-english-texts-after-1700/.

4 "Nineteenth-century knowledge project" (GitHub repository), 2020, https://tu-plogan.github.io/.

5 "Nineteenth-century Knowledge Project."

6 Marcia Lei Zeng and Lois Mai Chan, "Metadata Interoperability and Standardization - A Study of Methodology, Part II," D-Lib Magazine 12, no. 6 (2006); G. Bueno-de-la-Fuente, D. Rodríguez Mateos, and J. Greenberg, "Chapter 10 - Automatic Text Indexing with SKOS Vocabularies in HIVE" (Elsevier Ltd, 2016); Sheila Bair and Sharon Carlson, "Where Keywords Fail: Using Metadata to Facilitate Digital Humanities Scholarship," Journal of Library Metadata 8, no. 3 (2008), https://doi.org/10.1080/19386380802398503.
7 John Walsh, "The use of Library of Congress Subject Headings in digital collections," Library Review 60, no. 4 (2011), https://doi.org/10.1108/00242531111127875.

8 Jane Greenberg et al., "HIVE: Helping interdisciplinary vocabulary engineering," Bulletin of the American Society for Information Science and Technology 37, no. 4 (2011), https://doi.org/10.1002/bult.2011.1720370407.

9 Sam Grabus et al., "Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries," NASKO 7 (2019), pp. 138-48, https://doi.org/10.7152/nasko.v7i1.15635.

10 Karen Attar, "S and Long S," in Oxford Companion to the Book, eds. Michael Felix Suarez and H. R. II Woudhuysen (Oxford: Oxford University Press, 2010); Ingrid Tieken-Boon van Ostade, "Spelling systems," in An Introduction to Late Modern English (Edinburgh University Press, 2009).

11 Andrew West, "The Rules for Long-S," TUGboat 32, no. 1 (2011).

12 Attar, "S and Long S."
EDITORIAL BOARD THOUGHTS

Seeing through Vocabularies

Kevin Ford

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12367

Kevin Ford (kevinford@loc.gov) is Librarian, Linked Data Specialist in the Library of Congress's Network Development and MARC Standards Office. He works on the Library's Bibframe Initiative and similar projects, such as MADS/RDF, and is a member of the ITAL Editorial Board. The ideas and opinions expressed here are those of the author and do not necessarily reflect those of his employer.

"Ontologies" are popular in library land. "Vocabularies" are popular too, but it seems that the library profession prefers "ontologies" over "vocabularies" when it comes to defining classes and properties that attempt to encapsulate some realm of knowledge. Bibframe, MADS/RDF, BIBO, PREMIS, and FRBR are well-known "ontologies" in use in the library community.1 They were defined either by librarians or to be used mainly in the library space, or both. SKOS, FOAF, Dublin Core, and Schema are well-known "vocabularies."2 They are used widely by libraries, though none were created by librarians or specifically for library use. In all cases, those ontologies and vocabularies were created for the very purpose of publication for broader use, which is one of the primary objectives behind creating one: to define a common set of metadata elements to facilitate the description and sharing of data within a group or groups of users.

Ontologies and vocabularies are common when working with RDF (Resource Description Framework), a very simple data model in which information is expressed as a series of triple statements, each consisting of three parts: a subject, a predicate, and an object. The types of ontologies and vocabularies referred to here are in fact defined using RDF: Thing A is a Class and Thing Z is a Property.
Those using any given ontology or vocabulary employ the defined classes and properties to further describe their Things, for lack of a better word. It is useful to provide an example. The first block of triples below represents Class and Property definitions in RDF Schema (RDFS), which provides some very basic means to define classes and properties and some relationships between them, such as the domains and ranges for properties. The second block is instance data.

ontovoc:Book rdf:type rdfs:Class
ontovoc:authoredBy rdf:type rdf:Property
ontovoc:authorOf rdf:type rdf:Property

ex:12345 rdf:type ontovoc:Book
ex:12345 ontovoc:authoredBy ex:abcde

ontovoc:Book is defined as a Class and ontovoc:authoredBy is defined as a Property. Using those declarations, it is possible to then assert that ex:12345, which is an identifier, is of type ontovoc:Book and was authored by ex:abcde, an identifier for the author. Is the first block (the definitions) an "ontology" or a "vocabulary"? Putting aside the question for now, air quotes (in this case literal quotes) have been employed around "ontologies" and "vocabularies" to suggest that these are more terms of art than technical distinctions, though it must also be acknowledged that there is a technical distinction to be made.

Ontologies in the RDF space frequently, if not always, use classes and properties from the Web Ontology Language (known as OWL) to define a specific realm's classes and properties and how they relate to each other within that realm of knowledge. This is because OWL is a more expressive definition language than basic RDFS. Using OWL, and considering the example above, ontovoc:authoredBy could be defined as an inverse of ontovoc:authorOf.
ontovoc:authoredBy owl:inverseOf ontovoc:authorOf

In this way, and given the little instance data above (the two triples that begin ex:12345), it is then possible to infer the following bit of knowledge:

ex:abcde ontovoc:authorOf ex:12345

Now that the owl:inverseOf triple/declaration has been added to the definitions, it's worth re-asking: Do the definitions represent an "ontology" or a "vocabulary"? A purist might answer "not an ontology," but only because those statements have not been combined in a document, which itself has been given a URI and declared to be an owl:Ontology. That's the actual OWL Class that says, "This is an OWL Ontology." But let's say those statements had been added to a document published at a URI and declared to be an owl:Ontology. Is it an ontology now? Perhaps in a strict sense the answer is "yes." But in a practical sense few would view those four declarations, wrapped neatly in a document that has been given a URI and called an Ontology, as an "ontology." It doesn't quite rise to the occasion: "ontologies" almost always have a broader scope and employ more formal semantics, making the label, often, a term of art rather than a real technical distinction. Yet, based on the same narrow definition (a published document declaring itself to be an owl:Ontology) combined with a far more extensive set of class and property definitions with defined relationships between them, it is possible to describe FOAF as an ontology.3 But it is widely known as, and understood as, a "vocabulary." (There is also an experimental version of Schema as OWL.4) And that gets to the crux of the issue in many ways.
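The inference itself is mechanical, and can be demonstrated with a few lines of Python over triples stored as 3-tuples. This is a toy illustration of the owl:inverseOf entailment rule, not a full reasoner; the prefixed names mirror the example above.

```python
# Toy illustration of owl:inverseOf entailment over a set of triples.
graph = {
    ("ontovoc:authoredBy", "owl:inverseOf", "ontovoc:authorOf"),
    ("ex:12345", "rdf:type", "ontovoc:Book"),
    ("ex:12345", "ontovoc:authoredBy", "ex:abcde"),
}

def expand_inverses(triples):
    """For every (s, p, o) where p has a declared inverse q, add (o, q, s)."""
    inverses = {}
    for s, p, o in triples:
        if p == "owl:inverseOf":
            inverses[s] = o
            inverses[o] = s          # owl:inverseOf holds in both directions
    inferred = set(triples)
    for s, p, o in triples:
        if p in inverses:
            inferred.add((o, inverses[p], s))
    return inferred

entailed = expand_inverses(graph)
# entailed now also contains ("ex:abcde", "ontovoc:authorOf", "ex:12345")
```

Nothing about the instance data changed; the new triple exists only because the definitions license it.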
Putting aside the technical distinction that can be argued to identify something as an "ontology" versus a "vocabulary," there are non-technical semantics at work here (what was earlier described as a "term of art") about when, how, and why something is deemed an "ontology" versus a "vocabulary." The library community appears to think of its creations as "ontologies" and not "vocabularies," even when the documentation tends to avoid the word "ontology." For example, the opening sentence of the Bibframe and MADS/RDF documentation very clearly introduces each as a "vocabulary," as does FRBR in RDF.5 On the surface they may be presented as "vocabularies," which they are of course, but despite this prominent self-declaration they are not seen in the same light as FOAF or Schema but instead as something more exacting, which they also are. It is worth contemplating why they are viewed principally as "ontologies" and to examine whether this has been beneficial. Perhaps the ideas behind designating something a "vocabulary" are, in fact, more in line with the way libraries operate, whereas "ontologies" represent an ideal (and who doesn't set their sights on the ideal?), striving toward which only exposes shortcomings and sows confusion.

The answer to "why" is historical and probably derives from a combination of lofty thinking, traditional standards practices, and good ol' misunderstanding. Traditional standards practices favor more formal approaches. Libraries' decades-long experience with XML and XML Schema contributed significantly to this mindset. XML Schema provides a way to describe the precise construction of an XML document, and it can then be used to validate the XML document. XML Schema defines what elements and attributes are permitted in the XML document and frequently dictates their order. It can further constrain the values of an element or attribute to a select list of options.
In many ways, XML Schema was the very expression of metadata quality control. Librarians swooned. With the right controls and technology in place, it was impossible to produce poor, variable metadata.

In the case of semantic modelling, OWL is certainly a more formal approach. It's founded in description logics, whose expressions take the form of occult-like mathematics, at least as viewed by a librarian with a humanities background. OWL can be used to declare domains and ranges for properties. One can also designate a property as a Datatype Property, meaning it takes a literal such as a string or a date as its value, or an Object Property, which means it will reference another RDF resource as its object. But these declarations are actually more about inferencing (deriving information by applying the ontology against some instance data) and not about restrictions, constraints, or validation. To be clear, there are ways to apply restrictions in OWL ("wine can be either red or white"), but this is a form of advanced OWL modelling that is not well understood and not often implemented, and virtually never in ontologies designed by librarians. Conversely, indicating a domain for a property, for example, is easy, relatively straightforward, and seductive because it gives the appearance that the property can only be used with resources of a specific class. Consider: the domain of ontovoc:authoredBy is ontovoc:Book. That does not mean that ontovoc:authoredBy can only be used with an ontovoc:Book resource. It means that whatever resource uses ontovoc:authoredBy must therefore be an ontovoc:Book. Defining that domain for that property is not restricting its use only to books; it allows one to derive the additional knowledge that the thing it is used with must be a book even if it doesn't identify itself as one.
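The point can be made concrete with another small sketch: an RDFS domain declaration never rejects a statement; it adds a type. The resource name ex:99999 below is hypothetical, introduced only to show a subject with no declared type.

```python
# Toy illustration of the rdfs:domain entailment rule (RDFS rule rdfs2):
# a domain declaration infers a type; it does not restrict usage.
graph = {
    ("ontovoc:authoredBy", "rdfs:domain", "ontovoc:Book"),
    ("ex:99999", "ontovoc:authoredBy", "ex:abcde"),  # subject has no type
}

def expand_domains(triples):
    """For every (s, p, o) where p has a declared domain C, add (s, rdf:type, C)."""
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    inferred = set(triples)
    for s, p, o in triples:
        if p in domains:
            # The subject is inferred to be an instance of the domain class.
            inferred.add((s, "rdf:type", domains[p]))
    return inferred

entailed = expand_domains(graph)
# Nothing was rejected; instead we learn:
# ("ex:99999", "rdf:type", "ontovoc:Book")
```

Note what did not happen: no error was raised when ex:99999, a thing of unknown type, used the property. The domain worked as an inference license, not a validation rule.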
This may seem like a subtle distinction, or it may seem like tortured logic, but if it does, it may suggest that one's point of view, one's mindset, favors constraints, restrictions, and validations. And that's OK. That's library training and conditioning, completely reinforced in our daily work. It's what has been taught in library schools for decades and practiced by library professionals even longer. Names should be entered "last name, first name" and any middle initial, if known, included. The data in this field should only be a three-character language code from this approved list of language codes. These rules and the consistency resulting from them are what make library data so often very high quality. Google loves MARC records from our community for this very reason. Wishing to exert strong control at the definition level when creating a model or metadata scheme with an eye to data quality, it is a natural inclination for librarians to gravitate to a more formal means of defining a model, especially one that seems to promise constraints. So, despite these models self-describing at a high level as vocabularies, the models themselves employ a considerable amount of OWL at the technical level, which becomes the focus of any users wishing to implement the model. Users comprehend these models as something more than a vocabulary and therefore view the model through this more complex lens. Unfortunately, because OWL is poorly understood (sometimes by creators, sometimes by users, and sometimes by both), this leads to various problems. On the one hand, creators and users believe there are technical restrictions or constraints where there are, in fact, none. When this happens, the "constraint" is
Even when it is recognized that the “constraint” is not a real restriction (just a means to infer knowledge), forging ahead can generate new issues. When faced with a domain and range declaration, for example, forging ahead can result in inaccurate, imprecise, or simply undesirable inferences. Most of the currently open “issues” (about 50 at the time of writing) about Bibframe follow a basic pattern: 1) there is a declaration about this Property or this Class that makes it difficult to use because of how it has been defined with OWL; 2) we cannot really use it presently because it would cause potential inferencing issues; 3) consider altering the OWL definitions.6 Pursuing an (OWL) ontology, while formal and seemingly comforting because it feels a little like constraining the metadata schema, can result in confusion and a lack of adoption. Given that vocabularies and ontologies are developed and published to encourage users to describe their data in a way that fosters wide consumption by others, this is unfortunate to say the least. It is notable that SKOS, FOAF, Dublin Core, and Schema have very different scopes and potentially much wider user bases than the more library-specific ontologies (Bibframe, MADS/RDF, BIBO, etc.). There is something to be learned here: the smaller the domain, the more effective an ontology might be; the larger the universe, a more general approach may be better. It is further true that FOAF, Dublin Core, and Schema define specific domains and ranges for many of their properties, but they have strived for clarity and simplicity. The creators of Schema, for example, eschewed the formal semantics behind RDFS and OWL and redefine domain and range to better match their needs and (perhaps unexpectedly) most users’ automatic understanding.7 What is generally true is that each of the “vocabularies” approached the creation and defining of their models so as to minimize the use of formal semantics, and promoted this as a feature. 
In this way, they limited or removed altogether the actual or psychological barriers to adoption. Their offering was more accessible, less fussy. Bearing in mind the differences in scale and scope, they have been rewarded with a wider adopter base and passionate advocates.

The decision to create a "vocabulary" or an "ontology" is a technical one and a political one, both of which must be in alignment. It's a mindset and it is a statement. It is entirely possible to define the model at a technical level using OWL, making it by definition an ontology, but to have it be perceived, and used, as a vocabulary because it is flexible and not strictly defined. Likewise, it is not enough to call something a vocabulary when it is in reality a model burdened with formal semantics, and then expect it to be adopted and used widely. If the objective is to fashion a (pseudo?) restrictive metadata set with rules that inform its use, and which is strongly bonded with a specific community, develop an "ontology," but recognize that this may result in confusion and lack of uptake. If, however, the desire is to cultivate a metadata element set that is flexible, readily useable, and positioned to grow in the future because it employs fewer rules and formal semantics, create a "vocabulary." That's really what is being communicated when we encounter ontologies and vocabularies. Interestingly, the political difference between "vocabulary" and "ontology" appears, in fact, to be understood by librarians: library models self-identify as "vocabularies." But once past those introductory remarks, the truth is exposed quickly in the widespread use of OWL, revealing beyond doubt that these are not flexible, accommodating vocabularies but strictly defined models. To dispense with the air quotes: as librarians we're creating ontologies and calling them vocabularies. We really want to be creating vocabularies that are ontologies in name only.
INFORMATION TECHNOLOGY AND LIBRARIES JUNE 2020 SEEING THROUGH VOCABULARIES | FORD 5 ENDNOTES 1 “Bibframe Ontology,” Library of Congress, accessed May 21, 2020, http://id.loc.gov/ontologies/bibframe.html; “MADS/RDF (Metadata Authority Description Schema in RDF),” Library of Congress, accessed May 21, 2020, http://id.loc.gov/ontologies/madsrdf/v1.html; “Bibliographic Ontology Specification,” The Bibliographic Ontology, accessed May 21, 2020, http://bibliontology.com/; “PREMIS 3 Ontology,” Premis Editorial Committee, accessed May 21, 2020, http://id.loc.gov/ontologies/premis3.html; Ian Davis and Richard Newman, “Expression of Core FRBR Concepts in RDF,” accessed May 21, 2020, https://vocab.org/frbr/. 2 Alistair Miles and Sean Bechhofer, editors, “SKOS Simple Knowledge Organization System Reference,” W3C, accessed May 21, 2020, https://www.w3.org/TR/skos-reference/; Dan Brickley and Libby Miller, “FOAF Vocabulary Specification 0.99,” accessed May 21, 2020, http://xmlns.com/foaf/spec/; “DCMI Metadata expressed in RDF Schema Language,” Dublin Core™ Metadata Initiative, accessed May 21, 2020, https://www.dublincore.org/schemas/rdfs/; “Welcome to Schema.org,” Schema.org, accessed May 21, 2020, http://schema.org/. 3 “FOAF Ontology,” xmlns.com, accessed May 21, 2020, http://xmlns.com/foaf/spec/index.rdf. 4 See “OWL” at “Developers,” Schema.org, accessed May 21, 2020, https://schema.org/docs/developers.html. 5 See “Bibframe Ontology” and “MADS/RDF (Metadata Authority Description Schema in RDF)” above. 6 “Issues,” Bibframe Ontology at GitHub, accessed May 21, 2020, https://github.com/lcnetdev/bibframe-ontology/issues. 7 R.V. Guha, Dan Brickley, and Steve Macbeth, “Schema.org: Evolution of Structured Data on the Web,” acmqueue 15, no. 9 (December 15, 2015): 14, https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1.
LITA PRESIDENT’S MESSAGE Facing What’s Next, Together Emily Morton-Owens INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12383 Emily Morton-Owens (egmowens.lita@gmail.com) is LITA President 2019-20 and the Acting Associate University Librarian for Library Technology Services at the University of Pennsylvania Libraries. When I wrote my March editorial, I was optimistically picturing some of the changes that we are now seeing for LITA—while being scarcely able to imagine how the world and our profession would need to adapt quickly to the impacts on library services as a result of COVID-19. It is a momentous and exciting change for us to turn the page on LITA and become Core, yet this suddenly pales in comparison to the challenges we face as professionals and community members. Libraries’ rapid operational changes show how important the ingenuity and dedication of technology staff are to our libraries. Since states began to shut down, our listserv, lita-l, has hosted discussions on topics like how to provide person-to-person reference and computer assistance remotely, how to make computer labs safe for re-occupancy, how to create virtual reading lists to share with patrons, and how to support students with limited internet access. There has been an explosion in practical problem-solving (ILS experts reconfiguring our systems with new user account settings and due dates), ingenuity (repurposing 3D printers and conservation materials to make masks), and advocacy (for controlled digital lending). Sometimes the expense of library technologies feels heavy, but these tools have the ability to scale services in crucial ways—making them available to more people at the same time, available to people who can only take advantage after hours, available across distances. Technologists are focused on risk, resilience, and sustainability, which makes us adaptable when the ground rules change.
Our websites communicate about our new service models and community resources; ILL systems regenerate around increased digital delivery; reservation systems for laptops now allocate the use of study seating. Our library technology tools bridge past practices, what we can do now, and what we’ll do next. One of our values as ALA members is sustainability. (We even chose this as the theme for LITA’s 2020 team of Emerging Leaders.) Sustainability isn’t about predicting the future and making firm plans for it; it’s about planning for an uncertain future, getting into a resilient mindset, and including the community in decision-making. Although the current crisis isn’t climate-related per se, this way of thinking is relevant to helping libraries serve their communities. We will need this agile mindset as we confront new financial realities. Our libraries and ALA itself are facing difficult budget challenges, layoffs, reorganizations, and fundamental conversations about the vitalness of the services we provide. My favorite example from my own library of a COVID-19 response is one where management, technical services, and IT innovated together. Our leadership negotiated an opportunity for us to gain access to digitized, copyrighted material from HathiTrust that corresponds to print materials currently locked away in our library building. Thanks to decades of careful effort by our technical services team, we had accurate data to match our print records with records for the digital versions. Our IT team had processes for loading the new links into our catalog almost instantaneously. The result was a swift and massive bolstering of our digital access precisely when our users needed it most. This collaboration perfectly illustrates how natural our merger with ALCTS and LLAMA is.
As threats to our profession and the ways we’ve done things in the past gather around us, I am heartened by the strengths and opportunities of Core. It is energizing to be surrounded by the talent of our three organizations working together. I hope more of our members experience that over the summer and fall, as we convene working groups and hold events together, including a unique social hour at ALA Virtual and an online fall Forum. I close out my year serving as the penultimate LITA president in a world with more sadness and uncertainty than we could have foreseen. We are facing new expectations and new pressures, especially financial ones. As professionals and community members, we are animated by our sense of purpose. While LITA has been transformed by our vote to continue as Core, the support and inspiration we provide each other in our association will carry on.
PUBLIC LIBRARIES LEADING THE WAY LibraryVPN: A New Tool to Protect Patron Privacy Chuck McAndrew INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12391 Chuck McAndrew (chuck.mcandrew@leblibrary.com) is Information Technology Librarian, Lebanon (NH) Public Libraries. Due to increased public awareness of online surveillance, a rise in massive data breaches, and spikes in identity theft, there is high demand for privacy-enhancing services. VPN (Virtual Private Network) services are a proven way to protect online security and privacy. VPNs’ effectiveness and ease of use have led to a boom in VPN service providers globally. VPNs protect privacy and security by offering an encrypted tunnel from the user’s device to the VPN provider. VPNs ensure that no one on the same network as the user can learn anything about their traffic except that they are connecting to a VPN. This prevents surveillance of data from any source, including commercial snooping such as your ISP trying to monetize your browsing habits by selling your data, malicious snooping such as a fake wifi hotspot in an airport hoping to steal your data, or government-level surveillance that can target political activists and reporters in repressive countries. Some people might ask why we need a VPN as HTTPS becomes more ubiquitous and provides end-to-end encryption for web traffic. HTTPS encrypts the content that goes over the network, but metadata such as the site you are connecting to, how long you are there, and where you go next are all unprotected. Additionally, some very important network protocols, such as DNS, are unencrypted, and anyone can see them. A VPN eliminates all of those issues. However, there are two major problems with current VPN offerings. First, all reliable VPN solutions require a paid subscription.
This puts them out of reach of economically vulnerable populations who often have no access to the internet in their homes. In order to access online services, they may rely on public internet connections such as those provided by restaurants, coffee shops, and libraries. Using publicly accessible networks without the security benefits of a VPN puts people’s security and privacy at great risk. This risk could be eliminated by providing free access to a high-quality VPN service. The second problem is that using a VPN requires people to place their trust in whatever VPN company they use. Some (especially free solutions) have proven not to be worthy of that trust by containing malware or leaking and even outright selling customer data. Companies that abuse customer data are taking advantage of vulnerable populations who are unable to afford more expensive solutions or who do not have the knowledge to protect themselves. Together, these two problems create a situation where security and privacy are only available to those who can afford them and have the knowledge to protect themselves. Libraries are ideally positioned to help with this situation. Libraries work to provide privacy and security to people every day. This can mean teaching classes, making privacy resources available, and even advocating for privacy-friendly laws. Libraries are also located in almost every community in the United States and enjoy a high level of trust from the public. Librarians can be thought of as a physical VPN. People who come into libraries know that what they read and the information they seek out will be protected by the library.
In fact, libraries have helped to pass laws protecting the library records of patrons in all 50 US states. People know that when a library offers a service to their community, it isn’t because it wants to sell their information or show them advertisements. With libraries, our patrons are not the product. Libraries also already provide many online services to all members of their community, regardless of financial circumstances. Examples include access to online databases, language-learning software, and online access to periodicals such as the New York Times or Consumer Reports. Many of these services would cost too much for patrons to access individually. By pooling their resources, communities are able to make more services available to all of their citizens. To help address the above issues, the Lebanon Public Libraries, in partnership with the Westchester (New York) Library System, the LEAP Encryption Access Project (https://leap.se/), and TJ Lamanna (Emerging Technology Librarian at the Cherry Hill Public Library and Library Freedom Institute graduate), started the LibraryVPN project. This project will allow libraries to offer a VPN to their patrons. Patrons will be able to download the LibraryVPN application on a device of their choosing and connect to their library’s VPN server from wherever they are. LibraryVPN was first conceived a number of years ago, but the real start of the project came when it received an IMLS National Leadership Grant (LG-36-19-0071-19) in 2019. This grant was to develop integrations between LEAP’s existing VPN solution and integrated library systems using SIP2, which will allow library patrons to sign in to LibraryVPN using their library cards. The grant also included development of a Windows client (Mac and Linux clients already existed) and alpha testing at the Lebanon Public Libraries and Westchester Library System.
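Since SIP2 is the glue between the VPN and the ILS, a small sketch may help show the kind of message such an integration layer exchanges. The sketch below is pure Python with made-up institution, patron, and password values; it builds a SIP2 Patron Status Request (message 23) and appends the protocol’s error-detection checksum, which, per the common 3M SIP2 convention, is the two’s complement of the 16-bit sum of the message’s ASCII values, written as four hex digits. This is an illustration of the wire format, not code from the LibraryVPN project itself.

```python
# Sketch of a SIP2 Patron Status Request with error-detection checksum
# (3M SIP2 convention). All field values below are hypothetical.

def sip2_checksum(msg: str) -> str:
    """Two's complement of the low 16 bits of the ASCII sum, as 4 hex digits."""
    total = sum(ord(c) for c in msg)
    return format((-total) & 0xFFFF, "04X")

# "23" = Patron Status Request, then language code, timestamp, and
# pipe-delimited fields: AO institution, AA patron barcode, AC terminal
# password, AD patron PIN, AY sequence number, AZ checksum marker.
body = "2300120200521    101212AOMAINLIB|AA21000012345678|AC|AD9999|AY1AZ"
message = body + sip2_checksum(body)

# Self-check: the message's ASCII sum plus its checksum is 0 mod 2**16,
# which is exactly what a receiving ILS verifies.
total = sum(ord(c) for c in body) + int(message[-4:], 16)
print(total & 0xFFFF == 0)  # True
```

Because the checksum is a simple modular sum, the receiving side can validate any message without knowing its field layout, which is one reason SIP2 works across so many different ILS products.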
We are currently working on moving into the testing phase of the software and planning phase two of this project. Phase two of LibraryVPN will involve expanding our testing to up to 12 libraries and conducting end-user testing with patrons and library staff. We have submitted an application for IMLS funding for phase two and are actively looking for libraries that are excited about protecting patron privacy and would like to help us beta test this software. If you work for a library that would be interested in participating, you can reach us via email at libraryvpn@riseup.net or @libraryvpn on Twitter. If you would like to help out with this project in another way, we would love to have more help. Please reach out. We are currently considering three deployment models for libraries in phase two. First would be an on-premises deployment. This would be for larger library systems with their own servers and IT staff. LibraryVPN is free and open-source software and can be deployed by anyone. Since it uses SIP2 to connect to your ILS, it should work with any ILS that supports the SIP2 protocol. This deployment model has the advantage of not requiring any hosting fees but does require the library system to have staff who can deploy and manage public-facing services. Drawbacks to this approach would include higher bandwidth use and dealing with abuse complaints. Phase 2 testing should give us better data about how much of an issue this will be, but our experience hosting a Tor exit node at the Lebanon Public Libraries suggests that it won’t be too bad to deal with. Our second deployment model would be cloud hosting. If a library has IT staff who can deploy services to the cloud, they could host their own LibraryVPN service without needing their own hardware.
However, when deploying to the cloud, there will be ongoing costs for running the servers and for bandwidth used. Figuring out how much bandwidth an average user will consume is part of the data we are hoping to get from our phase 2 testing so we can offer guidelines to libraries that choose to deploy their own LibraryVPN service. Finally, we are looking at a hosted version of LibraryVPN. We anticipate that smaller systems that do not have dedicated servers or IT staff will be interested in this option. In this case, there would be ongoing hosting and support costs, but managing the service would not be any more complicated than subscribing to any other service the library hosts for its patrons. LibraryVPN is a new project that is pushing library services outside of the library to wherever our patrons are. We want to make sure that all of our patrons are protected, not just those with the financial ability and technical know-how to get their own VPN service. As librarians, we understand that privacy and intellectual freedom are joined, and we want to maximize both. As the American Library Association’s Code of Ethics says, “We protect each library user's right to privacy and confidentiality” (http://www.ala.org/tools/ethics).
LETTER FROM THE EDITOR A Blank Page Kenneth J. Varnum INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020 https://doi.org/10.6017/ital.v39i2.12405 Nothing is as daunting as a blank page, particularly now. As I sat down to write this issue’s letter, I was struck by how much fundamental uncertainty is in our lives, so much trauma. A blank page can emphasize our concerns about whether the old familiar will return at all, or whether a new, better normal will emerge. At the same time, a blank page can be liberating at a time when so much of our social, professional, and personal lives needs to be reconceptualized and reactivated in new, healthier, more respectful and inclusive ways. We are collectively faced with two important societal ailments. The first is the literal disease of the COVID-19 pandemic that has been with us for only months. The other is the centuries-long festering disease of racial injustice, discrimination, and inequality that typifies (particularly, but not uniquely) American society. While some of us may be in better positions to help heal one or the other of these two ailments, we can all do something about both, as different as they are. Lend emotional support to those in need of it, take part in rallies if your personal health and circumstances allow, and advocate for change to government officials at all levels from local to national. Learn about the issues and explore ways you can make a difference on either or both fronts. I hope I am not being foolish or naive when I say I believe the blank page before us as a society will be liberating: an opportunity to shift ourselves toward a better, more equitable, more just path.
* * * * * * To rephrase Humphrey Bogart’s Rick Blaine in Casablanca, “it doesn’t take much to see that the problems of three little library association divisions don’t amount to a hill of beans in this crazy world.” But despite the small global impact of our collective decision, I am glad our ALCTS, LLAMA, and LITA colleagues chose a united future as Core: Leadership, Infrastructure, Futures. Watch for more information about what the merged division means for our three divisions and this journal in the months to come. Sincerely, Kenneth J. Varnum, Editor varnum@umich.edu June 2020
ARTICLE The Role of the Library in the Digital Economy Serhii Zharinov INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12457 Serhii Zharinov (serhii.zharinov@gmail.com) is Researcher, State Scientific and Technical Library of Ukraine. © 2020. ABSTRACT The gradual transition to a digital economy requires all business entities to adapt to new environmental conditions, which they do through their own digital transformation. These tasks are especially relevant for scientific libraries, as digital technologies are changing the main subject field of their activities: the processes of creating, storing, and disseminating information. In order to find directions for the transformation of scientific libraries and determine their role in the digital economy, this study examined the features of digital transformation and the digital transformation experience of foreign libraries. Management of research data, implemented through the creation of Current Research Information Systems (CRIS), was found to be one of the most promising areas of the digital transformation of libraries. The problem areas of this direction and ways of engaging libraries in it are also analyzed. INTRODUCTION The transition to a digital economy contributes to the even greater penetration of digital technologies into our lives and to the emergence of new conditions of competition and trends in organizations’ development. Big Data, machine learning, and artificial intelligence are becoming common tools implemented by the pioneers of digital transformation in their activities.1 Significant changes in the main functions of libraries (the storage and dissemination of information), caused by the development of digital technologies, affect the operational activities of libraries, user and partner requests to the library, and the ways to meet them.
In the process of adapting to these changes, the role of libraries in the digital economy is changing. This study is designed to find current areas of library development and to determine the role of the library in the digital economy. Achieving this goal requires study of the “digital economy” concept and the peculiarities of the digital transformation of organizations in order to better understand the role of the library in it; research on the development of libraries to determine what best fits the new role of the library in the digital economy; and identification of obstacles to the development of this area and ways to engage libraries in it. THE CONCEPT OF THE “DIGITAL ECONOMY” The transition to an information society and digital economy will gradually change all industries, and all companies must change accordingly.2 Taking advantage of the digital economy is the main driving force of innovation, competitiveness, and economic development of a country.3 The transition to a digital economy is not instant but occurs over many years. The topic emerged at the end of the twentieth century but has experienced rapid growth in recent years. In the Web of Science (WoS) citation database, publications with this term in the title began to appear in 1996 (figure 1). Figure 1. The number of publications in the WoS citation database for the query “digital economy.” One of the first books devoted entirely to the study of the digital economy concept is the work of Don Tapscott, published in 1996.
In this book, the author understands the digital economy as an economy in which the use of digital computing technologies in economic activity becomes its dominant component.4 Thomas Mesenbourg, an American statistician and economist, identified in 2000 the three main components of the digital economy: e-business, e-commerce, and e-business infrastructure.5 A number of works on the development of indicators to assess the state of the digital economy, in particular the work of Philip Barbet and Nathalie Coutinet, are based on the analysis of these components.6 Alnoor Bhimani, in his 2003 paper, “Digitization and Accounting Change,” defined the digital economy as “the digital interrelationships and dependencies between emerging communication and information technologies, data transfers along predefined channels and emerging platforms, and related contingencies within and across institutional and organizational entities.”7 Bo Carlsson’s 2004 article described the digital economy as a dynamic state of the economy characterized by the constant emergence of new activities based on the use of the Internet and new forms of communication between different authors of ideas, which allow them to generate new activities.8 In 2009, John Hand defined the digital economy as the new design or use of information and communication technologies that help transform the lives of people, society, or business.9 Ciocoiu Carmen Nadia, in her 2011 article, explained the digital economy as a state of the economy where, due to technology, knowledge and networking begin to play a more important role than capital in a post-industrial society.10 In a 2014 article, Kit Lesya defined the digital economy as an element of the network economy, characterized by the transformation of all spheres of the
economy by transferring information resources and knowledge to a computer platform for further use.11 Ukrainian scientists Mykhailo Voinarenko and Larysa Skorobohata, in a 2015 study of network tools, gave the following definition of the digital economy: “The digital economy, unlike the Internet economy, assumes that all economic processes (except for the production of goods) take place independently of the real world. Goods and services do not have a physical medium but are ‘electronic.’”12 Yurii Pivovarov, director of the Ukrainian Association for Innovation Development (UAID), gives the following definition: “Digital economy is any activity related to information technology. And in this case, it is important to separate the terms: digital economy and IT sphere. After all, it is not about the development of IT companies, but about the consumption of services or goods they provide—online commerce, e-government, etc.—using digital information technology.”13 Taking into account the above, in this study the digital economy is defined as a digital infrastructure that encompasses all business entities and their activities. The transition to the digital economy is the process of creating conditions for the digital transformation of organizations, creating digital infrastructure, and gradually involving various economic entities and sectors of the economy in that digital infrastructure. One of the first practical and political manifestations of the transition to the digital economy was the European Commission’s Digital Economy and Society Index (DESI), first published in 2014. The main components of the index are communications, human capital, Internet use, digital integration, and digital public services.
Among European countries in 2019, there is significant progress in the digitalization of business and in the interaction of society with the state.14 For Ukraine, the first step towards the digital economy was the Digital Economy and Development Concept of Ukraine, which defines the understanding of the digital economy and the direction and principles of the transition to it.15 Thus, for active representatives of the public sector, this concept is a signal that the development of structures and organizations should be based not on improving operational efficiency, but on transformation in accordance with the requirements of Industry 4.0. Confirmation of the seriousness of the Ukrainian government’s intentions in this direction is the creation of the Ministry of Digital Transformation in 2019 and the digitization of the latest public services through online services.16 One of the priority challenges that needs to be solved at the stage of transition to the digital economy is the development of skills in working with digital technologies across the entire population. This is relevant not only for Ukraine but also for the European Union. In Europe, a third of the active workforce does not have basic skills in working with digital technologies; in Ukraine, 15.1 percent of Ukrainians have no digital skills, and the share of the working population with below-average digital skills is 37.9 percent.17 Part of the solution to this challenge in Ukraine is entrusted to the “Digital Education” project, implemented by the Ministry of Digital Transformation (osvita.diia.gov.ua), which, through mini-series created for different target audiences, should build digital literacy among the population of Ukraine.
FEATURES OF DIGITAL TRANSFORMATION Developed digital skills in the population make the digital transformation of organizations not just a competitive advantage but a prerequisite for their survival. The larger the share of the target audience accustomed to the benefits of the digital economy, the more actively an organization must adapt to new requirements and customer needs and to the new competitive environment. Digital transformation of an organization is a complex process that is not limited to implementing software in the company’s activities or automating certain components of production. It includes changes to all elements of the company, including methods of manufacturing and customer service, the organization’s strategy and business model, and approaches and methods of management. According to a study by McKinsey, the integration of new technologies into a company’s operations can reduce profits in 45 percent of cases.18 Therefore, it is extremely important to take a comprehensive approach to digital transformation: understanding the changes being implemented, choosing the method of their implementation, and gradually involving all structural units and business processes in the transformation. The Boston Consulting Group study identified six factors necessary for the effective use of the benefits of modern technologies:19
• connectivity of analytical data;
• integration of technologies and automation;
• analysis of results and application of conclusions;
• strategic partnership;
• competent specialists in all departments; and
• flexible structure and culture.
McKinsey consultants draw attention to the low percentage of successful digital transformation practices and, based on the successful experience of 83 companies, formed five categories of recommendations that can contribute to successful digitalization:20
• involvement of leaders experienced in digitalization;
• development of digital staff skills;
• creating conditions for the use of digital skills by staff;
• digitization of tools and working procedures of the company; and
• establishing digital communication and ensuring the availability of information.
Experts at the Institute of Digital Transformation identify four main stages of digital transformation in a company:21
1. Research, analysis, and understanding of customer experience.
2. Involvement of the team in the process of digital transformation and implementation of a corporate culture that contributes to this process.
3. Building an effective operating model based on modern systems.
4. Transformation of the business model of the organization.
The “Integrated Model of Digital Transformation” study identifies, as one of the key factors of successful digital transformation, a focus on priority digital projects whose development and implementation should be assigned to dedicated organizational teams.
The authors identify three main functional activities for digital transformation teams, the implementation of which provides a gradual, comprehensive renewal of the company, namely: the creation and implementation of digital strategy, digital activity management, and digitization of operational activities.22 In their study, Ukrainian scientists Natalia Kraus, Oleksandr Holoborodko, and Kateryna Kraus determine that the general pattern for all digital economy projects is their focus on a specific consumer, comprehensive use of available information about that consumer, and attention to the conditions of project effectiveness.23 Initially, a project is pre-tested on a small scale, and only after obtaining satisfactory results from testing the new principles of activity on a narrow target audience is the project scaled to a wider range of potential users. All this reduces the risks associated with digital transformation. Eliminating unnecessary changes and false hypotheses on a small scale makes it possible to avoid overspending at the stage of a comprehensive transformation of the entire enterprise. Therefore, the process of effective digital transformation should begin with the involvement of experienced leaders in the field of digital transformation, analysis of the weaknesses of the organization, and the building of a plan for its comprehensive transformation, divided into individual projects implemented by qualified teams, with a gradual increase in the volume of these projects as their effectiveness is confirmed on a small scale. The process of digital transformation should be accompanied by constant training of employees in digital skills. The goal of digital transformation is to build an efficient, high-profile company that can quickly adapt to new environmental conditions, which is achieved through the introduction of digital technologies and new methods and tools of organization management.
DIRECTIONS OF LIBRARY DEVELOPMENT IN THE DIGITAL ECONOMY

Based on the study of the digital economy concept and the peculiarities of digital transformation, a review of library development in the digital economy was conducted to find the library's place in digital infrastructure and to identify potential projects that an individual library could implement as part of its comprehensive transformation plan. The main task is to determine the new role of the library in the digital economy and the areas of activity that best correspond to it. The search for directions of library development in response to the spread of digital technology began at the end of the last century. One of the first concepts to reflect the impact of the internet on the library sector was the concept of the digital library, published in 1999.24 In 2006, the concept of "library 2.0" emerged, based on Web 2.0 technologies: dynamic sites, users who become authors of data, open-source software, API interfaces, and data added to one database being immediately fed to partner databases.25 The spread of social networks and mobile technologies, and their successful use in library practice, led to the formation of the concept of "library 3.0."26 The development of open-source, cloud-service, big-data, augmented-reality, context-aware, and other technologies influenced library activities, which is reflected in "library 4.0."27 Researchers, scholars, and the professional community continued to develop concepts of the modern library, drawing on the experience of implementing changes in library activities and taking into account the development of other areas, and in 2020 articles began to appear describing the concept of "library 5.0," based on a personalized approach to students, support of each student during the whole period of study, development of skills necessary for learning, and
a set of other supporting actions integrated into the educational process.28 In determining the current role of the library in the digital economy, it is necessary to pay attention to a study by Denis Solovianenko, who identifies research and educational infrastructure as one of the key elements of scientific libraries of the twenty-first century.29 Olga Stepanenko considers libraries part of the information and communication infrastructure, the development of which is one of the main tasks of transforming the socioeconomic environment in accordance with the needs of the digital economy; this infrastructure ensures high efficiency for stakeholders and supports the pace of digitalization of the state economy, which occurs through the development of its constituent elements.30 The possibility of digital infrastructure replacing traditional library services, based on the example of the Moravian Library, is demonstrated in a study by Michal Indrak and Lenka Pokorna, published in April 2020.31 Projects that contribute to the library's adaptation to the conditions of the digital economy, implemented in public libraries, include: digitization of library collections (including historical heritage) and the creation of databases of full-text documents; provision of free access to the internet via library computers and Wi-Fi; organization of online customer service and development of services that do not require a physical presence in the library; and organization of events to develop users' digital and information skills.32 Under such conditions, the role of the librarian as an information specialist changes from that of a custodian to that of an intermediary and distributor.33 One of the main objectives of library activity in the digital economy becomes overcoming the digital divide: disseminating knowledge about modern technologies and innovations, assisting the community in their use, and developing digital skills in all users of the
library.34 An example of the digital public library is the Digital North Library project in Canada, which resulted in the creation of the Inuvialuit Digital Library (https://inuvialuitdigitallibrary.ca). The project lasted four years and brought together researchers from different universities and the community in the region, who together digitized cultural heritage documents and created metadata. The library now holds more than 5,200 digital resources collected in 49 catalogues. The implementation of this project provides access to library services and information for a significant number of people who live in remote areas of Northern Canada and are unable to visit libraries (https://sites.google.com/ualberta.ca/dln/home?authuser=0, https://inuvialuitdigitallibrary.ca).35 Other representatives of modern digital libraries, among whose main tasks are the preservation of cultural heritage and the spread of national culture, are the British Library (https://www.bl.uk), the Hispanic Digital Library—Biblioteca Nacional de España (http://www.bne.es), the Gallica Digital Library in France (https://gallica.bnf.fr), the German Digital Library—Deutsche Digitale Bibliothek (https://www.deutsche-digitale-bibliothek.de), and the European Library (https://www.europeana.eu). Another direction has been the development of analytical skills in information retrieval.
Academic libraries, applying their competencies in information retrieval and information technology and refining the results of their analyses, have been able to identify trends in academia more effectively and to expand cooperation with teachers in updating their curricula.36 Libraries become active participants in the process of teaching, learning, and assessment of acquired knowledge in educational institutions. T. O. Kolesnikova, in her research on models of library development, substantiates the expediency of creating information intelligence centers for introducing the latest scientific advances into training and production processes, involving libraries in the educational activities of higher educational establishments, and creating centralized repositories as directions of development for the university libraries of Ukraine.37 One of the advantages of the development and dissemination of digital technologies is the possibility of forming individual curricula for students. Involvement of university libraries in this area is one of the new directions of their activity in the digital economy.38 One of the important areas of operation for departmental and scientific-technical libraries that contributes to increasing the innovative potential of the country is activity in the area of intellectual property.
Consulting services in the field of intellectual property, information support for scientists, creation of electronic patent-information databases in the public domain, and other related services are important components of library activity in many countries.39 Another important component of libraries' transformation is the deepening of their role in scientific communication: expanding the boundaries of the use of information technology in order to integrate scientific information into a single network, and creating and managing the information technology infrastructure of science.40 The presence of libraries on social networks has become an important component of their digital transformation. On the one hand, libraries have thus created another source of information dissemination and expanded the number of service delivery channels, for which they have developed online training videos and interactive help services.41 On the other hand, social networks have become a marketing tool for engaging the audience with the library's digital collections and online services.
An additional important component of the presence of libraries on social networks has been the establishment of contacts and exchange of ideas with other professional organizations, which has contributed to the further expansion of the network of library partners.42 Another area of activity that libraries take on in the digital economy is the management of research data, as confirmed by the significant number of publications on this topic in professional scientific and research journals for 2017–18.43 Joining this area allows libraries to become part of the scientific digital information and communication infrastructure, the creation of which is one of the main tasks of digital transformation on the way to the digital economy.44 The development of this area contributes to the digitalization of the scientific and information sphere, while the systematization and structuring of all scientific research data has a positive effect on the effectiveness of research and on the scientific novelty of the results of intellectual activity. The Ukrainian Institute of the Future, together with the Digital Agency of Ukraine, considers digital transformation to be the integration of modern digital technologies into all spheres of business. The introduction of modern technologies (artificial intelligence, blockchain, cobots, digital twins, IIoT platforms, and others) into the production process will lead to the transition to Industry 4.0. According to their forecasts, the key competence in Industry 4.0 will be data processing and analytics.45 Research information is an integral part of this competence, so its development is one of the most promising directions for the library in the digital economy. The tools used in the management of research data are called Current Research Information Systems, abbreviated as CRIS. In Ukraine, there is no such system connected to the international community.46 The change of the library's role from a repository of research data to its manager, the alignment of the functions and tasks of a CRIS with the key requirements of the digital economy, and the advantages of such systems, together with the fact that they are still not used in Ukraine, make this area extremely relevant for research and a promising direction of work for scientific libraries, so we will consider it more thoroughly.

PROBLEMS IN RESEARCH DATA MANAGEMENT

The global experience of research information management reveals several problems in the process of research data management. Some of them are related to the processes of workflow organization, control, and reporting. This is due to the use of several poorly coordinated systems to organize the work of scientists. Data sets from different systems, lacking shared metadata, are very difficult to combine into a single system, and it is almost impossible to automate the process. All this manifests itself as a lack of informational support for decision-making in the field of science, both at the state level and at the level of individual institutions. This situation can lead to wrong management decisions, to overspending on similar, duplicate projects, and to increased costs of recruiting and finding scientists with relevant experience for research and of finding the equipment needed for research. CRIS, which began to appear in Europe in the 1990s, are designed to overcome these shortcomings and promote the effective organization of scientific work. Such systems are now widespread throughout the world, with a total of about five hundred, concentrated mainly in Europe and India. However, there is currently no research information management system in Ukraine that meets international standards and integrates with international scientific databases. This omission slows down Ukraine's integration into the international scientific community.
The solution to this problem may be the creation of the national electronic scientific information system URIS (Ukrainian Research Information System).47 The development of this system is an initiative of the Ministry of Education and Science of Ukraine. It is based on combining data from Ukrainian scientific institutions with data from CrossRef and other organizations, as well as on ensuring integration with other international CRIS systems through the use of the CERIF standard. The future developers of the system face a number of challenges, both specific to Ukraine and already studied by foreign scientists. A significant number of studies in this area are designed to overcome the problem of lack of access to research data, as well as to solve problems of data standardization and openness. In the global experience, the directions of managing collection processes and developing structured data sets are investigated, along with their distribution on a commercial basis and the ways of gaining advantage by providing them in open access. The mechanisms of financing these processes are studied; in particular, effective ways of attracting patronage funds are analyzed. The possibilities of licensing the resulting data sets and their distribution are determined, along with the approaches and tools that can be most effective for the library. In particular, Alice Wise describes the experience of settling some legal aspects by clarifying the use of the site in the license agreement, which covers the conditions of access to information and search within it while maintaining a certain level of anonymity.48 The problem of data consistency is related to the lack of uniform standards for retaining information that would govern the format of the data, the metadata itself, and the methods of their generation and use.
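A system such as URIS would have to merge institutional records with metadata drawn from CrossRef. As a hedged illustration of that normalization step, the sketch below flattens a CrossRef-style JSON work record (CrossRef's public REST API returns work metadata under a "message" key in roughly this shape) into a simple, mergeable dict. The sample record, author name, and target field names are illustrative, not the CERIF standard.

```python
import json

# Illustrative CrossRef-style record (not a real API response).
SAMPLE_CROSSREF_RECORD = json.loads("""
{
  "message": {
    "DOI": "10.1234/example.doi",
    "title": ["Illustrative Article Title"],
    "author": [{"given": "Iryna", "family": "Kovalenko"}],
    "issued": {"date-parts": [[2020, 12]]}
  }
}
""")

def normalize_work(record):
    """Flatten a CrossRef-style record into a simple dict that a CRIS
    could merge with locally produced institutional data."""
    msg = record["message"]
    return {
        "doi": msg["DOI"].lower(),  # DOIs are case-insensitive; normalize once
        "title": msg["title"][0] if msg.get("title") else None,
        "authors": [f'{a.get("given", "")} {a.get("family", "")}'.strip()
                    for a in msg.get("author", [])],
        "year": msg["issued"]["date-parts"][0][0],
    }

work = normalize_work(SAMPLE_CROSSREF_RECORD)
print(work["doi"], work["year"])
```

Normalizing every incoming record to one shape is what makes automated deduplication and merging across sources possible at all; without it, the "poorly coordinated systems" problem described above reappears inside the aggregator.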
Thus, the use of different standards and formats in repositories and archives leads to problems with data consistency for researchers, which, in turn, affects the quality of service delivery and makes it impossible to use multiple data sets together.49 Another important obstacle to the dissemination of research data is the lack of tools and infrastructure components in the libraries and repositories of higher educational establishments and scientific institutions. It is worthwhile to develop the infrastructure so that at the end of their projects, in addition to the research results, scientists publish the research data they used and generated. This approach will be convenient both for authors (in case they need to reuse the research data) and for other scientists (because they will have access to data that can be used in their own research).50 The development of the necessary tools is quite relevant, especially because, according to international surveys, practicing researchers are in favor of sharing the data they create with other researchers and of the licensed use of other people's datasets in conducting their own research.51 Another reason for the low prevalence of research data is that datasets have less impact on a researcher's reputation and rating than publications.52 This is partly due to the lack of citation-tracking infrastructure for datasets, in contrast to the publication of research results, and to the lack of standards for storing and publishing data. Prestigious scientific journals have been struggling with this problem for several years.
For example, the American Economic Review requires authors whose articles contain empirical, modelling, or experimental work to provide information about their research data in sufficient volume for replication.53 Nature and Science require authors to preserve research data and provide them at the request of the journals' editors.54 One of the reasons for the underdeveloped infrastructure in research data management is the weak policy of free access to these data, as a result of which much usable scientific data remains closed by license agreements and cannot be used by other scientists.55 Open science initiatives related to publications have been operating in the scientific field for a long time, but their extension to research data remains insufficient. The development of the URIS system will provide management of scientific information; will solve the problems highlighted in the scientific works cited above; will promote the efficient use of funds; will simplify the process of finding data for research; and will discipline research, and therefore will have a positive impact on the entire economy of Ukraine.

LIBRARY AND RESEARCH INFORMATION MANAGEMENT

Library involvement in the development of scientific information management systems will be an important future direction of their work. Such systems, which could include all the necessary information about scientific research, will contribute to the renewal and development of the library sphere of Ukraine and will promote the transition of the state to a digital economy. The creation of the URIS system is designed to provide access to research data generated by both Ukrainian and foreign scientists.
Such a system can ensure the development of cooperation in the field of research, the intensification of knowledge exchange, and interaction through the open exchange of scientific data and the integration of the Ukrainian scientific infrastructure into the world scientific and information space. According to surveys conducted by the international organizations euroCRIS and OCLC, of the 172 respondents working in the field of research information management, 83 percent said that libraries play an important role in the development of open science, copyright, and the deposit of research results; the share reporting that libraries play a major role in this direction was 90 percent. Almost 68 percent of respondents noted the significant contribution of libraries to filling in the metadata needed to correctly identify the work of researchers in various databases; 60 percent noted the important role of libraries in verifying the correctness of metadata entered by researchers; and almost 49 percent of respondents assessed the role of libraries as the main one in the management of research data (figure 4).

Figure 4. The proportion of organizations among 172 users of CRIS systems that assess the role of libraries in the management of research information as basic or supporting.56

At the same time, the activity of libraries in assisting with the information management of scientific research can take various forms, which should be adopted by the scientific libraries of Ukraine; some of these forms will also be useful to public libraries, which can become science ambassadors in their communities. Based on the experience of foreign libraries, we have identified areas of activity in which the library can join the management of research information.
(Figure 4 lists the following research information management activities: financial support for RIM; project management; maintaining or servicing technical operations; impact assessment and reporting; strategic development, management and planning; creating internal reports for departments; system configuration; outreach and communication; initiating RIM adoption; research data management; metadata validation workflows; metadata entry; training and support; and open access, copyright and deposit.)

One of the main directions for libraries that cooperate with CRIS users, or are themselves the organizers of such systems, is the introduction and support of open science. Historically, libraries have supported open science by providing access to scientific papers, but they can expand their activities further. Using open data resources and promoting them among the scientific community, involving scientific users in disseminating their own research results on the principles of open science, supporting users in disseminating their publications, creating conditions for increasing the citation of scientific papers, tracking information about users' publications, and creating and supporting public profiles of scientists in scientific and professional resources and scientific social networks—all this will help engage researchers in open science and let them take advantage of this area. Analysis of world experience shows, in the activity of scientific libraries, a significant intensification of support for the strategic goals of the structures that finance their activities and to which they are subordinated. Libraries are moving away from routine customer service and expanding their activities through the use of their own assets and the introduction of new, modern tools.
Such libraries try to promote the development of their parent structures and build modern competencies in order to better meet the needs and goals of these institutions. By introducing and implementing various management tools, libraries synchronize their strategy with the strategy of the parent structure to achieve a synergistic effect. The next important direction of library development is socialization. Wanting to shed the antiquated understanding of the word library, many of them conduct campaigns aimed at changing the image of the library in the minds of users, communities, and society. An important component of this systematic step is building relationships with the target audience and creating user communities around the library whose members are not only its users but also supporters, friends, and promoters. Building relationships with members of the scientific community allows libraries to reduce resistance to the changes that result from the introduction of scientific information management systems, and to influence users positively so that they introduce new tools into their usual activities, receive the benefits, and become an active part of the process of structuring the scientific space. Recently, work with metadata has undergone some changes. The need for identification and structuring of data in the world scientific space means that metadata are now filled in not only by libraries but also by other organizations that produce and publish scientific results and scientific literature. Scientists are beginning to make more active use of modern information standards in order to promote their own work. Libraries, in turn, take on the role of consultant or contractor with many years of experience working with metadata and sufficient knowledge in this area.
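The metadata-verification role described above (and reflected in the survey figures, where roughly 60 percent of respondents saw libraries validating researcher-entered metadata) can be sketched as a simple check that runs before a record enters the CRIS. The required fields, the DOI pattern, and the record shape below are illustrative assumptions, not a published standard.

```python
import re

# Illustrative rules a library might enforce before accepting a record.
REQUIRED_FIELDS = ("title", "authors", "year", "doi")
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # common shape of a DOI

def validate_record(record):
    """Return a list of human-readable problems; an empty list means
    the record can be loaded into the CRIS as-is."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    doi = record.get("doi", "")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"malformed DOI: {doi!r}")
    year = record.get("year")
    if isinstance(year, int) and not 1800 <= year <= 2100:
        problems.append(f"implausible year: {year}")
    return problems

# A researcher-entered record with one defect: the DOI is not a DOI.
record = {"title": "Example study", "authors": ["A. Researcher"],
          "year": 2020, "doi": "not-a-doi"}
print(validate_record(record))
```

Returning a list of problems, rather than a yes/no verdict, suits the consultancy role: the librarian can hand the whole list back to the researcher in one round of correction.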
On the other hand, the filling in of metadata by users frees up librarians' time and creates conditions for them to perform other functions, such as information management and the creation of automated data collection and management systems integrated with scientific databases, both Ukrainian and international. Another area of research information management is the direct management of this process. Thus, CRIS are developed and implemented with the contribution of scientific libraries in different countries of the world. This allows libraries to combine disparate data obtained from different sources, compile scientific reports, evaluate the effectiveness of an institution's scientific activities, create profiles of scientific institutions and scientists, develop research networks, etc. Scientists and students can find the results of scientific research and look for partners and sources of funding for research. Research managers have access to up-to-date scientific information, which makes it possible to assess more accurately the productivity and influence of individual scientists, research groups, and institutions. Business representatives gain access to up-to-date information on promising scientific developments, and the public gains a way to oversee the conduct of research effectively.

CONCLUSIONS

Ukraine is on the path to a digital economy, characterized by the penetration of new technologies into all areas of human activity, simplification of access to information, goods, and services, the blurring of companies' geographical boundaries, an increasing share of automated and robotic production units, and a strengthening role for the creation and use of databases. These changes affect all sectors of the economy, and all organizations, without exception, need to adapt accordingly.
Rapid response to such changes helps to increase competitiveness both at the level of individual organizations and at the level of the state economy. Adaptation to the conditions of the digital economy occurs through digital transformation—a complex process that requires a review of all of an organization's business processes and radically changes its business model. The digital transformation of an organization takes place through the involvement of management that is competent in digitization, the updating of management methods, the development of digital skills, the establishment of efficient production and services, the implementation of digital tools and the building of digital communication, the implementation of individual development projects, and adaptation to new user needs. The digital transformation of the economy occurs through the transformation of its individual sectors, creating conditions for the transformation of their representatives. One of the first steps in the process of transition to the digital economy is the establishment of a digital information and communication infrastructure. Libraries are representatives of the information sphere and were the main operators of information in the analogue era. Significant changes in the subject area of their activities require the search for a new role for libraries. Modern projects and directions of library development are integral elements of transformation to the conditions of the digital economy. Completing this complex transformation will allow libraries to update their management methods, the range of their services, and the channels of their provision; to change their fixed assets through digitization, structuring of data, and creation of metadata; to change their approaches to communication with users and cooperation with both domestic and international partners; to change the functions and positioning of the library; and to become effective information operator-managers.
In the digital economy, the role of the library is changing from passively collecting and storing information to actively managing it. One of the areas of development that most comprehensively meets this role is the management of research data, which is implemented through the creation of CRIS systems. Under this model, the main asset of libraries is a digital, structured database, automatically and regularly updated, whose main purpose is to support the decision-making process. The library becomes an assistant in conducting research and in finding funding, partners, fixed assets, and information, and a partner in the strategic management both of scientific organizations and of the state at the level of committees and ministries. The development of this area in Ukraine requires solving a number of technical, administrative, and managerial questions that are relevant not only in Ukraine but also around the world. In particular, libraries need to address the issues of data integration and consistency, accessibility and openness, copyright, and personal data. Solving the problems of the creation and operation of CRIS systems in Ukraine is a promising area for future research.

ENDNOTES

1 Andriy Dobrynin, Konstantin Chernykh, Vasyl Kupriyanovsky, Pavlo Kupriyanovsky, and Serhiy Sinyagov, “Tsifrovaya ekonomika—razlichnyie puti k effektivnomu primeneniyu tehnologiy (BIM, PLM, CAD, IOT, Smart City, BIG DATA i drugie),” International Journal of Open Information Technologies 4, no. 1 (2016): 4–10, https://cyberleninka.ru/article/n/tsifrovaya-ekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smart-city-big-data-i-drugie.

2 Jurgen Meffert, Volodymyr Kulagin, and Alexander Suharevskiy, Digital @ Scale: nastolnaya kniga po tsifrovizatsii biznesa (Moscow: Alpina, 2019).
3 Victoria Apalkova, “Kontseptsiia rozvytku tsyfrovoi ekonomiky v Yevrosoiuzi ta perspektyvy Ukrainy,” Visnyk Dnipropetrovskoho universytetu. Seriia «Menedzhment innovatsii» 23, no. 4 (2015): 9–18, http://nbuv.gov.ua/UJRN/vdumi_2015_23_4_4.

4 Don Tapscott, The Digital Economy: Promise and Peril in the Age of Networked Intelligence (New York: McGraw-Hill, 1996).

5 Thomas L. Mesenbourg, Measuring the Digital Economy (Washington, DC: Bureau of the Census, 2001).

6 Philippe Barbet and Nathalie Coutinet, “Measuring the Digital Economy: State-of-the-Art Developments and Future Prospects,” Communications & Strategies, no. 42 (2001): 153, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.576.1856&rep=rep1&type=pdf.

7 Alnoor Bhimani, “Digitization and Accounting Change,” in Management Accounting in the Digital Economy, ed. Alnoor Bhimani, 1–12 (London: Oxford University Press, 2003), https://doi.org/10.1093/0199260389.003.0001.

8 Bo Carlsson, “The Digital Economy: What is New and What is Not?,” Structural Change and Economic Dynamics 15, no. 3 (September 2004): 245–64, https://doi.org/10.1016/j.strueco.2004.02.001.

9 John Hand, “Building Digital Economy—The Research Councils Programme and the Vision,” Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 16 (2009): 3, https://doi.org/10.1007/978-3-642-11284-3_1.

10 Carmen Nadia Ciocoiu, “Integrating Digital Economy and Green Economy: Opportunities for Sustainable Development,” Theoretical and Empirical Researches in Urban Management 6, no. 1 (2011): 33–43, https://www.researchgate.net/publication/227346561.
11 Lesya Zenoviivna Kit, “Evoliutsiia Merezhevoi Ekonomiky,” Visnyk Khmelnytskoho Natsionalnoho Universytetu. Ekonomichni nauky, no. 3 (2014): 187–94, http://nbuv.gov.ua/UJRN/Vchnu_ekon_2014_3%282%29__42.

12 Mykhailo Voinarenko and Larissa Skorobohata, “Merezhevi Instrumenty Kapitalizatsii Informatsiino-intelektualnoho Potentsialu ta Innovatsii,” Visnyk Khmelnytskoho Natsionalnoho Universytetu. Ekonomichni nauky, no. 3 (2015): 18–24, http://elar.khnu.km.ua/jspui/handle/123456789/4259.

13 Yurii Pivovarov, “Ukraina Perekhodyt na ‘Tsyfrovu Ekonomiku’: Shcho tse Oznachaie,” ed. Miroslav Liskovuch, Ukrinform (January 21, 2020), https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html.

14 European Commission, “Digital Economy and Society Index,” Brussels, Belgium, https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en.
15 Kabinet Ministriv Ukrainu, “Pro Skhvalennia Kontseptsii Rozvytku Tsyfrovoi Ekonomiky ta Suspilstva Ukrainy na 2018–2020 Roky ta Zatverdzhennia Planu Zakhodiv Shchodo yii Realizatsii,” (Kyiv: 2018), https://zakon.rada.gov.ua/laws/show/67-2018-%D1%80. 16 Kabinet Ministriv Ukrainu, “Pytannia Ministerstva Tsyfrovoi Transformatsii,” (Kyiv: 2019), https://zakon.rada.gov.ua/laws/show/856-2019-%D0%BF. 17 Piatuy, “Biblioteky Stanut Pershymy Oflain-khabamy: Mintsyfry Zapustyt Kursy z Tsyfrovoi Osvity,” https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry- zapustyt-kursy-z-tsyfrovoi-osvity-206206.html. 18 Jacques Bughin, Jonathan Deaki, and Barbara O’Beirne, “Digital Transformation: Improving the Odds of Success,” McKinsey & Company, https://www.mckinsey.com/business- functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of- success. 19 Domynyk Fyld, Shylpa Patel, and Henry Leon, “Kak Dostich Tsifrovoy Zrelosti,” The Boston Consulting Group Inc. (2018), https://www.thinkwithgoogle.com/_qs/documents/5685/ru_AdWords_Marketing___Sales_89 1609_Mastering_Digital_Marketing_Maturity.pdf. 20 Hortense de la Boutetière, Alberto Montagner, and Angelika Reich, “Unlocking Success in Digital Transformations,” McKinsey & Company, https://www.mckinsey.com/business- functions/organization/our-insights/unlocking-success-in-digital-transformations. 21 Top Lea, “Tsyfrova Transformatsiia Biznesu: Navishcho vona Potribna i Shche 14 Pytan,” BusinessViews, https://businessviews.com.ua/ru/business/id/cifrova-transformacija- biznesu-navischo-vona-potribna-i-sche-14-pitan-2046. 22 Vasily Kupriyanovsky, Andrey Dobrynin, Sergey Sinyagov, and Dmitry Namiot, “Tselostnaya Model Transformatsii v Tsifrovoy Ekonomike—Kak Stat Tsifrovyimi Liderami,” International Journal of Open Information Technologies 5, no. 
1 (2017): 26–33, http://nbuv.gov.ua/UJRN/Vchnu_ekon_2014_3%282%29__42 http://elar.khnu.km.ua/jspui/handle/123456789/4259 https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en https://zakon.rada.gov.ua/laws/show/67-2018-%D1%80 https://zakon.rada.gov.ua/laws/show/856-2019-%D0%BF https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.thinkwithgoogle.com/_qs/documents/5685/ru_AdWords_Marketing___Sales_891609_Mastering_Digital_Marketing_Maturity.pdf https://www.thinkwithgoogle.com/_qs/documents/5685/ru_AdWords_Marketing___Sales_891609_Mastering_Digital_Marketing_Maturity.pdf https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046 https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046 INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 THE ROLE OF THE LIBRARY IN THE DIGITAL ECONOMY | ZHARINOV 15 
https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike- kak-stat-tsifrovymi-liderami. 23 Nataliia Kraus, Alexander Holoborodko, and Kateryna Kraus, “Tsyfrova Ekonomika: Trendy ta Perspektyvy Avanhardnoho Kharakteru Rozvytku,” Efektyvna Ekonomika no. 1 (2018): 1–7, http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf. 24 David Bawden and Ian Rowlands, “Digital Libraries: Assumptions and Concepts,” International Journal of Libraries and Information Studies (Libri), no. 49 (1999): 181–91, https://doi.org/10.1515/libr.1999.49.4.181. 25 Jack M. Maness, “Library 2.0: The Next Generation of Web-based Library Services,” LOGOS 13, no. 3 (2006): 139–45, https://doi.org/10.2959/logo.2006.17.3.139. 26 Woody Evans, Building Library 3.0: Issues in Creating a Culture of Participation (Oxford: Chandos Publishing, 2009). 27 Younghee Noh, “Imagining Library 4.0: Creating a Model for Future Libraries,” The Journal of Academic Librarianship 41, no. 6 (November 2015): 786–97, https://doi.org/10.1016/j.acalib.2015.08.020. 28 Helle Guldberg et al., “Library 5.0,” Septentrio Conference Series, UiT The Arctic University of Norway, no. 3 (2020), https://doi.org/10.7557/5.5378. 29 Denys Solovianenko, “Akademichni Biblioteky u Novomu Sotsiotekhnichnomu Vymiri. Chastyna Chetverta. Suchasnyi Riven Dyskursu Akademichnoho Bibliotekoznavstva ta Postup E-nauky,” Bibliotechnyi visnyk no.1 (2011): 8–24, http://journals.uran.ua/bv/article/view/2011.1.02. 30 Olga Petrivna Stepanenko, “Perspektyvni Napriamy Tsyfrovoi Transformatsii v Konteksti Rozbudovy Tsyfrovoi Ekonomiky,” in Modeliuvannia ta informatsiini systemy v ekonomitsi : zb. nauk. pr., edited by V. K. Halitsyn, (Kyiv: KNEU, 2017), 120–31, https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120- 131.pdf?sequence=1&isAllowed=y. 
31 Michal Indrák and Lenka Pokorná, “Analysis of Digital Transformation of Services in a Research Library,” Global Knowledge, Memory and Communication (2020), https://doi.org/10.1108/GKMC-09-2019-0118. 32 Irina Sergeevna Koroleva, “Biblioteka—Optimalnaya Model Vzaimodeystviya s Polzovatelyami v Usloviyah Tsifrovoy Ekonomiki,” Informatsionno-bibliotechnyie sistemyi, resursyi i tehnologii no. 1 (2020): 57–64, https://doi.org/10.20913/2618-7515-2020-1-57-64. 33 James Currall and Michael Moss, “We are Archivists, But are We OK?”, Records Management Journal 18, no. 1 (2008): 69–91, https://doi.org/10.1108/09565690810858532. 34 Kirralie Houghton, Marcus Foth and Evonne Miller, “The Local Library across the Digital and Physical City: Opportunities for Economic Development,” Commonwealth Journal of Local Governance no. 15 (2014): 39–60, https://doi.org/10.5130/cjlg.v0i0.4062. https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf https://doi.org/10.1515/libr.1999.49.4.181 https://doi.org/10.2959/logo.2006.17.3.139 https://doi.org/10.1016/j.acalib.2015.08.020 https://doi.org/10.7557/5.5378 http://journals.uran.ua/bv/article/view/2011.1.02 https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isAllowed=y https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isAllowed=y https://doi.org/10.1108/GKMC-09-2019-0118 https://doi.org/10.20913/2618-7515-2020-1-57-64 https://doi.org/10.1108/09565690810858532 https://doi.org/10.5130/cjlg.v0i0.4062 INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 THE ROLE OF THE LIBRARY IN THE DIGITAL ECONOMY | ZHARINOV 16 35 Sharon Farnel and Ali Shiri, “Community-Driven Knowledge Organization for Cultural Heritage Digital Libraries: The Case of the Inuvialuit Settlement Region,” 
Advances in Classification Research Online no. 1 (2019): 9–12, https://doi.org/10.7152/acro.v29i1.15453. 36 Elizabeth Tait, Konstantina Martzoukou, and Peter Reid, “Libraries for the Future: The Role of IT Utilities in the Transformation of Academic Libraries,” Palgrave Communications no. 2 (2016): 1–9, https://doi.org/10.1057/palcomms.2016.70. 37 Tatiana Alexandrovna Kolesnykova, “Suchasna Biblioteka VNZ: Modeli Rozvytku v Umovakh Informatyzatsii,” Bibliotekoznavstvo. Dokumentoznavstvo. Informolohiia no. 4 (2009): 57–62, http://nbuv.gov.ua/UJRN/bdi_2009_4_10. 38 Ekaterina Kudrina and Karina Ivina, “Digital Environment as a New Challenge for the University Library,”Bulletin of Kemerovo State University. Series: humanities and social sciences 2, no. 10 (2019): 126–34, https://doi.org/10.21603/2542-1840-2019-3-2-126-134. 39 Anna Kochetkova, “Tsyfrovi Biblioteky yak Oznaka XXI Stolittia,” Svitohliad no. 6 (2009): 68–73, https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68- kochetkova.pdf. 40 Victoria Alexandrovna Kopanieva, “Naukova Biblioteka: Vid E-katalohu do E-nauky,” Bibliotekoznavstvo. Dokumentoznavstvo. Informolohiia no. 6 (2016): 4–10, http://nbuv.gov.ua/UJRN/bdi_2016_3_3. 41 Christy R. Stevens, “Reference Reviewed and Re-Envisioned: Revamping Librarian and Desk- Centric Services with LibStARs and LibAnswers,” The Journal of Academic Librarianship 39, no. 2 (March 2013): 202–14, https://doi.org/10.1016/j.acalib.2012.11.006. 42 Samuel Kai-Wah Chu and Helen S Du, “Social Networking Tools for Academic Libraries,” Journal of Librarianship and Information Science 45, no. 1 (February 17, 2012): 64–75, https://doi.org/10.1177/0961000611434361. 43 ACRL Research Planning and Review Committee, “2018 Top Trends in Academic Libraries A Review of the Trends and Issues Affecting Academic Libraries in Higher Education,” C&RL News 79, no.6 (2018): 286–300. https://doi.org/10.5860/crln.79.6.286. 
44 Currall and Moss, “We are Archivists, but are We OK?”, 69–91, https://doi.org/10.1108/09565690810858532. 45 Valerii Fishchuk et al., “Ukraina 2030E— Kraina z Rozvynutoiu Tsyfrovoiu Ekonomikoiu,” Ukrainskyi instytut maibutnoho, 2018, https://strategy.uifuture.org/kraina-z-rozvinutoyu- cifrovoyu-ekonomikoyu.html. 46 EuroCRIS, “Search the Directory of Research Information System (DRIS),” https://dspacecris.eurocris.org/cris/explore/dris. 47 MON, “MON Zapustylo Novyi Poshukovyi Servis dlia Naukovtsiv—Vin Bezkoshtovnyi ta Bazuietsia na Vidkrytykh Danykh z Usoho Svituю,” https://mon.gov.ua/ua/news/mon- https://doi.org/10.7152/acro.v29i1.15453 https://doi.org/10.1057/palcomms.2016.70 http://nbuv.gov.ua/UJRN/bdi_2009_4_10 https://doi.org/10.21603/2542-1840-2019-3-2-126-134 https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf http://nbuv.gov.ua/UJRN/bdi_2016_3_3 https://doi.org/10.1016/j.acalib.2012.11.006 https://doi.org/10.1177/0961000611434361 https://doi.org/10.5860/crln.79.6.286 https://doi.org/10.1108/09565690810858532 https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html https://dspacecris.eurocris.org/cris/explore/dris https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu INFORMATION TECHNOLOGY AND LIBRARIES DECEMBER 2020 THE ROLE OF THE LIBRARY IN THE DIGITAL ECONOMY | ZHARINOV 17 zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na- vidkritih-danih-z-usogo-svitu. 48 Nancy Herther et al., “Text and Data Mining Contracts: The Issues and Needs,” Proceedings of the Charleston Library Conference, 2016, https://doi.org/10.5703/1288284316233. 
49 Karen Hogenboom and Michele Hayslett, “Pioneers in the Wild West: Managing Data Collections.” Portal: Libraries and the Academy 17, no. 2 (2017): 295–319, https://doi.org/10.1353/pla.2017.0018. 50 Philip Young et al., “Library Support for Text and Data Mining,” A Report for the University Libraries at Virginia Tech, 2017, http://bit.ly/2FccOwu. 51 Carol Tenopir et al., “Data Sharing by Scientists: Practices and Perceptions,” PloS One 6 (2011), no. 6, https://doi.org/10.1371/journal.pone.0021101. 52 Filip Kruse and Jesper Boserup Thestrup, “Research Libraries’ New Role in Research Data Management, Current Trends and Visions in Denmark,” Liber Quarterly 23, no.4 (2014): 310– 35, https://doi.org/10.18352/lq.9173. 53 American Economic Review, “Data and Code.” AER Guidelines for Accepted Articles. Instructions for Preparation of Accepted Manuscripts, 2020, https://www.aeaweb.org/journals/aer/submissions/accepted-articles/styleguide#IIC. 54 “Data Access and Retention.” The Publication Ethics and Malpractice Statement, (New York: Marsland Press, 2019), http://www.sciencepub.net/marslandfile/ethics.pdf. 55 Patricia Cleary et al., “Text Mining 101: What You Should Know,” The Serials Librarian 72, no.1-4 (May 2017): 156–59, https://doi.org/10.1080/0361526X.2017.1320876. 56 Rebecca Bryant et al., Practices and Patterns in Research Information Management Findings from a Global Survey (Dublin: OCLC Research, 2018), https://doi.org/10.25333/BGFG-D241. 
https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu https://doi.org/10.5703/1288284316233 https://doi.org/10.1353/pla.2017.0018 http://bit.ly/2FccOwu https://doi.org/10.1371/journal.pone.0021101 https://doi.org/10.18352/lq.9173 https://www.aeaweb.org/journals/aer/submissions/accepted-articles/styleguide#IIC http://www.sciencepub.net/marslandfile/ethics.pdf https://doi.org/10.1080/0361526X.2017.1320876 https://doi.org/10.25333/BGFG-D241 ABSTRACT INTRODUCTION THE CONCEPT OF THE “DIGITAL ECONOMY” FEATURES OF DIGITAL TRANSFORMATION DIRECTIONS OF LIBRARY DEVELOPMENT IN THE DIGITAL ECONOMY PROBLEMS IN RESEARCH DATA MANAGEMENT LIBRARY AND RESEARCH INFORMATION MANAGEMENT CONCLUSIONS ENDNOTES
ARTICLE

Automated Fake News Detection in the Age of Digital Libraries

Uğur Mertoğlu and Burkay Genç

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12483

Uğur Mertoğlu (umertoglu@hacettepe.edu.tr) is a PhD Candidate, Hacettepe University. Burkay Genç (bgenc@cs.hacettepe.edu.tr) is Assistant Professor, Hacettepe University. © 2020.

ABSTRACT

The transformation of printed media into the digital environment and the extensive use of social media have changed the concept of media literacy and people's news consumption habits. While online news is faster, easier, comparatively cheaper, and more convenient in terms of access to information, it also speeds up the dissemination of fake news. Because large amounts of data are freely produced and consumed, fact-checking systems powered by human effort are not enough to question the credibility of the information provided or to prevent its virus-like spread. Libraries, long known as sources of trusted information, are facing challenges caused by misinformation, as mentioned in studies about fake news and libraries.1 Considering that libraries all over the world are undergoing digitization and providing digital media to their users, it is very likely that unverified digital content will be served by the world's libraries. The solution is to develop automated mechanisms that can check the credibility of digital content served in libraries without manual validation. For this purpose, we developed an automated fake news detection system based on Turkish digital news content. Our approach can be adapted to any other language for which labelled training material exists. This model can be integrated into libraries' digital systems to label served news content as potentially fake whenever necessary, preventing uncontrolled dissemination of falsehoods via libraries.
INTRODUCTION

Collins Dictionary, which chose the term "fake news" as its Word of the Year for 2017, describes news as the factual and objective presentation of a current event, information, or situation that is published in newspapers and broadcast on radio, television, or online.2 We are in an era where everything goes online, and news is no exception. Many people today prefer to read their daily news online, because it is a cost-effective and convenient way to stay up to date. Although this convenience has tangible benefits for society, it can also have harmful side effects. Having access to news from multiple sources, anytime, anywhere has become an irresistible part of our daily routines. However, some of these sources may provide unverified content that can easily be delivered right to your mobile device. Most importantly, fake news delivered by these sources may mislead society and cause social disturbances, such as triggering violence against ethnic minorities and refugees, causing unnecessary fear about health issues, or even resulting in crises, devastating riots, and strikes. Unlike news, fake news has no settled definition and is often defined according to the data used or the limited perspective of a given study. For example, DiFranzo and Gloria-Garcia defined fake news as "false news stories that are packaged and published as if they were genuine."3 On the other hand, Guess et al.
see the term as "a new form of political misinformation" within the domain of politics, whereas Mustafaraj is more direct and defines it as "lies presented as news."4 A comprehensive list of 12 definitions can be found in Egelhofer and Lecheler.5 In simplified terms, news created to deceive or mislead readers can be called fake news. However, the concept of fake news is quite broad and needs to be specified meticulously. Fake news is created for many purposes and emerges in many different types. Most of these types, which have an interwoven structure, are shown in figure 1. Although it is not easy to cluster these types into separate groups, they can be categorized according to information quality or according to intention, i.e., whether or not they are created to deceive deliberately, as Rashkin et al. did.6 We propose the following classification, in which the two dimensions represent the potential impact and the speed of propagation.

Figure 1. The volatile distribution of fake news types (clustered in four regions: sr, sR, Sr, SR) with respect to two dimensions: speed of propagation and potential impact.

The four regions visualized are clustered according to their dangerousness. First of all, it should be noted that ordering types of fake news with stable precision is quite a challenging task. The variations within the field depend highly on dynamic factors such as timespan, actors, and the echo-chamber effect. Hence, this figure should be considered a clustering effort. There are possible intersecting areas of types within the regions. We will now give examples for two regions, "sr" and "SR." For example, the SR grouping is characterized by high risk levels and fast dissemination.
This includes varieties of fake news such as propaganda, manipulation, misinformation, hate news, provocative news, etc. We usually encounter this in the domain of politics. This kind of news may cause critical and nonrecoverable results in politics, the economy, etc., in a short period of time. The rise of the term fake news itself can also be attributed to this kind of news. On the other hand, the relatively less severe group (sr) of fake news, comprising satire, hoaxes, click-bait, etc., has low risk levels and a slow speed of dissemination. A frequently used type in this group, click-bait, is a sensational headline or link that urges the reader to click on a post, link, article, image, or video. This kind of news has a repetitive style, and it can be said that readers become aware of the falsehood after encountering it a few times. So, the risk level is lower, and dissemination is slower. Vosoughi et al. found that "Falsehood diffuses significantly farther, faster, deeper, and more broadly than the truth."7 Indeed, a single piece of fake news may affect many more people than thousands of true news items do because of the dramatic circulation of fake news. In their recent survey about fake news, Zhou and Zafarani highlighted that fake news is a major concern for many different research disciplines, especially information technologies.8 Having been trusted sources of information for a long time, libraries will play an important role in the fight against the fake news problem. Kattimani et al.
claim that the modern librarian must be equipped with the necessary digital skills and tools to handle both printed collections and newly emerging digital resources.9 Similarly, we foresee that digital libraries, which can be defined as collections of digital content licensed and maintained by libraries, can be part of the solution as an authority service with a collective effort. Connaway et al. point to the key role of information professionals such as librarians, archivists, journalists, and information architects in helping society use news-related products and services in a convenient way.10 As libraries all over the world transition into digital content delivery services, they should implement mechanisms, under the guidance of information professionals, to prevent fake and misleading content from being disseminated through them. To lay out proper future directions for a solution strategy, the interaction between the library and information science (LIS) community and fake news must be clearly addressed. Sullivan states that the LIS community was affected deeply in the aftermath of the 2016 US presidential elections.11 Moreover, he quotes many other scholars emphasizing libraries' and librarians' role in the fight against fake news: for example, Finley et al. say that libraries are the direct antithesis of fake news; the American Library Association (ALA) called fake news an anathema to the ethics of librarianship in 2017; Rochlin emphasizes the role of librarians in this fight and the need to adopt fake news as a central concern in librarianship; and many other researchers place librarians on the front lines of the fight against fake news.12 Today, the struggle to detect fake news and prevent its spread is so prominent that competitions are being organized (e.g., http://www.fakenewschallenge.org/) and conferences are being held (e.g., Bobcatsss 2020).
The struggle against fake news can be classified under three main venues:

• Reader awareness
• Fact-checking organizations and websites
• Automated detection systems

The first item requires awareness of individuals against fake news and a collective conscience within society against spreading it. To this end, visual and textual checklists, frameworks, and guidance lists are being published by official organizations, such as the infographic by IFLA (International Federation of Library Associations), which contains eight steps to spot fake news.13 The RADAR framework and the Currency, Relevance, Authority, Accuracy, and Purpose (CRAAP) test are some of the efforts trying to increase reader awareness of fake news.14 Unfortunately, due to the nature of fake news and the clever way it is created to trigger people's hunger to spread sensational information, it is very difficult to achieve full control via this strategy. Some studies have explicitly shown that humans are prone to confusion when it comes to spotting lies or deciding whether a news item is fake.15 Furthermore, people often overlook facts that conflict with their current beliefs, especially in politics and controversial social issues.16 The second strategy relies on third-party, manually driven systems for checking and labelling content as fake or valid. Recently, we have seen many examples of offline and online organizations working according to this strategy, such as a growing body of fact-checking organizations, start-ups (Storyzy, Factmata, etc.), and other projects with similar purposes.17 Unfortunately, these manually powered systems cannot cope with the huge amounts of digital content being steadily produced. Therefore, they focus only on a subset of digital content that they classify as having higher priority.
Even for this subset of content, their reaction speed is much slower than the spread of fake information. Therefore, automated and verified systems emerge as an inevitable last option. The third strategy offers automated fact-checking systems which, once trained, can deliver content labelling at unprecedented speeds. Today, many researchers are investigating automated solutions and building models with different methodologies.18 Notwithstanding the latest studies, there is still a lot to do in the realm of automated fake news detection. Automated fact-checking systems are detailed in the rest of the paper. Thanks to the internet, the collections of digital content served by digital libraries can be accessed by a great number of users without distance or time limits. Therefore, we propose a solution that positions digital libraries as automated fact-checking services, which label digital news content as fake or valid as soon as, or before, it is served through library systems. The main reason we associate this approach with digital libraries is their access to a wide variety of digital content, which can be used to train the proposed mathematical models, as well as their role in society as publishers of trusted information. To this end, we develop a mathematical model that is trained using existing news content served by digital libraries and is capable of labelling news content as fake or valid with unprecedented accuracy. The proposed solution uses machine learning techniques with an optimized set of extracted features and annotated labels of existing digital news content. Our study mainly contributes (a) a new set of features highly applicable to agglutinative languages, (b) the first hybrid model combining a lexicon/dictionary-based approach with machine learning methods to detect fake news, and (c) a benchmark dataset prepared in Turkish for fake news detection.
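The supervised setup described above can be illustrated with a minimal bag-of-words classifier. The following sketch is not the authors' actual model or feature set: it uses a toy Naive Bayes over raw word counts and an invented four-item corpus, purely to show how annotated labels and extracted features combine in training and prediction.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real NLP preprocessing."""
    return text.lower().split()

def train(samples):
    """Build a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = {"fake": Counter(), "valid": Counter()}
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label maximizing the log posterior, with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled corpus, invented for illustration only.
corpus = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you will not believe this shocking secret", "fake"),
    ("parliament passed the budget bill today", "valid"),
    ("the ministry announced new education funding", "valid"),
]
model = train(corpus)
print(predict(model, "shocking secret cure"))
print(predict(model, "ministry announced budget"))
```

A production system would replace `tokenize` with language-aware preprocessing (critical for agglutinative Turkish) and the raw word counts with the optimized feature set the article describes.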
LITERATURE REVIEW

Contemporary studies have indicated that social, economic, and political events in recent years, especially after the 2016 US presidential elections, are increasingly associated with the concept of fake news.19 Since then, fake news has begun to be used as a tool in many domains. In response, researchers motivated to find automated solutions have started to make use of machine learning, deep learning, hybrid models, and other methodologies. Although computational deception detection studies applying NLP (natural language processing) operations are not new, textual deception in the context of text-based news is a new topic for the field of journalism.20 Accordingly, we believe that there is a hidden body language of news text, with linguistic clues indicating whether the news is fake or not. Thus, lexical, syntactic, semantic, and rhetorical analysis, when used with machine learning and deep learning techniques, offers encouraging directions. Textual deception spans a wide spectrum, and studies have utilized many different techniques. Some prominent studies have treated the problem as a binary classification problem utilizing linguistic clues.21 Although it is still too early to say that the linguistic characteristics of fake news are fully understood, research into fake news detection in English-language texts is relatively advanced compared to that in other languages. In contrast, agglutinative languages such as Turkish have received little research attention when it comes to fake news detection. Agglutinative languages enable the construction of words by adding various morphemes, which means that words that are not in practical use may exist theoretically.
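This morpheme chaining can be sketched with a naive suffix stripper that peels suffixes off the end of a word until none match. The suffix inventory and minimum root length below are hand-picked assumptions sufficient only for this illustration; a real Turkish morphological analyzer would need far richer rules (vowel harmony, consonant alternation, suffix ordering).

```python
# Tiny, hand-picked suffix inventory chosen only to segment the example word;
# not a general Turkish suffix list.
SUFFIXES = ["dir", "den", "miz", "leri", "ecek", "ebil", "tir", "leş", "siz"]

def strip_suffixes(word, min_root=4):
    """Repeatedly remove a matching suffix until none applies, keeping the
    root at least min_root characters long. Returns (root, suffixes in
    surface order)."""
    morphemes = []
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= min_root:
                morphemes.append(s)
                word = word[: -len(s)]
                changed = True
                break
    return word, list(reversed(morphemes))

root, suffixes = strip_suffixes("gereksizleştirebileceklerimizdendir")
print(root)      # candidate root
print(suffixes)  # stripped morphemes, in surface order
```

Splitting such words into morphemes matters for feature extraction, since surface-form vocabularies in agglutinative languages grow explosively compared to English.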
For example, "gerek-siz-leş-tir-ebil-ecek-leri-miz-den-dir" is a theoretically possible word that means "it is one of the things that we will be able to make redundant," but it is not a practical one. Shu et al. classified the models for the detection of fake news in their study.22 According to this study, automated approaches can focus on four types of attributes to detect fake news: knowledge-based, style-based, stance-based, or propagation-based. Among these, it can be said that the most useful approaches are the ones that focus on the textual news content. The textual content can be studied by an automated process to extract features that can be very helpful in classifying content as fake or valid. Many scholars have tried to build models for the automatic detection and prediction of fake news using machine learning algorithms, deep learning algorithms, and other techniques. These scholars approach the detection of fake news from many different perspectives and domains. For example, in one of the studies, scientific news and conspiracy news were used.23 In Shu et al.'s study based on the credibility of news, headlines were used to determine whether an article was clickbait or not. In another study, Reis et al. worked on Buzzfeed articles linked to the 2016 US election using machine learning techniques with a supervised learning approach.24 Studies that try to detect satire and sarcasm can be attributed to subcategories of fake news detection.25 Our observation, in line with the general view, is that satire is not always recognizable and can be mistaken for real news.26 For this reason, we included satirical news in our dataset. It should be noted that although satire or sarcasm can be classified by automated detection systems, experts should still evaluate the results of the classification. While some scholars used specific models focusing on unique characteristics, others such as Ruchansky et al.
proposed hybrid deep models for fake news detection, making use of multiple kinds of features, such as temporal engagement between users and news articles over time, and generated a labelling methodology based on those features.27 In related studies, researchers have applied many features, such as automatically extracted features, hand-crafted features, social features, network information, visual features, and others, including psycholinguistic features.28 In this work, we focused on news content features; however, social context features can also be adopted to extract features, using different tiers such as user activity patterns, analysis of user interaction, profile metadata, social network/graph analysis, etc. Some of these features are present in our data, but lacking quantitative ground truth for them, we avoided their use.

METHODOLOGY

In this section, we present our motivation for this work, which we visualize in a framework named Global Library and Information Science (GLIS_1.0). Subsequently, we discuss the construction of the automated detection system, the key element of the GLIS_1.0 framework. We explain the framework, model, dataset, features, and techniques used.

Framework

The main structure of the proposed framework is shown in figure 2. This framework consists of highly cohesive but flexible layers.

Figure 2. The GLIS_1.0 framework main structure.

In the presentation layer one can find the different sources of news that are publicly available. These sources can be accessed directly using their websites or can be searched for via search engines. The news is received by fact-checking organizations, which classify it manually; digital libraries, which archive and serve it; and automated detection systems (ADS), which classify it automatically.
Digital libraries work together with fact-checking organizations and ADSs to present clean and valid news to the public. Moreover, search engines use digital library systems to label their results as fake or valid. Fact-checking organizations should also benefit from the output of ADSs: instead of manually checking heaps of news content, they could focus on news labeled as potentially fake by an ADS. Through GLIS, ADSs make the work of fact-checking organizations and digital libraries much easier, all the while increasing the quality of news served to the public. Because figure 2 is a high-level overview of the structure, there may be many other components, mechanisms, or layers, but the key elements are the automated detection systems and the digital libraries.

A critical question about this framework is why such an authority mechanism is needed. The answer is quite simple: technological progress is not the only solution. On the contrary, tech giants have already been subject to regulatory scrutiny for how they handle personal information.29 Their policies related to political ads have also been questioned. Furthermore, they are often blamed for failing to fight fake news. Indeed, there is an urgent need for global action more than ever. Digital libraries are much more than a technological advancement. Hence, they should be considered institutions or services that can act as a great authority for providing news to society as printed media disappears day by day. The threats caused by fake news are real and dangerous, but only recently have researchers from different disciplines been trying to find possible solutions, whether educational, technological, regulatory, or political.
Digital librarianship can be the intersection of all these solutions for promoting information/media literacy. Hence, digital librarianship will make use of many automated detection systems (ADS) to serve qualified news. In the following section, we discuss ADSs in detail.

Model

An overview of our automated detection system model, which is critical for the framework, is shown in figure 3. Our fake news detection model consists of two phases: the first is Language Model/Lexicon Generation, and the second is Machine Learning Integration. In this work, we used machine learning algorithms via supervised learning techniques, which learn from labeled news data (training) and help us predict outcomes for unseen news data (test).

Dataset

We collected our data from three sources:

• The primary source is the GDELT (Global Database of Events, Language and Tone) Project (https://www.gdeltproject.org/), a massive global news media archive offering free access to news text metadata for researchers worldwide. It can almost be considered a digital library of news in its own right. However, GDELT does not provide the actual news text; it only serves processed metadata along with the URL of the news item. GDELT normally does not check the validity of any news item, so to maximize the validity of the news we automatically obtained through GDELT, we used only news from approved news agencies and completely ignored news from local and lesser-known sources. Moreover, we post-processed the obtained texts by cross-validating with teyit.org data to clean any potential fake news obtained through GDELT links.

Figure 3. Integrated fake news detection model with main phases combining the language-model-based approach with the machine learning approach.
• The second source is teyit.org, a fact-checking organization based in Turkey, compliant with the principles of the IFCN (International Fact-Checking Network) and aiming to prevent the spread of false information through online channels. Manually analyzing each news item, they tag it as fake, true, or uncertain. We used their results to automatically download and label each news text.

• Lastly, our team collected manually curated and verified fake and valid news from various online sources, naming this set MVN (Manually Verified News). It includes fake and valid news that we manually accumulated over time during our studies and that did not overlap with the news obtained from the GDELT and teyit.org sources.

We named our dataset TRFN. In Phase 2, the data is very similar to that used in Phase 1; however, to test the effectiveness of the model, we excluded news older than 2017 and added new items from 2019. The news in our dataset spans 2017–2019 and is uniformly distributed. Table 1 outlines the dataset statistics: where the news text comes from, its class (fake or valid), the number of distinct texts, and the corresponding data collection method. The table shows that most of our valid news comes from the GDELT source, whereas teyit.org, a fact-checking organization, contributes only fake news.

Table 1. TRFN Dataset Summary after cleaning and duplicate removal.

Dataset | Class | Size of Processed Data | Collection Method
GDELT | NON-FAKE | 82,708 | Automated
Teyit.org | FAKE | 1,026 | Automated
MVN | NON-FAKE | 1,049 | Manual
MVN | FAKE | 400 | Manual

All news items were processed through Zemberek (http://code.google.com/p/zemberek), the Turkish NLP engine, to extract different morphological properties of words within texts.
After this processing phase, all obtained features were converted into tabular format and made available for future studies. This dataset is now available for scholarly studies upon request.

In a study of this nature, the verifiability of the data used is important. As already mentioned, most of the data comes from verified sources: mainstream news agencies accessed through GDELT, and the teyit.org archives, which are verified by teyit.org staff. All data used in training the mathematical models explained in the rest of the paper are either directly or indirectly verified.

Another important issue was the generalizability of the dataset, which determines whether the results of the study apply only to specific domains or to all available domains. Although focusing on a specific news domain would clearly improve our accuracies, we preferred to work in the general domain and included news from all specific domains. The distribution of domains in our dataset is visualized in figure 4. This distribution closely matches the distribution one would experience reading daily news in Turkey; hence, we have no domain-specific bias in our training dataset.

Figure 4. The distribution of domains in the dataset. (SciTechEnvWetNatLife = Science, Technology, Environment, Weather, Nature, Life. EduCultureArtTourism = Education, Culture, Art, Tourism.)

Moreover, during the exploratory data analysis we obtained highly correlated evidence showing syntactic similarities with other NLP studies in Turkish. For example, the results of a study by the Zemberek developers (http://zembereknlp.blogspot.com/2006/11/kelime-istatistikleri.html) to find the most common words in Turkish, based on over five million words, are compatible with the most common words in our corpus.
This evidence attests to the representativeness of our dataset.

The last issue worth discussing is the imbalanced nature of the dataset. An imbalanced dataset occurs in a binary classification study when the frequency of one class dominates the frequency of the other. In our dataset, the number of fake news items is far smaller than the number of valid ones. This generally causes difficulties in applying conventional machine learning methods, but it is a frequently observed phenomenon, due to the disparity of variable classes in these kinds of real-world problems. To avoid potential problems due to the imbalanced nature of the dataset, we used SMOTE (Synthetic Minority Over-sampling Technique), an over-sampling method.30 It creates synthetic samples of the minority class that are relatively close in the feature space to the existing observations of the minority class.

Features

In this study, we discarded some features because of their relatively low impact on overall performance during the exploratory data analysis and subsequently in the training phase. The most effective features we decided on are shown in table 2.

Table 2.
Main Features

Feature | Group | Definition
nRootScore | Language Model Features | The news score calculated according to the Root Model
nRawScore | Language Model Features | The news score calculated according to the Raw Model
SpellErrorScore | Extracted Features | Spell errors per sentence
ComplexityScore | Extracted Features | The score of the complexity/readability of the news
Source | Labels | The URL or identifier of the news
MainCategory | Labels | The category of the news
NewsSite | Labels | The unique address of the news

The language model features nRootScore and nRawScore are features that we borrowed from our earlier study on fake news detection.31 In that study, we focused on constructing a fake news dictionary/lexicon based on different morphological segments of the words used in news texts. These two scores were found to be the most successful in determining the fakeness/validity of a news text, one considering the raw form of the words, the other considering the root form.

The extracted features are ComplexityScore and SpellErrorScore. ComplexityScore represents the readability of the text. Studies on determining a good readability metric exist for the Turkish language.32 We used a modified version of the Gunning-Fog metric, which is based on word length and sentence length.33 Since Turkish is an agglutinative language, we used word length instead of the syllable count. We also made some modifications to normalize the scores. The average number of syllables per word in Turkish is 2.6, so we defined a word as a long word if it has more than 9 letters.34 For a given news text T, the Complexity Score (CS) can be computed by equation 1.

(1)   T_{CS} = \frac{\frac{Word_{count}}{Sentences_{count}} + \frac{LongWord_{count} \times 100}{Word_{count}}}{10}

The second extracted feature is SpellErrorScore. We foresee that there may be many more errors in fake news than in valid news. We calculated the spell error counts making use of the Turkish Spellchecker class of Zemberek.
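Both extracted features reduce to simple per-text ratios. A minimal sketch of how they could be computed follows; note that the tokenizer below is a naive stand-in for Zemberek's morphological analysis, and the sample text is hypothetical:

```python
import re

def complexity_score(word_count, sentence_count, long_word_count):
    # Modified Gunning-Fog readability: a word with more than 9 letters
    # counts as a "long word" for agglutinative Turkish.
    return (word_count / sentence_count
            + long_word_count * 100 / word_count) / 10

def spell_error_score(spell_error_count, sentence_count):
    # Spell errors per sentence (text length varies, so we normalize).
    return spell_error_count / sentence_count

def text_counts(text, long_letters=9):
    # Naive sentence/word splitter standing in for Zemberek's analysis.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    long_words = [w for w in words if len(w) > long_letters]
    return len(words), len(sentences), len(long_words)

text = "Bu bir haber metnidir. Gereksizlestirebileceklerimizdendir kelimesi uzundur."
w, s, lw = text_counts(text)          # 7 words, 2 sentences, 1 long word
cs = complexity_score(w, s, lw)       # (7/2 + 1*100/7) / 10
```

In a real pipeline the word, sentence, and spell-error counts would come from Zemberek rather than this regex-based splitter.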
Because the text length of news varies, we calculate the ratio per sentence. For a given news text T, the Spell Error Score (SE) is calculated as shown in equation 2.

(2)   T_{SE} = \frac{SpellError_{count}}{Sentences_{count}}

Finally, we included the metadata categories Source, MainCategory, and NewsSite as additional identifiers for the learning process. Then, we combined features extracted from text representation techniques with the features shown in table 2 and trained the model with different classifiers. For text representation, we followed two directions in the experiments. First, we converted text into structured features with the Bag of Words (BOW) approach, in which text data is represented as the multiset of its words. Second, we experimented with N-grams, which represent sequences of n words, in other words splitting text into chunks of N words. In the BOW model, documents in TRFN are represented as a collection of words, ignoring grammar and even word order but preserving multiplicity. In a classic BOW approach, each document can be represented as a fixed-length vector with length equal to the vocabulary size; each dimension of this vector corresponds to the occurrence of a word in a news item. We customized the generic approach by reducing variable-length documents to fixed-length vectors so that documents of varying lengths can be used with many machine learning models.

Figure 5. An overview of the BOW (Bag of Words) approach.

Because we ignore word order, we reduced the counts to fixed-length histograms, as seen in figure 5.
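The fixed-length histogram idea can be sketched in a few lines. This is a simplified illustration with a whitespace tokenizer and toy English documents, not the exact preprocessing used in the study:

```python
from collections import Counter

def bow_vectors(docs):
    # Build a shared vocabulary, then turn each document into a
    # fixed-length vector of word counts: grammar and word order are
    # ignored, but multiplicity is preserved.
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for w, c in Counter(d.split()).items():
            v[index[w]] = c
        vectors.append(v)
    return vocab, vectors

docs = ["fake news spreads fake claims", "valid news informs readers"]
vocab, X = bow_vectors(docs)
# Every row of X has the same length (the vocabulary size),
# whatever the original document length was.
```

Each dimension of a row counts one vocabulary word, which is exactly the histogram reduction described above.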
Assuming N is the number of news documents and W is the number of possible words in the corpus, it should be noted that the N×W count matrix is generally large but sparse: we have many news documents, but most words do not occur in any given document. This rareness of terms is a drawback of the approach. Therefore, we modified the model to compensate for the rarity problem by weighting the terms with the TF-IDF measure, which evaluates how important a word is to a document in a collection. The other technique we used, the N-gram model, is the generic term for a string of words in computational linguistics, and it is extensively used in text mining and NLP tasks. The prefix that replaces the n-part indicates the number of consecutive words in the string: a unigram is one word, a bigram is two words, and an n-gram is n words.

EXPERIMENTAL RESULTS AND DISCUSSION

In this section, the experimental process and the results are presented. All experiments were performed using the Scikit-learn library. To evaluate the performance of the model and the proposed features, we employed the precision, recall, F1 score (the harmonic mean of precision and recall), and accuracy metrics. We ran many experiments using different combinations of features. Several classification models were trained: K-Nearest Neighbor, Decision Trees, Gaussian Naive Bayes, Random Forest, Support Vector Machine, ExtraTrees Classifier, and Logistic Regression. To be effective, a classifier should be able to correctly classify previously unseen data. To this end, we tuned the parameter values for all the classification models used. Then, the models were trained and evaluated on the TRFN dataset using 10-fold cross-validation. In table 3, we present the ultimate best scores of the proposed model.
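The evaluation metrics named above follow directly from binary confusion-matrix counts. A small sketch, using hypothetical counts rather than the study's actual results:

```python
def metrics(tp, fp, fn, tn):
    # Precision, recall, F1 (harmonic mean of precision and recall),
    # and accuracy from binary confusion-matrix counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts for the "fake" class on one held-out fold.
p, r, f1, acc = metrics(tp=90, fp=10, fn=5, tn=95)
```

Under 10-fold cross-validation these quantities would be computed on each held-out fold and averaged.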
The results are highly motivating and exemplify how useful automated detection systems can be as a key component of the integrated solution framework in figure 2. We compared the algorithms on three final feature sets, which gave the most consistent results among the feature-set combinations: Set1 stands for bigram+FOpt (Optimized Features), Set2 stands for BOWModified+FOpt, and Set3 stands for unigram+bigram+FOpt. The results show a relative consistency in performance across the models. In almost all models, the combination of unigram+bigram and the optimized feature set (FOpt) gives better results than the other combinations. The ExtraTrees Classifier model was chosen as the best due to its higher performance. This model, also known as the Extremely Randomized Trees Classifier, is an ensemble learning technique that aggregates the results of multiple decision trees collected in a “forest” to output its classification result. It is very similar to the Random Forest Classifier and differs only in the manner of construction of the decision trees, so the results of these two classifiers are close.

Table 3. Evaluation results of all combinations of features and classification models.
Model | Feature Set | Precision % (0, 1) | Recall % (0, 1) | Accuracy | F1 Score
Gaussian Naive Bayes | Set1 | 93.32, 93.96 | 93.92, 93.36 | 93.64 | 93.62
Gaussian Naive Bayes | Set2 | 93.37, 94.02 | 93.98, 93.42 | 93.70 | 93.68
Gaussian Naive Bayes | Set3 | 93.95, 94.21 | 94.19, 93.97 | 94.08 | 94.07
K-Nearest Neighbour | Set1 | 93.70, 93.50 | 93.52, 93.69 | 93.60 | 93.61
K-Nearest Neighbour | Set2 | 93.66, 94.05 | 94.03, 93.68 | 93.85 | 93.84
K-Nearest Neighbour | Set3 | 94.42, 94.21 | 94.22, 94.41 | 94.31 | 94.32
ExtraTrees Classifier | Set1 | 94.15, 94.92 | 94.88, 94.19 | 94.53 | 94.51
ExtraTrees Classifier | Set2 | 94.09, 94.94 | 94.90, 94.14 | 94.51 | 94.49
ExtraTrees Classifier | Set3 | 97.90, 95.72 | 95.81, 97.86 | 96.81 | 96.85
Support Vector Machine | Set1 | 89.61, 88.92 | 88.99, 89.54 | 89.26 | 89.30
Support Vector Machine | Set2 | 89.70, 88.96 | 89.04, 89.62 | 89.33 | 89.37
Support Vector Machine | Set3 | 90.85, 91.26 | 91.22, 90.89 | 91.05 | 91.03
Logistic Regression | Set1 | 91.56, 92.28 | 92.23, 91.62 | 91.92 | 91.89
Logistic Regression | Set2 | 91.50, 92.28 | 92.22, 91.56 | 91.89 | 91.86
Logistic Regression | Set3 | 92.25, 92.90 | 92.86, 92.30 | 92.57 | 92.55
Random Forest | Set1 | 93.71, 94.44 | 94.40, 93.75 | 94.07 | 94.05
Random Forest | Set2 | 93.87, 95.00 | 94.94, 93.94 | 94.44 | 94.41
Random Forest | Set3 | 94.77, 95.14 | 95.12, 94.79 | 94.96 | 94.95
Decision Trees | Set1 | 93.95, 94.59 | 94.56, 93.99 | 94.27 | 94.25
Decision Trees | Set2 | 94.05, 95.08 | 95.03, 94.11 | 94.57 | 94.54
Decision Trees | Set3 | 94.94, 95.24 | 95.23, 94.95 | 95.09 | 95.08

Every ADS in the GLIS_1.0 framework may use its own method to detect fake news, and an open-source ADS may improve through feedback. Hybrid models and other techniques, such as neural networks with deep learning methodology, can also be used depending on the data, the language of the news, and the news features related to both social context and news content.

CONCLUSION AND FUTURE WORK

In this study we presented a novel framework that offers a practical architecture of an integrated system for identifying fake news. We have tried to illustrate how digital libraries can be a service authority to promote media literacy and fight against fake news. Because librarians are trained to critically analyze information sources, their contributions to our proposed model are critical.
Accordingly, we see this work as an encouraging effort toward future collaborative studies between the LIS and CS (computer science) communities. We think there is an immediate need for LIS professionals to participate in and contribute to automated solutions that can help detect inaccurate and unverified information. In the same manner, we believe the collaboration of LIS professionals, computer scientists, fact-checking organizations, and pioneering technology platforms is the key to providing qualified news within a real-time framework that promotes information literacy. Moreover, we put the reader, in the feed-reader position while consuming news, at the core of the framework. In terms of automated detection systems, we proposed a fake news detection model integrating a dictionary-based approach and machine learning techniques, offering optimized feature sets applicable to agglutinative languages. We comparatively analyzed the findings with several classification models and demonstrated that machine learning algorithms, when used together with dictionary-based findings, yield high scores for both precision and recall. Consequently, we believe that once operational in the field, the proposed workflow can be extended in the future to support other news elements such as photographs and videos. With the help of Social Network Analysis (SNA), it may be possible to stop or slow down the spread of fake news as it emerges. This work also highlighted several tasks as future research directions:

• The studies can be deepened to mathematically categorize the fake news types, and the dissemination characteristics of each type can be analyzed.

• The workflow has the potential to provide an automated verification platform for all news content existing in digital libraries to promote media literacy.

ENDNOTES

1 M. Connor Sullivan, “Why Librarians Can’t Fight Fake News,” Journal of Librarianship and Information Science 51, no.
4 (December 2019): 1146–56, https://doi.org/10.1177/0961000618764258.

2 “Definition of ‘News’,” available at https://www.collinsdictionary.com/dictionary/english/news.

3 Dominic DiFranzo and Kristine Gloria-Garcia, “Filter Bubbles and Fake News,” XRDS: Crossroads, The ACM Magazine for Students 23, no. 3 (April 2017): 32–35, https://doi.org/10.1145/3055153.

4 Andrew Guess, Brendan Nyhan, and Jason Reifler, “Selective Exposure to Misinformation: Evidence from the Consumption of Fake News during the 2016 US Presidential Campaign,” European Research Council 9, no. 3 (2018): 4; Eni Mustafaraj and P. Takis Metaxas, “The Fake News Spreading Plague: Was It Preventable?,” Proceedings of the 2017 ACM on Web Science Conference (June 2017): 235–39, https://doi.org/10.1145/3091478.3091523.

5 Jana Laura Egelhofer and Sophie Lecheler, “Fake News as a Two-Dimensional Phenomenon: A Framework and Research Agenda,” Annals of the International Communication Association 43, no. 2 (2019): 97–116, https://doi.org/10.1080/23808985.2019.1602782.

6 Hannah Rashkin et al., “Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking,” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017): 2931–37.

7 Soroush Vosoughi, Deb Roy, and Sinan Aral, “The Spread of True and False News Online,” Science 359, no. 6380 (2018): 1146–51, https://doi.org/10.1126/science.aap9559.

8 Xinyi Zhou and Reza Zafarani, “A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities,” ACM Computing Surveys (CSUR) 53, no. 5 (2020): 1–40, https://doi.org/10.1145/3395046.

9 S. F. Kattimani, Praveenkumar Kumbargoudar, and D. S.
Gobbur, “Training of the Library Professionals in Digital Era: Key Issues” (2006), https://ir.inflibnet.ac.in:8443/ir/handle/1944/1234.

10 Lynn Silipigni Connaway et al., “Digital Literacy in the Era of Fake News: Key Roles for Information Professionals,” Proceedings of the Association for Information Science and Technology 54, no. 1 (2017): 554–55, https://doi.org/10.1002/pra2.2017.14505401070.

11 Matthew C. Sullivan, “Libraries and Fake News: What’s the Problem? What’s the Plan?,” Communications in Information Literacy 13, no. 1 (2019): 91–113, https://doi.org/10.15760/comminfolit.2019.13.1.7.

12 Wayne Finley, Beth McGowan, and Joanna Kluever, “Fake News: An Opportunity for Real Librarianship,” ILA Reporter 35, no. 3 (2017): 8–12; American Library Association, “Resolution on Access to Accurate Information,” 2018; Nick Rochlin, “Fake News: Belief in Post-Truth,” Library Hi Tech 35, no. 3 (2017): 386–92, https://doi.org/10.1108/LHT-03-2017-0062; Linda Jacobson, “The Smell Test: In the Era of Fake News, Librarians Are Our Best Hope,” School Library Journal 63, no. 1 (2017): 24–29; Angeleen Neely-Sardon and Mia Tignor, “Focus on the Facts: A News and Information Literacy Instructional Program,” The Reference Librarian 59, no. 3 (2018): 108–21, https://doi.org/10.1080/02763877.2018.1468849; Claire Wardle and Hossein Derakhshan, “Information Disorder: Toward an Interdisciplinary Framework for Research and Policy Making,” Council of Europe report 27 (2017).

13 IFLA, “How to Spot Fake News,” 2017.
14 Jane Mandalios, “Radar: An Approach for Helping Students Evaluate Internet Sources,” Journal of Information Science 39, no. 4 (2013): 470–78, https://doi.org/10.1177/0165551513478889; Sarah Blakeslee, “The CRAAP Test,” LOEX Quarterly 3, no. 3 (2004): 4.

15 Victoria L. Rubin and Niall Conroy, “Discerning Truth from Deception: Human Judgments and Automation Efforts,” First Monday 17, no. 5 (2012), https://doi.org/10.5210/fm.v17i3.3933; Verónica Pérez-Rosas et al., “Automatic Detection of Fake News,” arXiv preprint arXiv:1708.07104 (2017).

16 Justin P. Friesen, Troy H. Campbell, and Aaron C. Kay, “The Psychological Advantage of Unfalsifiability: The Appeal of Untestable Religious and Political Ideologies,” Journal of Personality and Social Psychology 108, no. 3 (2015): 515–29, https://doi.org/10.1037/pspp0000018.

17 Tanja Pavleska et al., “Performance Analysis of Fact-Checking Organizations and Initiatives in Europe: A Critical Overview of Online Platforms Fighting Fake News,” Social Media and Convergence 29 (2018).
18 Yasmine Lahlou, Sanaa El Fkihi, and Rdouan Faizi, “Automatic Detection of Fake News on Online Platforms: A Survey” (paper, 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, 2019), https://doi.org/10.1109/ICSSD47982.2019.9002823; Christian Janze and Marten Risius, “Automatic Detection of Fake News on Social Media Platforms” (paper, Pacific Asia Conference on Information Systems (PACIS), 2017); Torstein Granskogen, “Automatic Detection of Fake News in Social Media Using Contextual Information” (master’s thesis, Norwegian University of Science and Technology (NTNU), 2018).

19 Jacob L. Nelson and Harsh Taneja, “The Small, Disloyal Fake News Audience: The Role of Audience Availability in Fake News Consumption,” New Media & Society 20, no. 10 (2018): 3720–37, https://doi.org/10.1177/1461444818758715; Philip N. Howard et al., “Social Media, News and Political Information during the US Election: Was Polarizing Content Concentrated in Swing States?,” arXiv preprint arXiv:1802.03573 (2018); Alexandre Bovet and Hernán A. Makse, “Influence of Fake News in Twitter during the 2016 US Presidential Election,” Nature Communications 10, no. 7 (2019): 1–14, https://doi.org/10.1038/s41467-018-07761-2.

20 Lina Zhou et al., “Automating Linguistics-Based Cues for Detecting Deception in Text-Based Asynchronous Computer-Mediated Communications,” Group Decision and Negotiation 13, no. 1 (2004): 81–106, https://doi.org/10.1023/B:GRUP.0000011944.62889.6f; Myle Ott et al., “Finding Deceptive Opinion Spam by Any Stretch of the Imagination,” arXiv preprint arXiv:1107.4557 (2011); Rada Mihalcea and Carlo Strapparava, “The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language,” Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (2009): 309–12; Julia B. Hirschberg et al., “Distinguishing Deceptive from Non-Deceptive Speech” (2005), https://doi.org/10.7916/D8697C06.
21 Victoria L. Rubin, Yimin Chen, and Nadia K. Conroy, “Deception Detection for News: Three Types of Fakes,” Proceedings of the Association for Information Science and Technology 52, no. 1 (2015): 1–4, https://doi.org/10.1002/pra2.2015.145052010083; David M. Markowitz and Jeffrey T. Hancock, “Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel,” PLoS One 9, no. 8 (2014): e105937, https://doi.org/10.1371/journal.pone.0105937; Jing Ma et al., “Detecting Rumors from Microblogs with Recurrent Neural Networks,” Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016) (2016): 3818–24, https://ink.library.smu.edu.sg/sis_research/4630.

22 Kai Shu et al., “Fake News Detection on Social Media: A Data Mining Perspective,” ACM SIGKDD Explorations Newsletter 19, no. 1 (2017): 22–36, https://doi.org/10.1145/3137597.3137600.

23 Eugenio Tacchini et al., “Some Like It Hoax: Automated Fake News Detection in Social Networks,” arXiv preprint arXiv:1704.07506 (2017).

24 Julio C. S. Reis et al., “Supervised Learning for Fake News Detection,” IEEE Intelligent Systems 34, no. 2 (2019): 76–81, https://doi.org/10.1109/MIS.2019.2899143.

25 Victoria L. Rubin et al., “Fake News or Truth?
Using Satirical Cues to Detect Potentially Misleading News,” Proceedings of the Second Workshop on Computational Approaches to Deception Detection (2016): 7–17; Francesco Barbieri, Francesco Ronzano, and Horacio Saggion, “Is This Tweet Satirical? A Computational Approach for Satire Detection in Spanish,” Procesamiento del Lenguaje Natural, no. 55 (2015): 135–42; Soujanya Poria et al., “A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks,” arXiv preprint arXiv:1610.08815 (2016).

26 Lei Guo and Chris Vargo, “‘Fake News’ and Emerging Online Media Ecosystem: An Integrated Intermedia Agenda-Setting Analysis of the 2016 US Presidential Election,” Communication Research 47, no. 2 (2020): 178–200, https://doi.org/10.1177/0093650218777177.

27 Natali Ruchansky, Sungyong Seo, and Yan Liu, “CSI: A Hybrid Deep Model for Fake News Detection,” Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (November 2017): 797–806, https://doi.org/10.1145/3132847.3132877.

28 Yaqing Wang et al., “EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection,” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018): 849–57, https://doi.org/10.1145/3219819.3219903; James W. Pennebaker, Martha E. Francis, and Roger J. Booth, “Linguistic Inquiry and Word Count: LIWC 2001” (Mahwah, NJ: Lawrence Erlbaum Associates, 2001).

29 “Facebook, Twitter May Face More Scrutiny in 2019 to Check Fake News, Hate Speech,” accessed May 17, 2020, https://www.huffingtonpost.in/entry/facebook-twitter-may-face-more-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e.

30 Nitesh V. Chawla et al., “SMOTE: Synthetic Minority Over-Sampling Technique,” Journal of Artificial Intelligence Research 16 (2002): 321–57, https://doi.org/10.1613/jair.953.
31 Uğur Mertoğlu and Burkay Genç, “Lexicon Generation for Detecting Fake News,” arXiv preprint arXiv:2010.11089 (2020).

32 Burak Bezirci and Asım Egemen Yilmaz, “Metinlerin Okunabilirliğinin Ölçülmesi Üzerine Bir Yazilim Kütüphanesi Ve Türkçe Için Yeni Bir Okunabilirlik Ölçütü,” Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 12, no. 3 (2010): 49–62, https://dergipark.org.tr/en/pub/deumffmd/issue/40831/492667.

33 Robert Gunning, The Technique of Clear Writing, rev. ed. (New York: McGraw-Hill, 1968).

34 Ender Ateşman, “Türkçede Okunabilirliğin Ölçülmesi,” Dil Dergisi 58 (1997): 71–74.
PUBLIC LIBRARIES LEADING THE WAY

A Collaborative Approach to Newspaper Preservation

Ana Krahmer and Laura Douglas

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12596

Ana Krahmer (ana.krahmer@unt.edu) oversees the Digital Newspaper Unit at UNT. Through this work, she manages the Texas Digital Newspaper Program collection on The Portal to Texas History, which is a gateway to historic research materials freely available worldwide. Laura Douglas (laura.douglas@cityofdenton.com) is the librarian in charge of Special Collections at the Denton Public Library, which houses the genealogy, Texana, and local Denton history collections as well as the Denton municipal archives. In her work, she regularly assists patrons with newspaper research questions specifically related to Denton newspapers. © 2020.

INTRODUCTION

When we first proposed this column in January 2020, we had no idea how much the world would change between then and the July deadline. While we have collaborated for many years on a variety of projects, the value of our collaboration has never proven itself more than in this COVID-19 reality: collaboration leverages the strengths and resources of partners to form something stronger than each could build alone. In this world of COVID-19, the collaboration between the Denton Public Library (DPL) and the University of North Texas Libraries (UNT) has allowed us to build open, online access to the first 16 years of the Denton Record-Chronicle (DRC). This newspaper is the city's daily newspaper of record, and the collaboration between DPL and UNT resulted in free, worldwide research access via The Portal to Texas History.
The project was funded by a $24,820.00 grant through the IMLS Library Services and Technology Act (LSTA), awarded from September 2019 to August 2020 by the Texas State Library and Archives Commission (TSLAC) as part of its TexTreasures program, to digitize 24,000 newspaper pages. This project has also resulted in a follow-up collaboration to build open access to further years of this daily newspaper title, through a 2021 TexTreasures award to digitize an additional 24,000 newspaper pages. The real question, though, is what recipe made this a successful collaboration.

BACKGROUND

The DRC has been the community newspaper in Denton for over 100 years. Due to the sheer amount of material, digitizing a daily newspaper with such an extensive publication run is a long-term project that requires a lot of planning, time, and funding. Since the DPL's inception in 1937, the library has endeavored to collect items related to Denton and Texas history. With community support, the library has developed a well-rounded collection of local history, Texana, and genealogical materials, all of which are housed in the Special Collections Research Area at the Emily Fowler Central Library. These materials support research, projects, and exhibits.

One major research resource is the archival collection of local newspapers, mainly the DRC, maintained on 752 rolls of microfilm containing issues from 1908 to 2018. Before this project, access to these newspapers was available only in the Special Collections Research Area, through microfilm readers or paid subscription services. In addition, although steps had been taken to preserve the film, many of the rolls show wear from years of use, while others have developed vinegar syndrome and soon will no longer be a usable resource. In 2018, UNT obtained publisher permission to make the DRC run freely accessible on The Portal to Texas History.
Laura had been exploring different avenues to digitize this microfilm and make it freely available to the public when Ana contacted her with information about TSLAC's annual grant programs. LSTA funding is provided annually to all fifty states through the Institute of Museum and Library Services, and each state library determines how the funding is expended. In Texas, LSTA funding supports a number of grant programs, including TexTreasures, a competitive grant program open to any Texas library. As described by TSLAC, the "TexTreasures grant is designed to help libraries make their special collections more accessible for the people of Texas and beyond. Activities considered for possible funding include digitization, microfilming, and cataloging." Libraries can apply to fund the same type of project up to three years in a row, and the DRC project applied for $24,820.00 in 2019 to digitize 24,000 newspaper pages, representing the earliest years of microfilm available at the Denton Public Library. To create a viable grant application, DPL partnered with the Texas Digital Newspaper Program (TDNP), available through UNT's Portal to Texas History, and decided to start by digitizing as many early years of microfilm as grant funding could cover. TDNP is the largest single-state, open access, digital newspaper preservation repository in the U.S., hosting just under 8 million newspaper pages at the time of this writing.
In late 2018, UNT received permission from the owner of the DRC to include the newspaper run in the TDNP collection, which represented a very exciting opportunity for city and county researchers, as well as for the DPL. As thanks to the publisher for granting permission, UNT built access to the 2014 to 2018 PDF ePrint editions, which TDNP preserves as a service to Texas Press Association-member publishers. After this, UNT contacted the DPL to discuss applying for grant funding. Once Laura learned that the DPL had received the 2019 award, she began the local planning steps necessary to collaborate with the university.

THE PROJECT BECOMES REAL

The Denton Record-Chronicle Digitization Project Grant contract and resolution for adoption went before the Denton City Council on October 8, 2019. The City of Denton issued a press release that day, and the DRC also published an article announcing the project. Over the next few days, the DRC article appeared across social media, including the City of Denton's social media accounts, as well as through library-associated email newsletters. After the first newspapers became available on the Portal, both DPL and UNT prepared blog posts about the project, which have also appeared on social media. These blog posts fulfilled publicity requirements specified by the grant, even while offering training to researchers in how to work with the online newspaper collection.

One major convenience of this collaboration is that both organizations are in the same city. Transfer of materials was arranged by email and accomplished by a trip across town. We completed the digitization process in batches, with the first 10 microfilm rolls going to UNT on October 10, 2019, and UNT uploading the first 854 issues in December 2019. The newspapers from the first microfilm set covered 1908 to 1916.
DPL transferred the last set of microfilm in April 2020, with dates ranging from 1917 through September 1924, shortly after which UNT completed and uploaded the grant-funded count of 24,000 newspaper pages. The grant proposal estimated that the scans would reach 1938, but the page count of this newspaper proved to be much higher than originally estimated; as a result, the funding covered only up to September 1924. DPL and UNT will continue their partnership by digitizing further years of the DRC through a variety of methods. As we were in the midst of preparing this column, TSLAC contacted Laura to inform her that DPL had received a second grant award, in the amount of $24,820.00, to digitize 24,000 additional newspaper pages, which will move the newspapers through 1954.

As of July 23, 2020, the Denton Record-Chronicle Collection on the Portal to Texas History hosts 6,168 items and has been used 16,397 times. This includes 1,743 items that are PDF ePrint editions of the paper from 2014 to 2018, which UNT uploaded for long-term preservation and access. UNT uploads ePrint editions without charge and digitally preserves them through an agreement with the Texas Press Association; these PDFs were not part of the funded grant, but they do enhance access to the collection and helped build community interest in seeing earlier years available on the Portal. Usage of the collection skyrocketed after the early editions became available: January 2020 saw the highest monthly usage, with 3,105 uses. Once this project is complete, it will include over 200,000 newspaper pages. Neither DPL nor UNT has the ability to tackle this project alone, but through collaboration, it is possible.
RECIPE FOR YOUR OWN COLLABORATION SUCCESS

These are planning recommendations as you prepare for your own collaboration, drawn from what we learned as we worked on this project together.

1. Communicate Early and Often: Communicating needs enables partners to identify each other's strengths. Each partner will bring their strengths to the project, which in this case included actual archival materials from DPL and technological expertise on the UNT side. In addition, be prepared to communicate with local groups who need to endorse or sign off on the project, possibly including the city council, the historical commission, or the city manager.

2. Partner to Write the Grant: Partnering in preparing the grant achieves two goals: first, it enables partners to develop a communication flow that will continue throughout the collaboration; second, it ensures that partners know what each can realistically accomplish within the grant timeline. In this case, Laura wrote most of the grant application herself, but she had very specific questions that Ana had to answer, and she needed key elements from UNT, including the project budget, technological infrastructure, and a commitment letter. Communicating early and partnering on the grant application process ensured that there were no surprises that were within the control of either partner.

3. Work Together to Explain Your Partnership: With a grant of this size, we always spoke in advance to ensure we weren't over-promising when newspapers would appear online. This also gave both Laura and Ana lead time for promoting the project: Laura would share the years of the physical microfilm before sending them over, and Ana would walk Laura through the years that would be uploaded in a given month. This allowed them to plan publicity, training, and outreach efforts based on the dates of newspapers going online.
In addition, Laura regularly communicated with Ana prior to submitting grant reports, which was critical in preventing miscommunication with the funding agency.

4. Pad Enough Time for the Unexpected: Of course, we had no way of knowing a pandemic would occur when we began this project, and what saved us was that we had started planning as soon as we learned about receiving the grant, rather than when the grant period started in September 2019. Planning two months in advance put us two months ahead of schedule, and we were able to start exchanging materials as soon as the grant period started. This gave us a few weeks of lead time, so we successfully completed the project by the end of April 2020, at which point the grant-funded page count had been scanned and UNT staff could remote in to complete the digitization processes. Extra time is only a benefit. If the COVID-19 pandemic had not occurred, we still might have had to address technological or film deterioration problems, and we could have resolved them earlier rather than later because we had given ourselves a few extra weeks of lead time.

5. Don't Be Afraid to Explain Changes to Your Granting Agency: Your project may change due to unforeseen circumstances; in our project, the uploaded total of pages reached 24,000 before we digitized the entire planned date range. UNT charges a per-page digitization fee, and these newspaper issues proved to contain more pages than expected. Laura contacted the representative at TSLAC to explain the situation and offer an alternative approach to cover the digitization of the remaining years. The important thing is to keep the granting agency informed of any changes, delays, or hiccups in the project.
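The budget dynamic behind recommendation 5 can be sketched with some simple arithmetic. This is only an illustration: it assumes a flat per-page fee implied by the award amount, and the pre-project page estimate used below is a hypothetical figure, not one reported in this column.

```python
# Grant figures reported in this column
GRANT_AWARD = 24_820.00   # TexTreasures award, in USD
FUNDED_PAGES = 24_000     # pages the grant was written to cover

# Implied flat per-page fee (assumption: the fee is uniform across pages)
per_page_fee = GRANT_AWARD / FUNDED_PAGES
print(f"Implied fee: ${per_page_fee:.2f} per page")

# If issues run longer than estimated, the same dollars still buy the same
# page count but cover a shorter date range -- which is what happened here.
estimated_pages_needed = 40_000   # hypothetical pre-project estimate
unfunded_pages = max(0, estimated_pages_needed - FUNDED_PAGES)
print(f"Pages left unfunded under that estimate: {unfunded_pages}")
```

The point of the sketch is that a fixed award caps pages, not years; when page counts per issue rise, the covered date range shrinks, and the granting agency should hear about it early.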
We are both proud of having completed this project three months before the end of the grant period, but we know that without solid communication, planning, and flexibility, the COVID-19 pandemic would have made the situation extremely difficult, if not impossible. Leveraging the Portal's technical infrastructure and TDNP's newspaper expertise alongside the volume of material and collection expertise provided by the DPL has given us a model for success that we plan to capitalize on in future projects. Best of all, in the world of COVID-19, our patrons can access these newspapers from the comfort of their own couches, without even taking off their pajamas!
EDITORIAL BOARD THOUGHTS

What More Can We Do to Address Broadband Inequity and Digital Poverty?

Lori Ayre

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12619

Lori Ayre (lori.ayre@galecia.com) is Principal, The Galecia Group, and a member of the ITAL Editorial Board. © 2020.

We are now almost seven months into our new lives with the novel coronavirus, and over 190,000 Americans have died of COVID-19. Library administrators have been struggling to balance their commitment to provide services to their communities with keeping their staff safe. Initially, libraries relied on their online offerings, so more e-books and other online resources were acquired. Staff learned that they could do quite a bit of their work from home. They could still respond to email and phone messages. They could evaluate and order new material. They could deliver online programs like summer reading and story time. They could interact with people on social media. They could put together key resources for patrons and post them on the website.1

A lot of what the library was doing while the buildings were closed was not obvious. Most people associate the library with the building, and since the building was closed, it seemed like nothing was happening at the library. Yet library workers were busy.

Once it became possible for library staff to enter the building (per local health ordinances), the first thing that libraries started to do was accept returns. That was a little fraught considering how little we knew about the virus and how long contaminants might live on returned library material. Eventually, with the long-awaited testing results from the REALM Project and Battelle Labs (https://www.webjunction.org/explore-topics/COVID-19-research-project.html), people started standardizing on a three-day quarantine of returns.
Then more testing of stacked material was done, resulting in some people choosing to quarantine returns for four days. As of early September, we have learned that even five days isn't enough to quarantine delivery totes and some other plastic material.

Curbside pick-up was born in these early days of being allowed back in the buildings. If someone had mapped who was offering curbside pick-up, it would look like popcorn popping across the country. The number of libraries offering the service slowly increased, and pretty soon nearly everyone was doing it.2 Many library directors will say that curbside pick-up is here to stay. People love the convenience too much to take the service away.

Rolling out curbside pick-up has had some challenges: how to safely make the handoff between library staff and library patrons; whether to accept returns; whether to charge fines; modifying circulation policies to fit the current needs; and selecting books for people who want them but lack the skills to navigate the library catalog's requesting system. Some libraries started putting together grab bags of materials selected by staff for specific patrons—kind of like homebound services on-the-fly. Curbside helped get material in circulation again.

Importantly, also during this period, libraries started finding creative ways to get Wi-Fi hotspots out into communities. They began lending hotspots if they weren't already doing so. Those libraries already circulating hotspots increased their inventory. They took their bookmobiles into neighborhoods and created temporary hotspot connections around town.
Many libraries made sure Wi-Fi from their building was available in their own parking lots.3 But one thing everyone has learned during this pandemic is that libraries alone cannot be the solution to the digital divide. This isn't news to librarians, who have been arguing that Internet access should be as readily available as electricity and water. Librarians understand that information cannot be free and accessible unless everyone has Internet access and knows how to use it. Public access computers, Wi-Fi hotspots, and media literacy are all staple services in our libraries today.4

However, these services are not enough to bridge a digital divide that only seems to be getting worse. The coronavirus that closed libraries and schools has made it painfully clear that something much bigger has to happen to address the problem. As Gina Millsap stated in a recent Facebook post:

I think it's become obvious that the COVID-19 crisis is shining a spotlight on the flaws we have in our broadband infrastructure and on our failure to make the investments that should have been made for equitable access to what should be a basic utility, like water or electricity.5

According to BroadbandNow, the number of people who lack broadband Internet access could be as high as 42 million.6 The FCC reports that at least "18 million people lacked access to broadband Internet at the end of 2018."7 Even if all the libraries were open and circulating hundreds of Wi-Fi hotspots, we'd still have a very serious access problem.

THINKING DIFFERENTLY ABOUT ADDRESSING THE DIGITAL DIVIDE

In a paper published March 28, 2019, by the Urban Libraries Council (ULC), the author suggested three specific actions that libraries can take to address race and social equity and the digital divide. They are:

1. Assess and respond to the needs of your community through meaningful conversation (including considering different partners for your work)
2. Optimize funding opportunities to support your efforts (e.g.
E-rate), and
3. Think outside the box to create effective solutions that are informed by those in need (e.g., lending Wi-Fi hot spots).8

While we know libraries have been heeding this advice when it comes to Wi-Fi hotspots, let's look at what can be done when we take up ULC's suggestion to consider different partners for your work.

Community Partners

An excellent example of what can be done with a coalition of community partners comes from Detroit, where a mesh wireless network was put in place to provide permanent broadband in a low-income neighborhood.9 The project is called the Detroit Community Technology Project. With the community-based mesh network, only one Internet account is needed to provide access for multiple homes. The network also enables its members to share resources (calendar, files, bulletin board), and that data lives on their own network, not in the cloud. One of the sponsors of the Detroit Community Technology Project is the Allied Media Project (https://www.alliedmedia.org/), which also sponsors CassCoWifi and the Equitable Internet Initiative to bring broadband and digital literacy training to several underserved areas.

Community Networks (https://muninetworks.org/), a project of the Institute for Local Self-Reliance (https://ilsr.org/), describes several innovative projects in which communities partner with electric utilities. Surry County, Virginia, expects to extend broadband access to 6,700 households through a first-ever partnership between an electric cooperative and a utility (Dominion Energy Virginia). A similar project is underway with the Northern Neck Cooperative and Dominion Energy.10 These initiatives were made possible by regulatory changes in Virginia (SB 966).
According to Community Networks, there are 900 communities providing broadband connectivity locally (https://muninetworks.org/communitymap). But nineteen states still have barriers in place that discourage, if not outright prevent, local communities from investing in broadband. Libraries in states where community networks are a viable option should be at the table, or perhaps setting the table, for discussions about how to bring broadband to the entire community, not just into the library or dispatched one-at-a-time via Wi-Fi hotspots. This is an opportunity to convene community conversations focusing on the issue of broadband. Library staff have been doing more and more of this type of outreach into the community and acting as facilitators. The ALA has even produced a Community Conversation Workbook (http://www.ala.org/tools/sites/ala.org.tools/files/content/LTC_ConvoGuide_final_062414.pdf) to support libraries just getting started.

State Partners

In California, the Governor recently issued Executive Order N-73-20 (https://www.gov.ca.gov/wp-content/uploads/2020/08/8.14.20-EO-N-73-20.pdf) directing state agencies to pursue a goal of 100 Mbps download speed; it outlines actions across state agencies and departments to accelerate mapping and data collection, funding, deployment, and adoption of high-speed internet.11 This will undoubtedly create fertile ground for libraries to partner with other agencies and community organizations to advance this initiative. Libraries are specifically called out to raise awareness of low-cost broadband options in their local communities.
Every state has some kind of broadband task force, commission, or advisory council (https://www.ncsl.org/research/telecommunications-and-information-technology/state-broadband-task-forces-commissions.aspx). This is another instance where libraries should be at the table. In my state, our State Librarian is on the California Broadband Council. But many of these commissions do not have a representative from the library world, which means they probably are not hearing from us. Whether it is through your local library, your state library, or your state library association, it is important for librarians to build relationships with people on these commissions—if not get a seat on the commission themselves.

National Partners

Unless your community is blanketed with affordable broadband connectivity, it will be important that we continue to advocate nationally for the needs we see. In addition to helping the patron standing right in front of us checking out their hotspot, we also need to address the needs of the people who aren't able to get to the library but are equally in need of access. Our job is to make sure that any new initiatives undertaken by a new administration provide for free and equitable access to the Internet for every household. Extending E-rate (the Federal Communications Commission's program for making Internet access more affordable for schools and libraries) isn't enough. Free (or at least affordable) broadband needs to be brought to every home. The Electronic Frontier Foundation (EFF) argues that fiber-to-the-home is the best option for consumers today because it will be easily upgradeable without touching the underlying cables and will support the next generation of applications (see https://www.eff.org/wp/case-fiber-home-today-why-fiber-superior-medium-21st-century-broadband). Libraries have worked with the EFF on issues related to privacy and government transparency. Maybe it's time to team up with them on broadband.
Global Partners

Low Earth Orbit (LEO) satellites could potentially bring broadband to everyone on Earth.12 Starlink (https://www.starlink.com/) is Elon Musk's initiative, and Project Kuiper (https://blog.aboutamazon.com/company-news/amazon-receives-fcc-approval-for-project-kuiper-satellite-constellation) is Amazon's Jeff Bezos' project. A private beta of the Starlink service is due (or perhaps it is already happening). If it works as Musk has envisioned, it could be a game-changer. Or it might just make the digital divide worse if it isn't affordable to everyone who needs it. How might we lobby Musk to roll out this service in a way that is equitable and fair?

SPEAK UP, SPEAK OUT, AND GET IN THE WAY

These are just a few avenues that we, as professionals committed to free access to information, might pursue. I worry that we have not made enough noise about the problems we see in our communities that are a result of broadband inequity and digital poverty. And although virtually every library is doing something to address the problem, our efforts are no match for its magnitude.
In a blog post on the Brookings Institution's website, authors Lara Fishbane and Adie Tomer argue for a new agenda focused on comprehensive digital equity that includes (among other things) "building networks of local champions, ensuring community advocates, government officials, and private network providers share intelligence, debate priorities, and deploy new programming."13 There are no better local champions and advocates for communities than the City or County Librarians and their staffs. Let's treat this problem with the seriousness it deserves and at a scale that will be meaningful. To quote John Lewis (as so many of us have since his death on July 17, 2020), it's time for us to "speak up, speak out, and get in the way."14 We have to make it painfully clear to policymakers that libraries cannot bridge the digital divide with public access computers and hotspots. We need to tell our communities' stories, convene conversations, and agitate for equitable broadband that is as readily available as water and electricity.

ENDNOTES

1 "Libraries Respond: COVID-19 Survey," American Library Association, accessed August 25, 2020, http://www.ilovelibraries.org/sites/default/files/MAY-2020-COVID-Survey-PDF-Summary-of-Results-web-2.pdf.

2 Erica Freudenberger, "Reopening Libraries: Public Libraries Keep Their Options Open," Library Journal, June 25, 2020, https://www.libraryjournal.com/?detailStory=reopening-libraries-public-libraries-keep-their-options-open.

3 Lauren Kirchner, "Millions of Americans Depend on Libraries for Internet. Now They're Closed," The Markup, June 25, 2020, https://themarkup.org/coronavirus/2020/06/25/millions-of-americans-depend-on-libraries-for-internet-now-theyre-closed.

4 Jim Lynch, "The Gates Library Foundation Remembered: How Digital Inclusion Came to Libraries," TechSoup, accessed August 24, 2020, https://blog.techsoup.org/posts/gates-library-foundation-remembered-how-digital-inclusion-came-to-libraries.
5 Gina Millsap, "This was in April. Q. We're starting a new school year and what has changed? A. Not much. It's past time to get serious about universal broadband in the U.S.," Facebook, August 16, 2020, 5:37 a.m., https://www.facebook.com/gina.millsap.7/posts/10218986781485855, accessed September 14, 2020.

6 "Libraries are Filling the Homework Gap as Students Head Back to School," Broadband USA, last modified September 4, 2018, https://broadbandusa.ntia.doc.gov/ntia-blog/libraries-are-filling-homework-gap-students-head-back-school.

7 James K. Willcox, "Libraries and Schools Are Bridging the Digital Divide During the Coronavirus Pandemic," Consumer Reports, last modified April 29, 2020, https://www.consumerreports.org/technology-telecommunications/libraries-and-schools-bridging-the-digital-divide-during-the-coronavirus-pandemic/.
8 Sarah Chase Webber, "The Library's Role in Bridging the Digital Divide," Urban Libraries Council, last modified March 28, 2019, https://www.urbanlibraries.org/blog/the-librarys-role-in-bridging-the-digital-divide.

9 Cecilia Kang, "Parking Lots Have Become a Digital Lifeline," The New York Times, May 20, 2020, https://www.nytimes.com/2020/05/05/technology/parking-lots-wifi-coronavirus.html.

10 Ry Marcattilio-McCracken, "Electric Cooperatives Partner with Dominion Energy to Bring Broadband to Rural Virginia," last modified August 6, 2020, https://muninetworks.org/content/electric-cooperatives-partner-dominion-energy-bring-broadband-rural-virginia.

11 "Newsom Issues Executive Order on Digital Divide," CHEAC (Improving the Health of All Californians), last modified August 14, 2020, https://cheac.org/2020/08/14/newsom-issues-executive-order-on-digital-divide/.

12 Tyler Cooper, "Bezos and Musk's Satellite Internet Could Save Americans $30B a Year," Podium: Opinion, Advice, and Analysis by the TNW Community, last modified August 24, 2019, https://thenextweb.com/podium/2019/08/24/bezos-and-musks-satellite-internet-could-save-americans-30b-a-year/.

13 Lara Fishbane and Adie Tomer, "Neighborhood Broadband Data Makes It Clear: We Need an Agenda to Fight Digital Poverty," Brookings Institution, last modified February 6, 2020, https://www.brookings.edu/blog/the-avenue/2020/02/05/neighborhood-broadband-data-makes-it-clear-we-need-an-agenda-to-fight-digital-poverty/.

14 Rashawn Ray, "Five Things John Lewis Taught Us About Getting in 'Good Trouble,'" Brookings Institution, last modified July 23, 2020, https://www.brookings.edu/blog/how-we-rise/2020/07/23/five-things-john-lewis-taught-us-about-getting-in-good-trouble/.
PUBLIC LIBRARIES LEADING THE WAY

Harnessing the Power of OrCam

Mary Howard

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12637

Mary Howard (mhoward@sccl.lib.mi.us) is Reference Librarian, Library for Assistive Media and Talking Books (LAMTB), at the St. Clair County Library, Port Huron, Michigan. © 2020.

Library for Assistive Media and Talking Books (LAMTB) services are located at the main branch of the St. Clair County Library System. LAMTB facilitates resources and technologies for residents of all ages who have visual, physical, and/or reading limitations that prevent them from using traditional print materials. Operating out of Port Huron, Michigan, we encounter many instances where we need to provide assistance above and beyond what a basic library may offer. We host Talking Book services, which provide free players, cassettes, braille titles, and downloads to users who are vision or mobility impaired. We also have a large, stationary Kurzweil reading machine that converts print to speech, video-enhanced magnifiers, and large print books, and we provide home delivery service for patrons who are unable to travel to branches.

The library had been searching for a more technology-forward focus for our patrons. The state’s Talking Books center in Lansing set up an educational meeting at the Library of Michigan in 2018 to see a live demonstration of the OrCam MyEye reader. This was the innovation we were seeking, and I was thoroughly impressed with the compact and powerful design of the reader, the ease of use, and the stunningly accurate feedback provided by this AI-powered assistive reading device. Users are able to read with minimal setup and total control. OrCam readers are lightweight, easily maneuverable assistive technology devices for users who are blind, visually impaired, or have a reading disability, including children, adults, and the elderly.
The device automatically reads any printed text: newspapers, money, books, menus, labels on consumer products, text on screens and smartphones, etc. The OrCam reader repeats back any text immediately and is fit for all ages and abilities. OrCam works with English, Spanish, and French and can identify money and other business and household items. It can be attached to either the left or right temple of the user’s glasses using a magnetic docking device, placing the speaker near the corresponding ear, and users can easily adjust the volume and speed of the read text. Letting a diverse group of users with different needs use the reader as they like is one of its more impressive offerings. Changing most settings is normally done with just a finger swipe on the OrCam device.

The mission of OrCam is to develop a “portable, wearable visual system for blind and visually impaired persons, via the use of artificial computer intelligence and augmented reality.” By offering these devices to our sight, mobility, or otherwise impaired patrons, we open up the world of literacy, discovery, and education. Some of our users are not able to read in any other fashion, and the OrCam provides a much-needed boost to their learning profile.

We secured a grant from the Institute of Museum and Library Services (IMLS) for the purchase of the readers (CFDA 45.310). We also worked with OrCam to get lower pricing for these units: normally they retail for $3,500, but we were able to negotiate the lower price point of $3,000. We were awarded a $22,106 Improving Access to Information grant from the Library of Michigan to fund the entire purchase. Without this funding stream we would not have been able to secure the OrCam.
However, if you have veterans in your service area, please contact the company, since there is VA health coverage available for low-vision or legally blind veterans who may qualify to receive an OrCam device, fully paid for by the VA. Please visit https://orcam.com/en/veterans for more information.

Figure 1. Close-up of the OrCam device.

The grant was initially set to run from September 2019 to September 2020. We purchased six OrCam readers for our library users, which were planned to be rotated among our twelve branches throughout this grant cycle. However, due to the pandemic and out of safety concerns for staff and visitors, our library was closed from March 23 to June 15, and we were only able to offer the readers to the public at six branches. As of July 14, 2020, we are projecting that we may open to the public in September, but COVID-19 issues could halt that. We have had to make arrangements with the grantor to extend the period for the usage of the OrCam from September to December. This will make up for some of the lost time and open a path for the other six libraries to have their turn offering the OrCam to their patrons.

The interesting aspect of this is that we now have to take our technology profile even further by offering remote training to prospective OrCam users. Thankfully, the design and rugged housing of the reader make it easy to clean and maintain, but social distancing can prove to be intrusive for training. To set up a user, you need to be within a foot or two of them, and that closeness is needed to get them used to how the OrCam reads. There is a lot of directing involved and close contact between the user and instructor. We will work around this by providing distance instruction, combining in-person and remote training. OrCam also has a vast array of instructional videos that we will have cued up for users.
We have had over 150 residents attend presentations, demonstrations, and talks on the OrCam. I anticipate that this number will not be matched during the second round; however, we may be more successful in our online presence, since we can add the instruction to our YouTube page, offer segments on Facebook and other social media, and provide film clips for our webpage. The situation has been difficult, but it has prompted LAMTB to think about how we should be working to provide better and more remote service to our users. Since we cover over 800 square miles in the county, becoming more adaptable in serving our patrons has become a paramount area of work for the library. The OrCam will bring about a new way of providing remote training to our patrons, which will raise awareness of the reader and how it can benefit users.

The St. Clair County Library System would like to thank the Institute of Museum and Library Services for supporting this program. The views, findings, conclusions or recommendations expressed in this article do not necessarily represent those of the Institute of Museum and Library Services.
LITA PRESIDENT’S MESSAGE

In the Middle of Difficulty Lies Opportunity: Hope Floats

Evviva Weinraub Lajoie

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12687

Evviva Weinraub Lajoie (evviva@gmail.com) is Vice Provost for University Libraries and University Librarian, University at Buffalo, and the last LITA President. © 2020.

If quarantine has illustrated anything to me, it’s that time is merely a construct. While my approximately two-month term as President may be the shortest in LITA history, it has been filled with meetings, reports, protests, and preparations for our metamorphosis into Core. My thoughts have been consumed with the myriad financial, health, and societal issues that have also filled my news feed. I spend a lot of time thinking and worrying about what their impact will be on our work and our institutions, how they affect me and the people I work with personally, and what role Core may play for many of us in the future.

I imagine all of us are thinking about health and safety. We are all balancing those parts of ourselves that want to aid, to help, to teach and guide with the parts of ourselves that are anxious and scared. Many of us have responsibilities where we need to protect our loved ones and ourselves. We are seeing the health and safety of our BIPOC colleagues disproportionately harmed. Balancing our crucial role within our communities is complicated, and there are no right answers.

I imagine many of us have been spending a lot of time thinking about money, whether it be personal concerns, institutional and organizational concerns, or their intersection point. We’re thinking a lot more about where our money comes from, how it is invested, how we pay for things, how we prioritize paying for things, who decides what gets purchased, and whose voice gets centered when we make that purchase.
We’re thinking carefully about the institutions and infrastructures that have existed and how they will look different, and should be different, in a post-COVID landscape.

I imagine most of us are thinking about societal connections. We are interacting with our professional colleagues differently, and many of us are, perhaps for the first time, perceiving the deep imbalances that permeate our personal, social, and professional lives. We are all trying to figure out how to do the work we need to do when we are uncomfortable, the world is uncertain, and the demands for change are coming from all angles and in a variety of forms.

LITA remained my professional home through the years because I found it to be a place where, no matter who you were or where you worked, there was a place for you. That feeling of connection is so vital to all of us, pandemic and social unrest or not. Knowing there is a network I can depend on to be there when I’m working through the difficult and uncomfortable makes the work just a little bit easier and significantly more meaningful. Our professional organizations and affiliations have the ability to be an anchor in uncertain times, whether through a change in career, a financial crisis, an environmental catastrophe, or a global health emergency.

On August 31, 2020, LITA officially dissolved, and on September 1, our home became Core. At our last LITA Board meeting, Margaret Heller and Amanda L. Goodman presented a history of LITA. What became clear to me in the retelling is that this is not LITA’s first reorganization. Nor is it our second or our third. LLAMA, LITA, and ALCTS have always been dancing with each other.
Our merger is an acknowledgement that we “...play a central role in every library, shaping the future of the profession by striking a balance between maintenance and innovation, process and progress, collaboration and leading.”

Collectively, we have had a year that is beyond comprehension. It has been filled with loss, anger, frustration, grief, anxiety, depression, horror. We have all been weathering the same storm, but our ships are not all equally prepared for the task laid ahead of them. That has been, for so many of us, the hardest part of all of this. We may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, “what are you going to do to change this?” Balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. I believe that together, we can make Core stand up to that challenge.

It has been an honor to serve as the last LITA President. For the brief time I have served, to have the chance to hold an office so many people I truly admire have held... it is a legacy I am proud to have had a moment to uphold. I am gratified to transition LITA into a partnership that will take all that we have loved about LITA and make something new, something Core.
LETTER FROM THE EDITOR

September 2020

Kenneth J. Varnum

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2020 https://doi.org/10.6017/ital.v39i3.12xxx

With “unprecedented” rising to first place on my personal list of words I would prefer never to need to use again, let alone hear used, I find it eminently satisfying that some activities and events from before COVID continue in their usual, predictable ways. For me, the quarterly rhythm of publication of Information Technology and Libraries is one of those activities. It is helping keep me grounded. While it is certainly not much in the scope of what is happening all around me, it is at least something.

One thing that is changing is that this journal, along with Library Resources and Technical Services and Library Leadership & Management, is now a publication of ALA’s newest division, Core: Leadership, Infrastructure, Futures. You’ll notice a new logo at the top of our site, reflecting the new organizational structure. I am excited about the possibilities of richer cross-Core cooperation and collaboration as we explore our new structure.

This issue includes the first, and last, LITA President’s Message from incoming and outgoing LITA President Evviva Weinraub Lajoie. Evviva assumed the LITA presidency this summer, just before the merger of LITA, LLAMA, and ALCTS into the new Core division took place on September 1. Members of those three merged divisions should watch for information about elections for the new Core president in October.

I am pleased that this issue includes the 2020 LITA/Ex Libris Student Writing Award winning article, “Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results,” by Sam Grabus of Drexel University.
Julia Bauder, the chair of this year’s selection committee (I was also a member, as ITAL editor), said, “This valuable work of original research helps to quantify the scope of a problem that is of interest not only in the field of library and information science, but that also, as Grabus notes in her conclusion, could affect research in fields from the digital humanities to the sciences.”

Before closing, I would like to express my appreciation to Breanne Kirsch, who ably served on the editorial board from 2018-2020.

Sincerely,

Kenneth J. Varnum, Editor
varnum@umich.edu
September 2020
EDITORIAL BOARD THOUGHTS

Public Libraries Respond to the COVID-19 Pandemic, Creating a New Service Model

Jon Goddard

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12847

Jon Goddard (jgoddard@northshorepubliclibrary.org) is a Librarian at the North Shore Public Library and a member of the ITAL Editorial Board. © 2020.

During the COVID-19 pandemic, public libraries have demonstrated, in many ways, their value to their communities. They have enabled their patrons not only to resume their lives, but to learn and grow. Additionally, electronic resources offered to patrons through their library cards have allowed people to be educated and entertained. The credit must go to the librarians, who initially fueled, and have maintained, this level of service by rewriting the rules and creating a new service model.

Once libraries closed, librarians promoted ebooks and other important platforms available to patrons with their library cards. The result: the checkout of ebooks, and the use of these platforms, rose exponentially. Community engagement became completely virtual, with librarians, and those who provide library programs to the public, delivering services on platforms that they may or may not have heard of, such as Zoom and Discord. As libraries reopened, many offered real-time reference services as well as seamless, contactless curbside service, providing a sense of control and continuity amidst the chaos.

EXPONENTIAL INCREASES IN ELECTRONIC RESOURCE USAGE

OverDrive, which is currently used by nearly 90% of public libraries in the United States to manage both ebook and audiobook collections, saw an exponential increase in its usage. Since the lockdown began in mid-March, the daily average for ebook checkouts has been consistently 52% above pre-COVID periods.
Additionally, new users to the platform have been consistently double and triple 2019 highs.1 Library staff have been helping readers during this time to ensure they can obtain access with their devices. In Suffolk County, New York, where new patron registration to OverDrive is up 72% from last year (as of August 2020), there has been no shortage of requests for help.2

With kids being home from school and learning virtually, it is no surprise that ebook readership skyrocketed among YA and juvenile readers, with an 87% increase from last year.3 To help them with their homework and studies, families turned to online tutoring. In Suffolk County, New York, usage of the Brainfuse online live tutoring service has been consistently up by nearly 50% during the school closures.4 Gale, a Cengage company, which offers Miss Humblebee’s Academy, a virtual learning program for preschoolers, saw its user sessions increase by 100% from the previous year.5

Adults, also eager to learn new skills, took to online courses as well. Gale Courses saw a 50% increase in enrollments from March-July over the previous year. Likewise, Gale Presents: Udemy, which offers on-demand video courses, saw just over 21,000 course enrollments from March-June.6

To help those who did not have sufficient broadband to use these necessary resources and platforms, many libraries left their Wifi on even when the building was closed to allow access to those in the vicinity of the building. In addition, many libraries purchased Wifi hotspots to lend to their patrons.
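For libraries tracking similar statistics locally, the year-over-year percentage increases cited above can be computed directly from circulation counts. A minimal sketch in Python, using entirely hypothetical checkout figures for illustration:

```python
# Percent change of a current figure relative to a baseline figure.
# All numbers below are hypothetical, for illustration only.

def percent_increase(baseline, current):
    """Return the percent change of `current` relative to `baseline`."""
    return (current - baseline) / baseline * 100

# Hypothetical daily ebook checkout averages
pre_covid_daily_avg = 1000
lockdown_daily_avg = 1520

change = percent_increase(pre_covid_daily_avg, lockdown_daily_avg)
print(f"Daily checkouts up {change:.0f}% over the pre-COVID baseline")
# prints "Daily checkouts up 52% over the pre-COVID baseline"
```

The same calculation applies to any of the usage metrics mentioned (registrations, tutoring sessions, course enrollments), as long as the baseline and comparison periods cover the same length of time.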
According to Pew Research, approximately 25% of households do not have a broadband internet connection at home.7 While public libraries cannot provide the only local solution to this gap, here are other steps libraries have been taking during the shutdown:

• Strengthening wireless signals so people can access wireless from outside library buildings.
• Hosting drive-up Wifi hotspot locations.
• Partnering with schools to obtain and distribute Wifi hotspots to families in need.

COMMUNITY ENGAGEMENT - VIRTUALLY

Community engagement has been vital since the COVID-19 lockdown. Both librarians and those who provide library programs to the public had to quickly adjust to the virtual world in which we were suddenly living. Using a mixture of social media platforms, including Facebook Live and Stories, Discord, Instagram, YouTube, Zoom, and GoToMeeting, librarians flocked to the internet, providing a wide range of programming. Even those libraries that did not previously have any virtual programs managed to very quickly provide quality programs to their patrons.

Virtual programming was not available at the San José Public Library (SJPL) prior to the shutdown. Librarians quickly started to move programs online, including story time, and created a program called Spring Into Reading, similar to the summer reading program, to continue to encourage families to read together. They also started a weekly recorded story time, so patrons could call the library and use their phones to hear a story. To date, SJPL has hosted over 2,000 virtual events since the lockdown began on March 17.8

Some libraries, like the Oceanside Library in New York, were offering virtual programs before the pandemic. When the library closed on March 13, the team started planning to move completely virtual. Two days later, the library was offering four programs a day, including story times, book chats, and book clubs.
By the end of the week, they were offering eight programs a day.9 In April, May, and June, they found book discussions and story times were the most popular programs. They then started to open their programs to people from out of state, partnering with other libraries. The result? Program attendance has increased, and several Zoom meeting rooms have been maxed out.10

Through the lockdown, library patrons have been exercising, listening to concerts, taking virtual vacations, learning new skills, cooking, playing games, and reducing stress. This incredible adaptation was only possible due to library workers’ quick thinking and never-ending determination to help.

DELIVERING INFORMATION AND MATERIALS WITH A NEW SERVICE MODEL

At the San José Public Library (SJPL), which has over 500,000 library members, staff had to quickly shift to a new online reality just after the shutdown. To help patrons get the most from their electronic resources, SJPL used LibAnswers to post FAQs and email responses to their issues and questions. When a librarian was available, patrons could use LibChat to ask questions in real time. Because no one was in the library buildings to answer phones, LibAnswers and LibChat became the only way the public could communicate with staff. Chat reference conversations increased by nearly 400%, from approximately 40 chat sessions per day to 160 per day. The chat service was also made available in Spanish, Vietnamese, and Chinese. When the library implemented its Express Pickup service, SJPL utilized the Spaces functionality in LibCal to allow patrons to create pickup appointments. When patrons arrived at the library for their appointment, the SMS functionality in LibAnswers allowed them to text staff upon arrival.
Through the City of San José’s SJ Access initiative, which aims to help bridge the digital divide in the city, SJPL worked closely with other city departments and the Santa Clara County Office of Education to purchase approximately 16,000 high-speed AT&T hotspots for students and the public.11

WORKING TOWARDS THE NEW NORMAL

The American Library Association (ALA) is committed to advocating strongly for libraries on several different fronts. Thanks to thousands of advocate communications with Congress, libraries secured $50 million for the Institute of Museum and Library Services (IMLS) in the Coronavirus Aid, Relief, and Economic Security (CARES) Act. This enabled libraries and museums to apply for grants during this time of need.12 In addition, the ALA is currently advocating for the passage of the Library Stabilization Fund Act (S.4181 / H.R.7486) to allow libraries to retain staff, maintain services, and safely keep communities connected and informed. The legislation calls for $2 billion in emergency recovery funding for America’s libraries through IMLS.13

While the ALA is rightly advocating for these emergency funds, public librarians and administrators should take advantage of this time to strategically review what has been put into place to react to the COVID-19 pandemic, and plan for the long term. While it is true that libraries are physical spaces, they are also technology-driven services for learning and connection for all ages. Additionally, they have shown that, due to this new service model, access has exponentially expanded to new patrons, showing tremendous value when it comes to education and engagement.

This new service model should be preserved. Programs that engage our communities should be both physical and virtual.
Physical media and books should be provided both at the circulation desk and through a contactless service. Reference services should be provided both at the reference desk and through chat reference services. This must be our new normal.

ENDNOTES

1 David Burleigh, Director, Brand Marketing & Communication at OverDrive, phone conversation with author, October 9, 2020.

2 Maureen McDonald, Special Projects Supervisor at the Suffolk Cooperative Library System, phone conversation, September 14, 2020.

3 Burleigh.

4 McDonald.

5 Kayla Siefker, Head of Media & Public Relations at Gale, a Cengage Company; Brian Risse, VP of Sales - Public Libraries; and Muna Sharif, Product Manager, Discovery & Analytics, phone conversation with author, October 16, 2020.

6 Siefker.

7 Pew Research Center, “Internet/Broadband Fact Sheet,” June 12, 2019, accessed October 13, 2020, https://www.pewresearch.org/internet/fact-sheet/internet-broadband/.

8 Laurie Willis, Web Services at SJPL, phone conversation with author, October 14, 2020.

9 Erica Freudenberger, “Programming Through the Pandemic,” Library Journal, May 22, 2020, https://www.libraryjournal.com/?detailStory=Programming-Through-the-Pandemic-covid-19.

10 Tony Iovino, Assistant Director for Community Services at the Oceanside Library, phone conversation with author, October 19, 2020.

11 Willis.

12 American Library Association, “Advocacy & Policy,” accessed October 15, 2020, http://www.ala.org/tools/covid/advocacy-policy.

13 Ibid.
PUBLIC LIBRARIES LEADING THE WAY

Journey with Veterans: Virtual Reality Program Using Google Expeditions

Jessica Hall

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020 https://doi.org/10.6017/ital.v39i4.12857

Jessica Hall (jessica.hall@fresnolibrary.org) is Community Librarian, Fresno County Public Library. © 2020.

“Where would you like to go?” is the question of the day. We have stood atop the Great Wall of China, swum with sea lions in the Galapagos Islands, and walked along the vast red sands of Mars. Each journey was unique and available through the library.

As a community librarian in charge of outreach to seniors and veterans, I first learned about the virtual tour idea from a colleague who returned from a conference excited to tell me about a workshop she had attended. The workshop described a program which utilized Google Expeditions to take seniors on virtual tours. This idea stayed with me for months, until Fresno County Public Library obtained the $3,000 Value of Libraries grant, funded by the California Library Services Act. As part of this grant, $2,905 went to purchase a Google Expeditions kit and supplies to create a virtual reality program called Journey with Veterans. The kit includes 5 viewers and 1 tablet. A viewer is basically a Google Cardboard, except the case is plastic and there is a smartphone inside the case. During the program, I use the tablet to select and run each tour. The tour I select on the tablet is projected to the 5 viewers so participants can experience it. In this manner, veterans can explore places without physically having to travel anywhere.

The Journey with Veterans program took the technology to the veterans instead of requiring them to come into the library. The two locations chosen were the Veterans Home of California - Fresno and the Community Living Center at the VA Medical Center in Fresno, CA.
From the time the program began in September 2019 to March 2020, when the pandemic shutdown brought a halt to the program, the library hosted 26 sessions at these two locations with 182 veterans. In sessions where more than 5 people were in attendance, the viewers were shared among the participants.

The tablet and the smartphones inside the viewers have an app installed on them called Google Expeditions, which is the software that runs the tours. One hotspot, already owned by the library, was used for this program. It is a requirement that all the viewers and the tablet be connected to the same WiFi, so having a portable WiFi connection was necessary to run this program in locations without access to a strong internet connection.

Each tour is a selection of still 360-degree views. The landscape does not move; instead, the participant turns their head around, up, and down to look at the entire scene. The control tablet includes additional menu items not seen by participants, such as scripts I can read about the landscape we are looking at and suggested points of interest I can highlight for participants. When I select a point of interest on the tablet, the participants see arrows pointing to that area of their screen. The participant follows the arrows by turning their head in the direction indicated. Participants know they are looking at the area of interest when the arrows disappear and are replaced by a white circle surrounding the relevant portion of the screen.

The viewers did not have straps attached to them, and there was no way to attach straps to them. Therefore, the viewer could not be strapped to the participant’s head. Instead, the participant had to hold up the viewer the entire time they wished to look through it.
This presented a challenge for participants who did not have the ability to hold the viewer on their own. At the locations I visited, staff were available to help and would hold the viewer up to a participant's eyes. In some cases, one staff person held the viewer up for the participant while another turned the participant's wheelchair in a circle so they could see the entire image. Each program lasted 30-45 minutes, but the amount of time looking through the viewer was kept to around 15-20 minutes. The rest of the time was filled with talking about the location we were viewing. For the veterans in memory care at the Veterans Home of California - Fresno, this program was designed with the hope that it would allow the veterans to reminisce about places they had visited and lived in and encourage them to talk about their experiences. Some of the participants had been to the countries we visited virtually, and they reminisced about their time there. At every session, the participants shared their enthusiasm and eagerness to continue the program. The program was tried with music once. On one of my first visits to the Community Living Center at the VA Medical Center, a participant asked if he could play music in the background. Since I had thought about incorporating music into the program, I agreed, and the participant played some classical music from his own device. Though it was a good idea, the execution did not work well. The music came from one location, which made it too loud when one stood near it but too quiet once one walked too far away. I found the music difficult to talk over while giving the tour. I believe that incorporating sounds of the location we visit, such as the sounds of the countryside or a big city, would make the experience more immersive. However, I have yet to find a way to do so successfully. After the grant ended, I continued the program at both locations.
The partnership I had created at the Veterans Home of California - Fresno grew into a second program, Storytime with Veterans, which was specifically requested by the residents. I alternated my visits so that some weeks we did a virtual reality program and some weeks I read to them. One time there was a miscommunication: the activity coordinator thought I had come to read a story, but I was under the impression that it was a virtual reality week, so I had brought the Google Expeditions kit with me. The solution was to do both. One of the Google Expeditions tours is a very short and much abridged virtual reality version of Twenty Thousand Leagues Under the Sea by Jules Verne. The tour uses artwork to represent scenes from the book, and each scene tells a different part of the story. The Veterans Home's residents were treated to both a story and a virtual reality tour at the same time. Up until the library's shutdown in mid-March due to COVID-19, I was in the process of expanding the use of the Google Expeditions kit but was unable to continue. Since then, the equipment has not been used. Restarting the program now includes multiple challenges, not the least of which is sanitizing the devices. Sanitation was a consideration even before COVID-19, and sanitary virtual reality masks were acquired using grant funds as part of the initial program. These masks look like strips of cloth that line the eyes, with strings that hook around the ears to hold them in place. Cleaning products were also purchased and used to clean the devices after each program. Before COVID-19, a viewer could be handled by multiple people before it was cleaned. I always handled them first to prepare them for use. Then I handed each one to the participant. Occasionally they were also handled by staff. I always cleaned the viewers right after the program ended but not during the program.
With the current COVID-19 restrictions, the sanitation practices previously used are inadequate. I do not know the future of the program in a post-COVID-19 world, but I intend to begin the program again once it becomes safe to do so, incorporating all required precautions and restrictions. I look forward to once more being able to take veterans on exciting virtual journeys.
LETTER FROM THE CORE PRESIDENT
Leadership and Infrastructure and Futures…Oh my!
Christopher Cronin
INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020
https://doi.org/10.6017/ital.v39i4.13027
Christopher Cronin (cjc2260@columbia.edu) is Core President and Associate University Librarian for Collections, Columbia University. © 2020.

I am so pleased to be able to welcome all ITAL subscribers to Core: Leadership, Infrastructure, Futures! This issue marks the first of ITAL since the election of Core's inaugural leadership. A merger of what was formerly three separate ALA divisions—the Association for Library Collections & Technical Services (ALCTS), the Library & Information Technology Association (LITA), and the Library Leadership & Management Association (LLAMA)—Core is an experiment of sorts. It is, in fact, multiple experiments in unification, in collaboration, in compromise, in survival. While initially born out of a sheer fight-or-flight response to financial imperatives and the need for organizational effectiveness, developing Core as a concept and as a model for an enduring professional association very quickly became the real motivation for those of us deeply embedded in its planning. Core is very deliberately not an all-caps acronym representing a single subset of practitioners within the library profession. It is instead an assertion of our collective position at the center of our profession. It is a place where all those working in libraries, archives, museums, historical societies—information and cultural heritage broadly—will find reward and value in membership and a professional home. All organizations need effective leaders, strong infrastructure, and a vision for the future. And that is what Core strives to build with and for its members. While I welcome ITAL's readers into Core, I also welcome Core's membership into ITAL.
No longer publications of their former divisions, all three journals within Core have an opportunity to reconsider their mandates. As with all things, audience matters. ITAL's readership has now expanded dramatically, and those new readers must be invited into ITAL's world just as much as ITAL has been invited into theirs. As we embark on this first year of the new division, we do so with a sense of altogether newness more than of a mere refresh, and a sense of still becoming more than a sense of having always been. And who doesn't want to reinvent themselves every once in a while? Start over. Move away from the bits that aren't working so well, prop up those other bits that we know deserve more, and venture into some previously uncharted territory. How will being part of this effort, and of an expanded division, reframe ITAL's mandate? The importance of information technology has never been more apparent. It is not lost on me that we do this work in Core during a year of unprecedented tumult. In 2020, a murderous global pandemic was met with unrelenting political strife, pervasive distribution of misinformation and untruths, devastating weather disasters, record-setting unemployment, heightened attention on an array of omnipresent social justice issues, and a racial reckoning that demands we look both inward and outward for real change. Individually and collectively, we grieve so many losses—loss of life, loss of income, loss of savings, loss of homes, loss of dignity, loss of certainty, loss of control, loss of physical contact. And throughout all of these challenges, what have we relied on more this year than technology? Technology kept us productive and engaged. It provided a focal point for communication and connection.
It provided venues for advocacy, expression, inspiration, and, as a counterpoint to that pervasive distribution of misinformation, it provided mechanisms to amplify the voices of the oppressed and marginalized. For some, but unfortunately not all, technology also kept us employed. And as the physical doors of our organizations closed, technology provided us with new ways to invite our users in, to continue to meet their information needs, and to exceed all of our expectations for what was possible even with closed physical doors. And yet our reliance on and celebration of technology in this moment has also placed another critical spotlight on the devastating impact of digital poverty on those who continue to lack access, and by extension also a spotlight on our privilege. In her parting words to you in the final issue of ITAL as a LITA journal, Evviva Weinraub Lajoie, the last President of LITA, wrote:

We may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, "what are you going to do to change this?"

Balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. I believe that together, we can make Core stand up to that challenge. I believe we will do this, too, and with a spirit of reinvention that is guided by principles and values that don't just inspire membership but also improve our professional lives and experience in tangible ways. It was a privilege to have served as the final President of ALCTS and such a humbling and daunting responsibility to now transition into serving as Core's first.
It is a responsibility I do not take lightly, particularly in this moment when so much is demanded of us. As we strive for equity and inclusion, we do so knowing that we are only as strong as every member's ability to bring their whole selves to this work. We must work together to make our professional home everything we need it to be and to help those who need us. It is yours, it is theirs, it is ours.
LETTER FROM THE EDITOR
Farewell 2020
Kenneth J. Varnum
INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2020
https://doi.org/10.6017/ital.v39i4.13051

I don't think I've ever been so ready to see a year in the rear-view mirror as I am with 2020. This year is one I'd just as soon not repeat, although I nurture a small flame of hope. Hope that as a society what we have experienced this year will exert a positive influence on the future. Hope that we recall the critical importance of facts and evidence. Hope that we don't drop the effort to be better members of our local, national, and global communities and treat everyone equitably. Hope that as a global populace we continue to get into "good trouble" and push back against institutionalized policies and practices of racism and discrimination and strive to be better. Despite the myriad challenges this year has brought, it is welcome to see so many libraries continuing to serve their communities, adapting to pandemic restrictions, and providing new and modified access to books and digital information. Equally gratifying, from my perspective as ITAL's editor, is that so many library technologists continue to generously share what they have learned through submissions to this journal. Along those lines, I'm extending my annual invitation to our public library colleagues to propose a contribution to our quarterly column, "Public Libraries Leading the Way." Items in this series highlight a technology-based innovation from a public library perspective. Topics could include any way that technology has helped you provide or innovate services for your communities during the pandemic, or any novel, interesting, or promising use of technology in a public library setting. Columns should be in the 1,000-1,500 word range and may include illustrations. These are not intended to be research articles.
Rather, Public Libraries Leading the Way columns are meant to share practical experience with technology development or uses within the library. If you are interested in contributing a column, please submit a brief summary of your idea.

Wishing you the best for 2021,

Kenneth J. Varnum, Editor
varnum@umich.edu
December 2020
Fulfill Your Digital Preservation Goals with a Budget Studio
Yongli Zhou
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016
doi: 10.6017/ital.v35i1.5704

ABSTRACT
To fulfill digital preservation goals, many institutions use high-end scanners for in-house scanning of historical print and oversize materials. However, high-end scanner prices do not fit in many small institutions' budgets. As digital single-lens reflex (DSLR) camera technologies advance and camera prices drop quickly, a budget photography studio can help to achieve institutions' preservation goals. This paper compares images delivered by a high-end overhead scanner and a consumer-level DSLR camera, discusses pros and cons of using each method, demonstrates how to set up a cost-efficient shooting studio, and presents a budget estimate for a studio.

INTRODUCTION
Colorado State University Libraries (CSUL) are regularly engaged in a variety of digitization projects. Materials for some projects are digitized in-house, while items from selected projects are sometimes outsourced. Most fragile materials that require professional handling are digitized in-house using an expensive overhead scanner. However, the overhead scanner has been occasionally unstable since it was purchased, and this has delayed some of our digitization projects. As digital photography technologies advance, image quality delivered by digital single-lens reflex (DSLR) cameras is improving, and camera prices have lowered to an affordable level. In this paper, I will compare images produced by a scanner and a camera side-by-side, list pros and cons of using each method, illustrate how to establish a shooting studio, and present a budget estimate for that studio.

LITERATURE REVIEW
There are many online guidelines and manuals for digitizing print materials. Some universities and museums have information about their digitization equipment online. Most articles focus on either high-end scanners or customized scanning stations.
These articles are very helpful for universities and museums that are relatively well funded. However, there is almost no literature discussing how to use inexpensive digital cameras and photography equipment to produce high-quality digitized images. This article will use a case study to demonstrate that a low-budget studio can produce high-quality digitized images.

COMPARISON OF SCANNED AND PHOTOGRAPHED IMAGES
The test camera set was chosen because it was the one the author used for general purposes. The camera was also chosen by many professional photographers because of its quality and affordability. To avoid dispute, the overhead scanner's make and model are not revealed.

Yongli Zhou (yongli.zhou@colostate.edu) is Digital Repositories Librarian, Colorado State University Libraries, Fort Collins, Colorado.

Table 1. Test Equipment

Budget Studio:
• Nikon D800
• Nikon AF Micro-Nikkor 60mm f/2.8D Lens
• Manfrotto 055CXPRO3 3-Section Carbon Fiber Tripod Legs
• Really Right Stuff BH-40 LR II Ballhead
• Nonreflective glass
• Book cradles
• X-Rite Original ColorChecker Card
• Natural daylight
• Total cost: $4,500 with no maintenance fees (priced in 2014)

Overhead Scanner:
• Our overhead scanner
• Nonreflective glass
• Book cradles
• Purchase price: $55,000 (purchased in 2007)
• $8,000 annual maintenance (2013 price)

Focus and Sharpness
A quality digitized image needs to have a good focus. A well-focused image shows details better and can produce better Optical Character Recognition (OCR) results for text-based documents. At CSUL, we have no control over the automatic focus on our overhead scanner and have noticed that sometimes one page is sharply focused but the next page is slightly out-of-focus. During the scanning process, our overhead scanner does not indicate whether a shot is focused or not.
A DSLR camera can beep or display a flashing dot in the viewfinder when in focus.

Illustration
The following two figures compare images produced by our test DSLR and overhead scanner. Both images are originals and have not been enhanced by software. In addition to this image, we tested nine other illustrations. Following our comparison study, we concluded that a semiprofessional DSLR camera produces sharper images than our expensive overhead scanner. In figure 1, at 100 percent zoom, the left image has a better focus, contains more details, and has colors closer to the original. The left image was taken using a Nikon D800 + Nikkor 60mm macro lens under natural lighting. The right image was produced by our overhead scanner. In figure 2, at 200 percent zoom, the left image (taken using the DSLR) shows much more detail than the image on the right (taken with the overhead scanner).

Figure 1. Comparative Images from DSLR (Left) and Overhead Scanner (Right), at 100 Percent Zoom. Image from Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), plate between pages 296 and 297.

Figure 2. Comparative Images from DSLR (Left) and Overhead Scanner (Right), at 200 Percent Zoom. Image from Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), frontispiece, print.

At CSUL, the process of digitizing a text document includes scanning pages, converting them into Portable Document Format (PDF) files, and applying an OCR process. In general, a well-focused image of text produces better OCR results, although software such as Adobe Acrobat can tolerate fuzzy images and produce reasonably accurate OCR text.
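One informal way to compare OCR output from the two capture methods is a character-level similarity ratio. This is a minimal sketch using Python's standard-library difflib; the sample strings are illustrative stand-ins, not the article's actual transcripts.

```python
# Compare two OCR transcripts with a character-level similarity ratio.
# The sample strings below are illustrative, not the article's data.
from difflib import SequenceMatcher

def ocr_similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio in [0.0, 1.0] between two transcripts."""
    return SequenceMatcher(None, text_a, text_b).ratio()

camera_ocr = "he had notions more correct than were, in his day, common"
scanner_ocr = "he bad notions more correct than were, in his day, common"

# A single-character OCR error ("had" vs "bad") yields a ratio near 1.0.
score = ocr_similarity(camera_ocr, scanner_ocr)
print(f"similarity: {score:.3f}")
```

Comparing each transcript against a hand-corrected ground truth (rather than against each other) would give a rough per-method accuracy score; dedicated character-error-rate tools are more rigorous for formal evaluation.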
Our OCR tests on a slightly out-of-focus image and a well-focused image showed no significant difference; however, from preservation and usability standpoints, we prefer well-focused images.

Figure 3. The left image was produced by our test DSLR camera and has a better focus. The right image was produced by our overhead scanner. Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), 300, print.

Figure 4. We ran the OCR process on the above two images. The top image was produced by our test DSLR camera and the bottom image was produced by our overhead scanner. Samuel M. Janney, The Life of William Penn; with Selections from His Correspondence and Auto-Biography (Philadelphia: Hogan Perkins & CO, 1852), 300, print.

Table 2. OCR Results Comparison

Generated from the image by camera:
" On one or two points of high importance, he had notions more correct than were, in his day, common, even among men of e1~larged minds, and he had the rare good fortune of being able to carry his theories into practice without any compromise." Yet, "he was not a man of stron sense."

Generated from the image by scanner:
" On one or two points of high importance, he bad notions more correct than were, in his day, common, even arnong men of e1~larged minds, and he had the rare good fortune of being able to carry his theories into practice without any compromise." Yet, "he was not a man of strong sense."

These test results are very close because of the forgiveness of the Adobe Acrobat software. However, we have seen that for some other pages, a better-focused image generates improved OCR results.

Photograph
A 6.5 inches by 4.5 inches silver print was used for this test. Our tests show that the test DSLR camera produced a sharper image of this historic photograph.
Figure 5. Tested 6.5 Inches by 4.5 Inches Photograph. The red square indicates the enlarged area for figure 6. Historical photograph from Colorado State University Archives and Special Collections.

Figure 6. Screen View at 100 Percent Zoom of a Silver Print. The top image was produced by the test DSLR camera and the bottom one was produced by our overhead scanner. Historical photograph from Colorado State University Archives and Special Collections.

Oversize Materials
For oversized materials, overhead scanners and DSLR cameras both have drawbacks, so we do not think either option is ideal. Our library uses a map scanner to scan oversize maps and posters. However, a map scanner is expensive and may not fit many libraries' budgets. A map scanner also is not suitable for fragile maps or posters. Our overhead scanner's maximum scanning area is 24 inches by 17 inches, and the test map's size is 25 inches by 26 inches. We had to scan the map in four sections and stitch them together using Adobe Photoshop. Each section image has a file size of 313 MB. Because of the large file sizes, the stitching process is extremely slow. Stitching images also is not recommended because there is always some degree of mismatch error created by lens distortion. A camera can capture any material size, but the details of the photographed images diminish as the material's size increases. The photo of the entire map taken by our test DSLR has a file size of 35.8 MB. The image produced by the camera has a lower resolution and less detail.

Figure 7. Oversized Materials Screen View at 100 Percent Zoom. The top image was photographed by the test DSLR. The bottom image was scanned by our overhead scanner. Historical map from Colorado State University Archives and Special Collections.
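The tradeoff described above, where a fixed sensor pixel count is spread over more inches of original, can be made concrete with a quick effective-resolution calculation. This sketch assumes the Nikon D800's nominal 7,360 x 4,912 pixel sensor; the material sizes come from the article's examples.

```python
# Why one camera frame loses detail on oversize originals: the sensor's
# fixed pixel count is divided by the material dimension it must span.
# Sensor width assumed: Nikon D800 nominal 7360 px on the long side.

def effective_ppi(sensor_px: int, material_inches: float) -> float:
    """Pixels per inch when one sensor dimension spans one material dimension."""
    return sensor_px / material_inches

# 25 x 26 inch map: the long sensor side must span the 26-inch dimension.
print(f"oversize map: {effective_ppi(7360, 26):.0f} ppi")

# 5.5 x 3.5 inch drawing: same sensor, far fewer inches to cover.
print(f"small drawing: {effective_ppi(7360, 5.5):.0f} ppi")
```

The same sensor delivers well over 1,000 ppi on the small drawing but under 300 ppi on the full map, which is why the photographed map shows less detail than the stitched scan despite the camera's sharper optics.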
Small Prints
One big advantage of a DSLR camera is that it can be set farther away to take pictures of oversized materials or very close to smaller objects to take close-up pictures. By comparison, the distance between the lens and the scanning platform on our overhead scanner is fixed, so no close-up images can be produced, and everything is reproduced at a scale of 1:1. For the following example, we used a 5.5 inches by 3.5 inches drawing as our test subject.

Figure 8. A 5.5 inches by 3.5 inches Fine Drawing. A historical booklet from Colorado State University Archives and Special Collections.

Figure 9. Small Prints Screen View at 100 Percent Zoom. The left image was produced by a DSLR with a macro lens and the right image was scanned by our overhead scanner. A historical booklet from Colorado State University Archives and Special Collections.

The image produced by our overhead scanner has a resolution of 3,427 pixels by 2,103 pixels. The camera produces a 6,776 pixels by 4,240 pixels image. The higher pixel count allows users to see more details at the same zoom level. The image produced by the camera is not only sharper but also contains more details. It also is good for making enlarged prints for promotional materials. For smaller maps, a DSLR camera also produces superior images. For the following sample, we tested a 15 inches by 9.5 inches map.

Figure 10. A 15 inches by 9.5 inches map. The blue square indicates the enlarged area for figure 11. Historical map from Colorado State University Archives and Special Collections.

Figure 11. Small Map Screen Views at 100 Percent Zoom. The left image was photographed by a DSLR camera with a macro lens and the right image was produced by our overhead scanner.
Historical map from Colorado State University Archives and Special Collections.

Post-Processing
Use of a Sharpening Filter
Our tests showed that a main drawback of our overhead scanner is that the images produced are out-of-focus. Some digitization guidelines recommend minor post-processing of delivered image files to improve image quality. One might argue that sharpening could be applied to fix our overhead scanner's out-of-focus problem. Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files recommends doing minor post-scan adjustment to optimize image quality and bring all images to a common rendition.1 This is good advice, but it is not applicable in real-world practice. To get the best result, each image would need to be evaluated and have a sharpening filter applied separately, because when an improper sharpening setting is applied to an image, it often creates haloing artifacts and an unnatural look. Applying a sharpening filter to each image individually would be extremely time-consuming. The haloing artifact is also called the chromatic aberration (CA) effect. CA appears as unsightly color fringes near high-contrast edges. Chromatic aberrations are typically only visible when viewing the image on-screen at higher zoom levels or on large prints. The following example shows that the CA may not appear at lower zoom levels, such as 50 percent or 100 percent. The left image has no sharpening filter applied and the right image has a sharpening filter applied. At 100 percent zoom, chromatic aberration is almost not identifiable, and the right image appears to be superior in terms of sharpness.

Figure 12. Sharpening Filter Comparison Sample at 100 Percent Zoom. The left image has no sharpening filter applied and the right image has a sharpening filter applied. Historical map from Colorado State University Archives and Special Collections.
At a higher zoom level, we see CA, visible in the right image of figure 13. The extra colors are introduced by the software.

Figure 13. Comparison of Sharpening Filter Applied to Images at 500 Percent Zoom. The left image has no sharpening filter applied and the right image has a sharpening filter applied. Historical map from Colorado State University Archives and Special Collections.

We recommend not applying sharpening filters to original scanned images; instead, attempt to obtain well-focused images from the beginning. For this reason, the test DSLR camera outperformed our overhead scanner for most materials.

Color Balance
Have you seen a scanned color image or color photograph with colors very different from the original image? For example, a white area appears to be bluish, or has an orange cast? When scanning or photographing an image under different lighting, the output image can have very different colors. In the following figure, the left image was shot at a correct white balance (WB) setting. WB is the process of removing unrealistic color casts so that objects that appear white in person are rendered white in your photo.2 The center image has a blue color cast, which was caused by a lower Kelvin setting, and the right image was shot at a higher Kelvin setting. A camera may create images with the wrong colors, but so will a scanner if it is not calibrated correctly.

Figure 14. Images Shot under Different White Balance Settings.

We pay an $8,000 annual service fee for overhead scanner maintenance, which includes scanner color calibration. In general, image colors rendered by the machine are close to the original colors but not exact. We have noticed that some images have a very light green cast and others are overly yellow; sometimes images appear to be darker than they should be.
Because we are not certified to calibrate the overhead scanner, we only use the settings prescribed by the technicians. Also, we have no control over a fading light bulb, which will affect correct exposure. WB adjustment on photographs taken in a studio can be very precise. Most DSLRs contain a variety of preset white balances. In general, auto WB works well but does not deliver the best results. Custom WB allows fine-tuning of colors. If a shooting studio is set up properly, the lighting should be consistent, so ideally the one setting found most desirable can be used repeatedly. However, professional photographers do test shots at the beginning of each shooting session. Once they find the optimal test shot, they will use the exact settings for the batch. Later, they will do minor color adjustment on the chosen test shot to ensure precise color representation, and then apply the adjustment settings to all other photos of the same batch. Because many small variations can be present in each shooting session, they do not reuse the settings from the previous shooting. It may seem arduous to do test shots for each shooting, but it ensures accurate color reproduction. Many professional photographers use ColorChecker Passport,3 a commercial product that helps with quick and easy capture of accurate colors. I will briefly demonstrate a useful trick I learned from a professional photography seminar: how to use ColorChecker Passport to apply correct white balance to a group of images.4

Step 1: Place an 18 percent gray card or a ColorChecker Passport card on top of a page. Choose the correct exposure and take the photo. Use the same exposure setting to take additional photos. For demonstration purposes, we deliberately used very low and very high Kelvin settings for the sample images. The low Kelvin setting created cool, blue tones and the high Kelvin setting created a tone that was too warm.
Note that the test shot with the ColorChecker board was not taken with exactly the correct white balance setting.

Figure 15. Sample Images for White Balance Adjustment. Rocky Mountain Collegian 3–4 (1893), 118, Colorado State University Archives and Special Collections.

Step 2: In Adobe Lightroom, select the test target image and switch to "Develop" mode. Select the White Balance tool and move the cursor over a gray area, trying to find a spot where the red, green, and blue (RGB) values are close. A spot with equal RGB values is ideal. This simple click will set the test image's white balance to an almost perfect setting.

Figure 16. Applying a White Balance in Adobe Lightroom 4

Step 3: Synchronize the other images' settings with the target image. Select the target image and all other images, click the Sync button, and select the settings you would like to synchronize. Make sure the WB button is checked.

Figure 17. Synchronize Settings in Adobe Lightroom 4

Figure 18. Synchronized Images with Correct White Balance. Rocky Mountain Collegian 3–4 (1893), 118, Colorado State University Archives and Special Collections.

Recently, I had the opportunity to visit the Spencer Museum of Art's digitization lab. They have a different workflow to ensure even more scientifically correct colors. If you are interested in their approach, you can contact their information technology manager or photographer.

Color Space
One very important thing to understand when you use a DSLR camera is color space. Many DSLR cameras support Adobe RGB and sRGB. sRGB reflects the characteristics of the average cathode ray tube (CRT) display. This standard space is endorsed by many hardware and software manufacturers, and it is becoming the default color space for many scanners, low-end printers, and software applications.
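Stepping back to the white-balance workflow for a moment: numerically, the gray-card correction amounts to scaling each channel so the sampled gray patch comes out neutral, then applying the same per-channel gains to every image in the batch (the Sync step). A minimal pure-Python sketch, with made-up patch values:

```python
# Gray-card white balance in numbers: compute per-channel gains that
# neutralize the gray patch, then apply them to every pixel.
# The patch values below are invented for illustration.

def wb_gains(gray_patch_rgb):
    """Per-channel gains that make the gray patch's RGB channels equal."""
    target = sum(gray_patch_rgb) / 3.0
    return tuple(target / c for c in gray_patch_rgb)

def apply_wb(pixel, gains):
    """Apply gains to one RGB pixel, clipping to the 8-bit range."""
    return tuple(min(255, round(c * g)) for c, g in zip(pixel, gains))

# A bluish cast: the gray card reads stronger in blue than in red.
patch = (110, 120, 140)
gains = wb_gains(patch)

# After correction, the patch itself becomes neutral gray.
corrected = apply_wb(patch, gains)
print(corrected)
```

Real raw converters work on linear sensor data before gamma encoding, so this is a simplification, but it captures why one click on a known-gray spot can correct a whole batch shot under the same light.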
It is the ideal space for web work but is not recommended for prepress work because of its limited color gamut. Adobe RGB (1998) was designed to encompass most of the colors achievable on CMYK printers, using only RGB primary colors on a device such as your computer display.5 This color space is recommended if you need to do print-production work with a broad range of colors. Many scanning vendors deliver images in the Adobe RGB color space. ProPhoto RGB contains all colors that are in Adobe RGB, and Adobe RGB contains nearly every color that is in sRGB. ProPhoto RGB covers more colors than the human eye can see. It can only be used for images in RAW format and in 16-bit mode. Common file formats that support 16-bit images are TIFF and PSD; most printers do not support 16-bit files. This color space is normally used by photographers who have a specific workflow and who print on specific high-end inkjet printers. When converting from 16-bit to 8-bit, some images will have banding or posterization problems. Banding is a digital imaging artifact: a picture with a banding problem shows horizontal or vertical lines.

Figure 19. An Example of Colour Banding, Visible in the Sky in This Photograph.6

Posterization of an image entails conversion of a continuous gradation of tone to several regions of fewer tones, with abrupt changes from one tone to another.7

Figure 20. An Example of Posterization.8

While it is a good idea to capture images using Adobe RGB to preserve a wide range of colors, you should convert images to sRGB when delivering them to unknown users or displaying them on the web. Currently, sRGB is the only appropriate choice for images uploaded to the web, since most web browsers don’t support any color management.
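The banding and posterization described above can be reproduced with a few lines of code: quantizing a smooth tonal ramp to a handful of levels collapses a continuous gradation into flat regions with abrupt jumps, which is what happens, more subtly, when 16-bit images are reduced to 8 bits. A toy illustration in Python:

```python
def posterize(tones, levels):
    """Quantize tones in [0, 1] down to a fixed number of evenly spaced levels."""
    step = 1.0 / (levels - 1)
    return [round(tone / step) * step for tone in tones]

# A smooth ramp of 256 distinct tones, like an 8-bit sky gradient...
ramp = [i / 255.0 for i in range(256)]

# ...collapses to four flat tones, with visible bands where they meet.
banded = posterize(ramp, 4)
print(len(set(ramp)), "tones before;", len(set(banded)), "tones after")  # 256 tones before; 4 tones after
```

Real 16-to-8-bit conversion keeps 256 of 65,536 levels rather than 4 of 256, so the effect is far gentler, but the mechanism is the same.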
Adobe RGB images that are uploaded to websites without conversion to sRGB generally appear dark and muted.9 If they are printed on printers that do not support the Adobe RGB format, colors will be dull too.

SETTING UP A BUDGET STUDIO

Commercial Approach

BookDrive Pro is a commercially available digitization unit. It uses two digital cameras and built-in flash lights. It may be the optimal solution for your projects, but it also may not fit your library’s budget. The unit also is not suitable for oversized material such as large maps and posters. For more information about this product, please visit http://pro.atiz.com/.

Sample Budget Studio Setup

A digitization lab can have three rooms or areas: one for oversized materials, one for smaller prints or 3-D objects, and one for computers. The area for shooting oversized materials should have black walls and floor. You can either use one flash light to bounce light off the ceiling or use two flash lights to shine light directly onto the materials. For fragile materials, the first approach is more appropriate. The area for shooting smaller prints or 3-D objects should have a stable table and black or white background paper; for this room or area, black walls and floor are not required. For the shooting equipment, I will use the set chosen by the photographer from the University of Kansas Spencer Museum of Art as my example.
• DSLR camera: Nikon D810, $2,996.95
  http://www.bhphotovideo.com/c/search?atclk=Camera+Model_Nikon+D810&ci=6222&N=4288586280+3907353607
• Macro lens: Nikon AF Micro-Nikkor 60mm f/2.8D Lens, $429.00
  http://www.bhphotovideo.com/c/product/66987-GREY/Nikon_1987_AF_Micro_Nikkor_60mm_f_2_8D.html
• Heavy-duty mono stand: Arkay 6JRCW Mono Stand Jr with Counter Weight, 6', $678.50
  http://www.bhphotovideo.com/c/product/2727-REG/Arkay_605138_6JRCW_Mono_Stand_Jr.html
• Strobe: Broncolor G2 Pulso 1600 Watt/Second Focusing Lamphead with 16' Cord, $3,053.68
  http://www.bhphotovideo.com/c/product/259745-REG/Broncolor_32_115_07_G2_Pulso_with_16.html
• Power pack: Broncolor Senso A4 2,400W/s Power Pack, $3,629.92
  http://www.bhphotovideo.com/c/product/745060-REG/Broncolor_31_051_07_Senso_A4_2_400W_s_Power.html
• Reflector: Broncolor P65 Reflector, 65 Degrees, 11" Diameter, for Broncolor Pulso 8, Twin and HMI, $513.52
  http://www.bhphotovideo.com/c/product/7162-REG/Broncolor_33_106_00_P65_Reflector_65_Degrees.html
• Reflector: Broncolor Softlight Reflector, 20" Diameter, for Broncolor Primo, Pulso 2/4 & HMI Heads, $501.76
  http://www.bhphotovideo.com/c/product/7167-REG/Broncolor_33_110_00_Softlight_Reflector_20_for.html
• Light stand: Impact Air-Cushioned Light Stand, $44.99
  http://www.bhphotovideo.com/c/product/253067-REG/Impact_LS10AB_Air_Cushioned_Light_Stand.html
• Light meter: Sekonic L-308S Flashmate, Digital Incident, Reflected and Flash Light Meter, $199.00
  http://www.bhphotovideo.com/c/product/368226-REG/Sekonic_401_309_L_308S_Flashmate_Light_Meter.html
• Book cradle: Book Exhibition Cradles, $30.00
  http://www.universityproducts.com/cart.php?m=product_list&c=1115&primary=1&parentId=1271&navTree[]=1115
• Background paper: Savage Seamless Background Paper (both white and black), $45.00 x 2 = $90.00
  http://www.bhphotovideo.com/c/product/45468-REG/Savage_1_12_107_x_12yds_Background.html
• Nonreflective glass: 1/4" Optiwhite Starphire Purified Tempered Single Lite Clear Glass, $75.00 (can be purchased at a local glass store)
• White balancing accessory: X-Rite Original ColorChecker Card, $69.00
  http://www.bhphotovideo.com/c/product/465286-REG/X_Rite_MSCCC_Original_ColorChecker_Card.html
• Software: Adobe Lightroom 5, $150.00
  http://www.adobe.com/products/photoshop-lightroom.html

Table 3. List of Items Needed to Prepare for a Budget Studio

The total cost for a “budget” shooting studio ranges from $10,000 to $15,000, and there is no annual maintenance expense.
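As a quick sanity check on the stated range, the itemized prices in Table 3 can be totaled directly; they come to about $12,461, comfortably inside $10,000 to $15,000:

```python
# Prices from Table 3; background paper is counted twice (white and black rolls).
prices = {
    "DSLR camera (Nikon D810)": 2996.95,
    "Macro lens": 429.00,
    "Heavy-duty mono stand": 678.50,
    "Strobe": 3053.68,
    "Power pack": 3629.92,
    "Reflector (P65)": 513.52,
    "Reflector (Softlight)": 501.76,
    "Light stand": 44.99,
    "Light meter": 199.00,
    "Book cradle": 30.00,
    "Background paper (x2)": 90.00,
    "Nonreflective glass": 75.00,
    "White balancing accessory": 69.00,
    "Adobe Lightroom 5": 150.00,
}

total = sum(prices.values())
print(f"Total: ${total:,.2f}")  # Total: $12,461.32
```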
Figure 21. The University of Kansas Spencer Museum of Art Digitization Lab Setup for Oversized Materials

Figure 22. Steelworks Museum of Industry and Culture’s Digitization Lab Setup for Oversized Materials

Figure 23. The University of Kansas Spencer Museum of Art Digitization Lab Setup for Smaller Prints and 3-D Objects

Figure 24. Steelworks Center of the West’s Digitization Lab Setup for 3-D Objects

Functions of Some Elements in the Sample Shooting Studio

1.
Macro lens: It allows close-up shooting of objects. It is especially useful when photographing small prints and small 3-D objects, and it can also be used to photograph regular and oversized materials.
2. Heavy-duty mono stand: It replaces a traditional tripod. It is very stable and allows quick adjustment of camera height and position.
3. Strobe, power pack, and reflector: Together they generate consistent and homogeneous light distribution. Recommended further reading: “Introduction to Off-Camera Flash: Three Main Choices in Strobe Lighting.”10
4. Light stand: It holds the strobe and reflector.
5. Light meter: Handheld exposure meters measure the light falling onto a light-sensitive cell and convert it into a reading from which the correct shutter speed and/or lens aperture settings can be made.11
6. Book cradles: They help to minimize stress on bookbindings and reduce page curvature.
7. Nonreflective glass: It helps to flatten a photographed page and reduce reflection, although it does not completely eliminate glass reflection. One very useful trick for reducing glass reflection is to place a black board with a hole in it above the page and shoot through the hole. This approach does not actually eliminate the reflection; rather, it reflects black onto the photograph, so when the photograph is reviewed on a computer it appears as if no reflection occurred.

Figure 25. The University of Kansas Spencer Museum of Art Digitization Lab Setup for Materials That Need to Be Pressed Down by Glass

Many librarians believe that digitizing print materials with a digital camera requires a professional photographer, but this is not necessarily true. A professional photographer or even an art student can act as a consultant to help set up a shooting studio and provide basic training. Also, many museums have professional photographers and have set up shooting studios for digitization.
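The reading produced by the light meter in item 5 above maps to aperture and shutter combinations through the standard exposure value relation, EV = log2(N^2 / t), where N is the f-number and t is the shutter time in seconds; any pair with the same EV admits the same amount of light. A small illustration in Python (the numbers are illustrative, not from the article):

```python
import math

def exposure_value(f_number, shutter_seconds):
    """Standard exposure value: EV = log2(N^2 / t). Higher EV means less light."""
    return math.log2(f_number ** 2 / shutter_seconds)

# f/8 at 1/125 s and f/5.6 at 1/250 s are one stop apart on each control,
# so they yield (almost exactly) the same overall exposure.
ev_a = exposure_value(8.0, 1 / 125)
ev_b = exposure_value(5.6, 1 / 250)
print(round(ev_a, 2), round(ev_b, 2))  # both close to EV 13
```

This is why a single meter reading can be satisfied by several equivalent aperture/shutter pairs, and the operator picks the pair that suits the depth of field and flash sync needs of the shot.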
These museums are often very willing to share their experience and even provide training. I believe the learning curve for operating a shooting studio is no greater than the learning curve for operating an overhead scanner and its software.

PROS AND CONS

No digitization equipment or system is perfect; all involve trade-offs in image quality, speed, convenience of use, quality of accompanying software, and cost. Our tests show that for most archival materials a DSLR camera will do a better job than an overhead scanner.

Pros of Overhead Scanner

• The scanner is a complete scanning station. It can be connected to a computer and start scanning immediately. Materials can be placed on the scanning surface, so no equipment adjustments are required while scanning.
• It can scan and save images in bitonal (bitmap) format directly, while a DSLR camera can only shoot in grayscale or color.
• Built-in book cradles help to scan thick books and those that cannot be fully opened.
• Book-curve correction functionality is provided by the accompanying software.

Cons of Overhead Scanner

• High cost. The overhead scanner we have cost more than $50,000, with an annual maintenance contract of $8,000.
• High replacement cost. When a scanner is outdated or broken, the entire machine has to be replaced.
• Instability. Our overhead scanner is unstable even when placed on a sturdy table and handled only by professionals. From April 2010 to October 2010, the scanner was down for a total of forty-two working days (sixty calendar days). The company fixed the machine onsite many times, but it continues to have minor problems and has not been completely reliable.
• The autofocus feature does not work consistently.
• Special training is needed to operate the machine and associated software.
• Supported file formats are limited. Most scanners support only TIFF, JPEG, JPEG 2000, Windows BMP, and PNG.
• Unsupported, outdated software. Our overhead scanner’s software can only be run on an older operating system (Windows XP) because there is no updated software for this model.

Pros of Budget Studio

• Stable. Under normal use, DSLR cameras are much less likely to break down than scanners. For example, I have had an older DSLR, a Nikon D200, for seven years. It has survived numerous backpacking trips, multiple drops, and extreme weather conditions, and it still functions as needed.
• Fast and accurate focus. DSLR cameras are designed to focus quickly, and their focus indicators provide instant feedback so operators know that the image is in focus. If operated properly, DSLR cameras can deliver sharper images than scanners.
• Less expensive. A good-quality DSLR camera and lens can be purchased for less than $4,000 and will last for years. As technologies advance, DSLR camera prices will continue to drop.
• Ability to save files in more formats. In addition to TIFF and JPEG formats, most DSLR cameras can save photos in RAW file format. Some cameras can directly save images in Digital Negative (DNG) format, and others deliver images in proprietary formats that can be converted into DNG using a computer program. Editing RAW images is nondestructive, while editing TIFF and JPEG images is irreversible.
• Accurate WB and exposure. By using the right shooting and post-processing techniques, photographs can achieve exact color reproduction. By contrast, calibrating an overhead scanner most likely can only be performed by a company’s trained technician, and proper exposure and WB are not guaranteed.
• More dynamic range. The RAW file format usually provides more dynamic range: overexposed and underexposed images can be fixed by adjusting exposure compensation in software, so lost shadow or highlight detail can be restored.
• Can photograph 3-D objects.
Archival collections often include materials other than books, such as art pieces, and these materials are better photographed than scanned.
• Versatile. Cameras can perform on-site digitization, while overhead scanners are too bulky to be moved around.
• Faster and better preview. Images can be viewed instantly on a computer when proper software, such as Adobe Lightroom, is used. Operators can compare multiple shots side by side on a screen and decide which photo to retain.
• More accessible technical support. There are far more DSLR camera users than overhead scanner users, so technical questions can often be answered through online forums.
• Easy to find replacement parts. When a piece of shooting-studio equipment breaks down, it is easy for staff to find and install a replacement.
• Easy software updates. The software used in a studio is independent of the equipment.

Cons of Budget Studio

• There is a learning curve for setting up a shooting studio, operating the studio, and mastering new image-processing techniques.
• A DSLR camera with a lower pixel count will not be sufficient for digitizing large-format materials, such as posters and maps.
• No built-in book-curve correction is provided by Adobe Photoshop or Lightroom. However, our experience shows that automatic book-curve correction does not always work well anyway. We normally use a homemade book cradle to help lay a page flat and use one or two weights to hold down the other side of the book. For some books, if flatness is hard to achieve, we place a piece of glass on top to ensure flatness.
• Security concern. Since a DSLR camera is highly portable, it can be stolen easily.

Figure 26. Scanning Setup Using a Book Cradle

CONCLUSION

The technology of DSLR cameras has advanced very quickly in the past ten years. Newer DSLR cameras can handle higher resolutions and have very little image noise even at a high ISO setting.
The higher demand for DSLR cameras and accompanying image-editing software results in more rapid technology advances compared to low-demand, high-end overhead scanners. High consumer demand also drives DSLR camera prices much lower than prices for overhead scanners. In addition, the wide range of consumers purchasing DSLR cameras and software prompts companies to offer more user-friendly interfaces. As our tests show, for most library materials a DSLR camera can produce superior images. If you do not have the budget for a high-end overhead scanner, you can still fulfill your digitization and preservation goals with a budget studio.

ACKNOWLEDGEMENT

I would like to thank Robert Hickerson and Ryan Waggoner, the University of Kansas Spencer Museum of Art, Tim Hawkins, and Steelworks Center of the West for showing me their digitization labs and sharing their experience.

REFERENCES

1. Federal Agencies Digitization Guidelines Initiative, “Technical Guidelines for Digitizing Cultural Heritage Material: Creation of Raster Image Master Files,” August 2010, http://www.digitizationguidelines.gov/guidelines/digitize-technical.html.
2. “Tutorials: White Balance,” Cambridge in Colour, accessed March 9, 2016, http://www.cambridgeincolour.com/tutorials/white-balance.htm.
3. “ColorChecker Passport User Manual,” X-Rite Incorporated, accessed March 9, 2016, http://www.xrite.com/documents/manuals/en/ColorCheckerPassport_User_Manual_en.pdf.
4. Scott Kelby, “Scott Kelby’s Editing Essentials: How to Develop Your Photos,” Pearson Education, Peachpit, accessed March 9, 2016, http://www.peachpit.com/articles/article.aspx?p=2117243&seqNum=3.
5. “sRGB vs. Adobe RGB 1998,” Cambridge in Colour, accessed March 9, 2016, http://www.cambridgeincolour.com/tutorials/sRGB-AdobeRGB1998.htm.
6. “Colour Banding,” Wikipedia, accessed March 9, 2016, http://en.wikipedia.org/wiki/Colour_banding.
7. “Posterization,” Wikipedia, accessed March 9, 2016, http://en.wikipedia.org/wiki/Posterization.
8. “Image Posterization,” Cambridge in Colour, accessed March 9, 2016, http://www.cambridgeincolour.com/tutorials/posterization.htm.
9. Richard Anderson and Peter Krogh, “Color Space and Color Profiles,” American Society of Media Photographers, accessed March 9, 2016, http://dpbestflow.org/color/color-space-and-color-profiles.
10. Tony Roslund, “Introduction to Off-Camera Flash: Three Main Choices in Strobe Lighting,” Fstoppers (blog), accessed March 9, 2016, https://fstoppers.com/originals/introduction-camera-flash-three-main-choices-strobe-lighting-40364.
11. “Introduction to Light Meters,” B & H Foto & Electronics Corp., accessed March 9, 2016, http://www.bhphotovideo.com/find/Product_Resources/lightmeters1.jsp.
Identifying Key Steps for Developing Mobile Applications and Mobile Websites for Libraries

Devendra Dilip Potnis, Reynard Regenstreif-Harms, and Edwin Cortez

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 43

ABSTRACT

Mobile applications and mobile websites (MAMW) represent information systems that are increasingly being developed by libraries to better serve their patrons. Because of a lack of the in-house IT skills and knowledge necessary to develop MAMW, a majority of libraries are forced to rely on external IT professionals who may or may not help libraries meet patron needs but instead may deplete libraries’ scarce financial resources. This paper applies a system analysis and design perspective to analyze the experience and advice shared by librarians and IT professionals engaged in developing MAMW. It identifies key steps and precautions to take while developing MAMW for libraries. It also advises library and information science graduate programs to equip their students with the specific skills and knowledge needed to develop and implement MAMW.

INTRODUCTION

The unprecedented adoption and ongoing use of a variety of context-specific mobile technologies by diverse patron populations, the ubiquitous nature of mobile content, and the increasing demand for location-aware library services have forced libraries to “go mobile.” Mobile applications and mobile websites (MAMW), that is, web portals running on mobile devices, represent information systems that are increasingly being developed and used by libraries to better serve their patrons. However, a majority of libraries lack the in-house human resources necessary to develop MAMW.
Because of a lack of staff equipped with the requisite IT skills and knowledge, libraries are often forced to partner with and rely on external IT professionals, potentially losing control over the process of developing MAMW.1 Partnerships with external IT professionals do not always help libraries meet the information needs of their patrons and instead can deplete their scarce financial resources. It therefore becomes necessary for librarians to understand the process of developing MAMW so that they can better evaluate MAMW for serving library patrons.

Devendra Dilip Potnis (dpotnis@utk.edu) is Associate Professor, School of Information Sciences; Reynard Regenstreif-Harms (reynardrh@gmail.com) is Project Archives Technician, Great Smoky Mountains National Park, Gatlinburg, Tennessee; and Edwin Cortez (ecortez@utk.edu) is Professor, School of Information Sciences, University of Tennessee at Knoxville.

IDENTIFYING KEY STEPS FOR DEVELOPING MOBILE APPLICATIONS & MOBILE WEBSITES FOR LIBRARIES | POTNIS, REGENSTREIF-HARMS, AND CORTEZ | doi:10.6017/ital.v35i2.8652 44

One possibility is for librarians to re-educate themselves through continuing education or other professional development activities. Another solution would be for library and information science (LIS) schools to strengthen their curricula in the management, evaluation, and application of MAMW and related emerging technologies. Issues, challenges, and strategies for providing librarians with these opportunities are abundant and have been debated for more than thirty years, especially since libraries started experiencing the impact of microchip and portable technologies.2 Any practical and immediate guidance could help librarians in charge of developing MAMW.3 However, a majority of the practical guidance available for developing MAMW for libraries is limited to specific settings or patron populations.
Moreover, this practical guidance is not theoretically validated, which curtails its generalizability across diverse library settings. For instance, a number of librarians and IT professionals share their experience and stories of MAMW development for a specific patron population in a specific library setting.4,5 Their accounts typically describe successes in developing MAMW, lessons learned during development, or advice for developing MAMW. This paper applies a system analysis and design perspective from the information systems discipline to examine the experience and advice shared by librarians and IT professionals in order to identify the key steps and precautions to be taken when developing MAMW for libraries. System analysis and design, a branch of the information systems discipline, is the most widely used theoretical knowledge base available for developing information systems.6 According to the system analysis and design perspective, development, planning, analysis, design, implementation, and maintenance are the six phases of building any information system.7 The next section synthesizes our method for this secondary research. The following section discusses the key steps we identified for planning, analyzing, designing, implementing, and maintaining MAMW for libraries. The concluding section presents the implications of this study for libraries and LIS graduate programs.
METHOD

We began this study with a practitioner’s handbook guiding libraries in using mobile technologies to deliver services to diverse patron populations.8 To search the literature relevant to our research, we devised many key phrases, including but not limited to “mobile technolog*,” “mobile applications for libraries,” and “mobile websites for libraries.” As part of our active information-seeking process, we applied a snowball sampling technique to collect more than seventy-five scholarly research articles, handbooks, ALA library technology reports, and books hosted on the EBSCO and Information Science Source databases. Our passive information seeking was aided by article suggestions from Emerald Insight and Elsevier Science Direct, two of the most widely used journal-hosting sites, in response to the journal articles we accessed there. We applied the following four criteria to establish the relevancy of publications to our research: accuracy of facts; period of publication (i.e., from 2000 to 2014); credibility of authors; and content focused on problems, solutions, advice, and tips for developing MAMW. Several research articles published by Information Technology and Libraries and Library Hi Tech, two top-tier journals covering the development of MAMW for libraries, built the foundation of this secondary research. We analyzed the collected literature using the qualitative data presentation and analysis method proposed by Miles and Huberman.9 We developed Microsoft Excel summary sheets to code the experience and advice shared by librarians and IT professionals. The coded data were read repeatedly to identify and name patterns and themes. Each relevant publication was analyzed individually and then compared across subjects to identify patterns and common categories. The inter-coder reliability between the two authors who analyzed the data was 85 percent.
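Percent agreement of the kind reported above is simple to compute from two coders' label lists. A minimal sketch in Python (the codes below are hypothetical; only the 85 percent figure comes from the article):

```python
def percent_agreement(coder_a, coder_b):
    """Share of items (as a percentage) assigned the same code by both coders."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must label the same set of items.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)

# Hypothetical codes for 20 publications; the coders disagree on 3 of them.
coder_a = ["team", "scope", "buy", "build", "team"] * 4
coder_b = list(coder_a)
coder_b[3], coder_b[10], coder_b[17] = "scope", "buy", "team"
print(percent_agreement(coder_a, coder_b))  # 85.0
```

Note that raw percent agreement does not correct for chance agreement; chance-corrected statistics such as Cohen's kappa are stricter alternatives.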
Data analysis helped us identify the key steps needed for planning, analyzing, designing, implementing, and maintaining MAMW for libraries.

FINDINGS AND DISCUSSION

Key Steps for Planning MAMW

Forming and Managing a Team

Building teams of people with the appropriate skills, knowledge, and experience is one of the first steps suggested by the existing literature for planning MAMW. It is essential for team members to be aware of new developments and trends in the market.10 For instance, developers should be aware of print resources on relevant technologies such as Apache, ASP, JavaScript, PHP, Ruby on Rails, and Python; online resources such as detectmobilebrowser.com and the W3C mobileOK Checker for testing catalogs, design functionality, and accessibility on mobile devices; and various online communities of developers who can provide peer support when needed.11 Team members are also expected to keep up with new developments in mobile devices, platforms, operating systems, digital rights management terms and conditions, and emerging standards for content formats.12 Periodic delegation of various tasks can help libraries develop MAMW effectively.13 Libraries should also form productive, financially feasible partnerships with external stakeholders such as Internet service providers and network administrators for hosting MAMW on Internet servers that meet desired safety and security standards.14,15

Requirements Gathering

Requirements for developing MAMW can be collected through empirical research and secondary research. Typically, the goal of empirical research is to help libraries

• gather patron preferences for and expectations of MAMW,16,17
• stay abreast of the continual evolution of patron needs,18
• periodically (e.g., quarterly, annually, biannually, etc.)
gather and evaluate user needs,19
• index the content of MAMW,20
• investigate patrons’ acceptance of the library’s use of MAMW,21 and
• understand user needs and identify the top library services requested by patrons.

Empirical research in the form of usability testing, functional validation, user surveys, etc., should be carried out before developing MAMW to inform the development process and/or after developing MAMW to study their adoption by library patrons. Empirical research typically involves the identification of patrons and other stakeholders who will be affected by MAMW. This step is followed by developing data-collection instruments, collecting data from patrons and other stakeholders, and analyzing the qualitative and quantitative data using appropriate techniques and software.22 Secondary research mainly focuses on scanning and assessing existing literature. For instance, using appropriate datasets on mobile use, librarians may be able to identify the factors responsible for the adoption of mobile technologies.23 Typically, such factors include but are not limited to the cognitive, affective, social, and economic conditions of potential users. MAMW developers can also scan the environment by examining existing MAMW and reviewing the literature to create sets of guidelines for replacing old information systems with new, well-functioning MAMW.24 Librarians can also scan the market for free software options to conserve financial resources.25

Making Strategic Choices

Mobile Applications or Mobile Websites?

One of the most important strategic decisions libraries need to make during this phase is whether to use a mobile app or a mobile website, that is, a web portal running on mobile devices, for offering services to patrons.
Mobile websites are web browser-based applications that might direct mobile users to a different set of content pages; serve a single set of content to all patrons while using different style sheets or templates reformatted for desktop or mobile browsers; or use a site transcoder (a rule-based interpreter) that resides between a website and a web client and intercepts and reformats content in real time for a mobile device.26,27 Mobile apps are more challenging to build than mobile websites because they require separate, platform-specific programming for each operating system.28 Mobile apps also burden users and their devices: users are expected to remember the functionality of each menu item, and a significant amount of memory is required to store and support apps on mobile devices. However, potential profitability, better mobile-device functionality, and greater exposure through app stores can make mobile apps a more economical option than mobile websites.29

Buy or Build?

In the planning phase, libraries also need to decide whether to buy a commercial, off-the-shelf (COTS) MAMW or build a customized MAMW. Candidate MAMW need to be evaluated in terms of customer support and service, maintenance, and the ability to meet patron and library needs when making this choice.30 Sometimes libraries purchase COTS products and end up customizing them, benefiting from both options. For example, some libraries first purchase packaged mobile frameworks to create simple, static mobile websites and subsequently develop dynamic library apps specific to library services.31

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 47

Managing Scope

Many libraries have limited financial resources, which makes it necessary for their staff to manage the scope of MAMW development.
Prioritizing tasks and identifying mission-critical features of MAMW are among the most common activities libraries undertake to manage this scope.32 For instance, it is not practical to make an entire library website mobile, because the library would end up serving only those patrons who access its site over mobile alone. Instead, libraries should determine which parts of the website should go mobile. A growing trend of using a mobile-first design approach, in which a mobile version of a website is designed first and then worked up to a larger desktop version, could help librarians better manage the scope of MAMW development. Alternatively, Jeff Wisniewski, a leading web services librarian in the United States, advises libraries to create a new mobile-optimized homepage alone, which is faster than trying to retrofit the library’s existing homepage for mobile.33 This advice is highly practical because no webmaster wants to maintain two distinct versions of the library’s webpages with details such as hours of operation and contact information.

Selecting the Appropriate Software Development Method

There are three key methods for developing MAMW: structured methodologies (e.g., waterfall or parallel development), rapid application prototyping (e.g., phased development, prototyping, or throwaway prototyping), and agile development, an umbrella term for a collection of agile methodologies such as Crystal, the dynamic systems development method, extreme programming, feature-driven development, and Scrum. There is a bidirectional relationship between these MAMW development methods and the resources available for development: project resources such as funding, duration, and human resources influence and are affected by the type of software development method selected for developing MAMW.
However, studies rarely pay attention to this important dimension of the planning phase.34

Key Steps in the Analysis Phase

Requirements Analysis

After collecting data from patrons, the next natural step is to analyze the data to inform the process of conceptualizing, building, and developing MAMW.35 The requirements-analysis phase helps libraries achieve a user-centered design of MAMW and assess the return on investment (ROI) in MAMW. The context and goals of the patrons using mobile devices, and the tasks they are likely and unlikely to perform on a mobile device, are the key considerations for developing user-centered MAMW for library patrons.36 It is critical to gather, understand, and review user needs.37 Surveys can be administered on paper or online, and the results can be analyzed using advanced statistical techniques or qualitative software.38,39 The analysis allows the following questions to be answered: Which library services do patrons use most frequently on their mobile devices? What is their level of satisfaction with those services? What types of library services and products would they like to access with their mobile phones in the future? Survey analyses can help librarians predict which mobile services patrons will find most useful;40 they can also help librarians classify users on the basis of their perceptions, experience, and habits when using mobile technologies to access library services.41 As a result, libraries can identify and prioritize functional areas for their MAMW deployment.42 MAMW developers can also learn from their users’ humbling and/or frustrating experiences of using mobile devices for library services.
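The core of this kind of survey analysis, tallying which services patrons report using on their phones so that functional areas can be prioritized, can be sketched in a few lines of JavaScript. The function name and the shape of the input data below are our own illustrative assumptions, not taken from any of the cited studies:

```javascript
// Sketch: rank library services by how often survey respondents
// report using them on a mobile device. The response format (an
// array of objects, each listing the services used) is hypothetical.
function rankServices(responses) {
  const counts = {};
  for (const response of responses) {
    for (const service of response.servicesUsed) {
      counts[service] = (counts[service] || 0) + 1;
    }
  }
  // Sort service names by descending frequency of mention.
  return Object.keys(counts).sort((a, b) => counts[b] - counts[a]);
}

// Example: three (made-up) survey responses.
const ranked = rankServices([
  { servicesUsed: ["catalog", "hours"] },
  { servicesUsed: ["catalog", "ask-a-librarian"] },
  { servicesUsed: ["catalog", "hours"] },
]);
// ranked is ["catalog", "hours", "ask-a-librarian"]: the catalog,
// mentioned by all three respondents, would be the first candidate
// for mobile deployment.
```

In practice the same tally could of course be produced by the statistical or qualitative software mentioned above; the point is only that the prioritization step reduces to a frequency count over responses.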
In addition, libraries can keep track of their patrons’ positive and negative observations, their information-sharing practices, and how they create group experiences on the platform provided by their libraries.43 To improve existing MAMW, libraries could also use Google Analytics, a free web-metrics tool, to identify the popularity of MAMW features and analyze statistics on how they are used.44 To develop operating-system-specific mobile apps, Google Analytics can be used to learn about the popularity of the mobile devices used by patrons.45 Ideally, libraries should calculate and document ROI before investing in the development of MAMW.46 For instance, libraries can run a cost-benefit analysis on the process of developing MAMW and compare the various library services offered over mobile devices.47 Typically, the following data could help libraries run the cost-benefit analysis: specific deliverables (e.g., features of MAMW), resources (e.g., resources needed, available resources, etc.), risks (e.g., types and levels of risk), performance requirements, and security requirements for developing MAMW. This analysis would help libraries make decisions on service provisions, such as the specific goals to be set for developing MAMW, the feasibility of introducing desired MAMW features, and how to manage available resources to meet the set goals.48 Libraries should also examine what other libraries have already done to provide mobile services.49

Communication/Liaising with Stakeholders

Effective communication between developers and stakeholders influences almost every aspect of developing information systems. However, existing studies do not emphasize the significance of communication with stakeholders. For instance, several studies vaguely refer to the translation of user needs into technology requirements,50 but few point out a precise modeling technique (e.g., entity-relationship diagrams, Unified Modeling Language, etc.)
for converting user needs into a language understood by software developers. Developers should communicate best practices and suggestions for the future implementation of MAMW in libraries,51 which involves predicting and selecting appropriate MAMW for libraries,52 demonstrating what is possible and how services are relevant, and showing how new resources can help create value for libraries.53,54 Communication with users is also critical for creating value-added services for patrons who use different mobile technologies to meet needs related to work, leisure, commuting, etc.55 However, the existing literature on MAMW development for libraries does not mention the significance of this activity.

Key Steps for Designing MAMW

Prototyping

Prototyping refers to the modeling or simulation of an actual information system. MAMW can have paper-based or computer-based prototypes. Prototyping allows developers to communicate directly with MAMW users to seek their feedback. Developers can correct or modify the original design of MAMW until users and developers are in agreement about the system design. Building consensus between MAMW developers and potential users is a key challenge to overcome during this phase, and it may put a financial burden on MAMW development projects; it requires skilled personnel to manage the scope, time, human resources, and budget of such projects. Wireframing is one of the most prominent prototyping techniques practiced by librarians and IT professionals developing MAMW for libraries.56 This technique depicts schematic on-screen blueprints of MAMW that lack style, color, or graphics, focusing mainly on the functionality, behavior, and priority of content.
Selecting Hardware, Programming Languages, Platforms, Frameworks, and Toolkits

The existing literature on the development of MAMW for libraries covers the selection and management of software; software development kits; scripting languages like JavaScript; data management and representation languages such as HTML and XML, along with their text editors; and AJAX for animations and transitions. The existing literature also guides libraries in training their staff to use MAMW to better serve patrons.57 A few studies also provide guidance on selecting COTS products such as WebKit, an open-source web browser engine that renders webpages on smartphones and allows users to view high-quality graphics on data networks with faster throughput.58 However, it might be a good idea to use licensed open-source COTS products, because licensed software allows libraries to legally distribute software within their organizations as covered by the licensing agreement. Libraries that use software-licensing agreements may also be able to seek expert help and advice whenever they have a concern or query. In the authors’ experience, librarians have shared a few effective strategies for designing MAMW. One key strategy is to purchase reliable device emulators and cross-compatible web editors. These technologies allow the user to work with the design at the most basic level, save documents as text, transfer the documents between web programs, and direct designers toward simple solutions.59 Sample cross-compatible web editors include, but are not limited to, Notetab Pro (http://www.notetab.com/), Code Lobster (http://www.codelobster.com/), and Bluefish (http://bluefish.openoffice.nl).
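When working across emulators and editors, developers also frequently need the device detection that the mobile websites discussed earlier rely on to direct patrons to an appropriate version of a site. A minimal sketch of such a check, with user-agent patterns and variant names that are purely illustrative assumptions on our part (production sites would use a maintained detection library and default to the full site for browsers without JavaScript):

```javascript
// Sketch: pick a site variant from the browser's user-agent string.
// The patterns below are deliberately tiny and illustrative; real
// detection lists are much longer, and "full" is the safe default.
function chooseVariant(userAgent) {
  const ua = userAgent.toLowerCase();
  if (/iphone|android.*mobile/.test(ua)) return "touch"; // touch-optimized site
  if (/opera mini|nokia|blackberry/.test(ua)) return "text"; // simple text-based site
  return "full"; // full desktop website
}

// In a page, the result could drive a redirect or stylesheet swap, e.g.:
//   if (chooseVariant(navigator.userAgent) === "touch") {
//     location.replace("/m/");
//   }
```

The same routing decision can often be made without any scripting, by letting the server inspect the User-Agent header or by using CSS media queries keyed to screen width, which is one reason the literature cautions against leaning too heavily on client-side JavaScript.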
Hybrid mobile app frameworks like Bootstrap, Ionic, Mobile Angular UI, Intel XDK, Appcelerator Titanium, Sencha, Kendo UI, and PhoneGap use a combination of web technologies like HTML, CSS, and JavaScript for developing mobile-first, responsive MAMW. A majority of these frameworks use a drag-and-drop approach and do not require any coding for developing mobile apps; one-click API connections further simplify the process. User-interface frameworks like jQuery Mobile and Topcoat eliminate the need to design user interfaces manually. Importantly, MAMW developed using such frameworks can support many mobile platforms and devices. Toolkits like GitHub, skyronic, crudkit, and HAWHAW enable developers to quickly build mobile-friendly CRUD (create/read/update/delete) interfaces for PHP, Laravel, and CodeIgniter apps. Such mobile apps also work with MySQL and other databases, allowing them to receive and process data and display information to users. Table 1 categorizes specific hardware and software features recommended for MAMW to better serve library patrons.
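To make the CRUD pattern concrete before turning to the table, here is a minimal in-memory create/read/update/delete store in plain JavaScript. It is our own sketch of the interface pattern such toolkits generate, not code taken from crudkit, HAWHAW, or any other tool named above; a real deployment would back the store with MySQL or another database:

```javascript
// Sketch: an in-memory store illustrating the four CRUD operations a
// mobile-friendly interface exposes. A toolkit-generated app would
// persist these records to a database rather than a Map.
class RecordStore {
  constructor() {
    this.records = new Map();
    this.nextId = 1;
  }
  create(fields) { // C: add a record and return its new id
    const id = this.nextId++;
    this.records.set(id, { id, ...fields });
    return id;
  }
  read(id) { // R: fetch one record (or undefined if absent)
    return this.records.get(id);
  }
  update(id, fields) { // U: merge new field values into a record
    const rec = this.records.get(id);
    if (rec) Object.assign(rec, fields);
    return rec;
  }
  delete(id) { // D: remove a record, reporting success
    return this.records.delete(id);
  }
}

// Example: a (hypothetical) study-room booking record.
const store = new RecordStore();
const bookingId = store.create({ title: "Study room booking", status: "open" });
store.update(bookingId, { status: "confirmed" });
// store.read(bookingId).status is now "confirmed";
// store.delete(bookingId) would remove it.
```

Whatever the backing store, the value of such toolkits is that the mobile interface for all four operations is generated from the record schema rather than hand-coded per screen.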
# / Areas of Information Systems/IT / Specific Features Recommended for Developing MAMW for Libraries

1. Human-Computer Interaction (HCI)
Behavioral, cognitive, motivational, and affective aspects of HCI:
• Design responsive websites for libraries to enhance user experience60
• Design a user interface meeting the expectations and needs of potential users (e.g., a menu with the following items: library catalog, patron accounts, ask a librarian, contact information, listing of hours, etc.)61
• Design meaningful mobile websites based on user needs; document and maintain mobile websites62
Usability engineering:
• Design concise interfaces with limited links, descriptive icons, and home and parent-link icons63
• Create a user-friendly site (e.g., the DOK Library Concept Center in Delft, Netherlands, offers a welcome text message to first-time visitors)64
• Effectively transition from traditional websites to mobile-optimized sites with responsive design65
• Create user-friendly interface designs66
• Present a clean, easy-to-navigate mobile version of search results67
Information visualization:
• Automatically maintain the reliable and stable fundamental information required by indoor localization systems68
• Save time by redesigning existing sites69,70

2. Web Programming
HTML, XML, etc.:
• Design sites with a complete separation of content and presentation71
• Code HTML and CSS for better user experiences72
• Create and shorten links to make them easier to input using small or virtual keyboards73
Using client-side and server-side scripting (JavaScript Object Notation, etc.):
• Design and develop mashups74
• Develop MAMW using client-server architecture, accessible on mobile devices75
Without scripting:
• Implement widgetization to facilitate the integration of mobile websites—developing a widget library for mobile-based web information systems76

3. Open Source
• Design mobile websites that allow users to leverage the same open source technology as the main websites77
• Design mobile websites linking to other existing services like LibraryH3lp and library catalogs with mobile interfaces such as MobileCat78

4. Networking
• Design a mobile website capable of exploiting advancements in technology such as faster mobile data networks79
• Identify and address technology issues (e.g., connectivity, security, speed, signal strength, etc.) faced by patrons when using MAMW80

5. Input/Output Devices
• Use a mobile robot to determine the location of fixed RFID tags in space81
• Design MAMW capable of processing data communicated using radio-frequency identification devices, near-field communication technology, and Bluetooth-based technology like iBeacons82
• Offer innovative services using augmented-reality tools83

6. Databases
• Integrate a back-end database of metadata with front-end mobile technologies84
• Integrate the front end of MAMW with the back end of standard databases and services85

7. Social Media and Analytics
• Integrate social media sites (e.g., Foursquare, Facebook Places, Gowalla, etc.) with existing checkout services for accurate and information-rich entries86
• Implement Google Voice or a free text-messaging service87
• Use Google Analytics on a mobile-optimized website by copying the free JavaScript code generated by Google Analytics and pasting it into library webpages to gain insight into which resources are used and who uses them88
• Integrate a geo-location feature with mobile services89

Table 1.
MAMW with specific hardware and software features

From the above table, which is based on the analysis of the literature on developing mobile applications and mobile websites for libraries, it becomes clear that web programming and HCI are the two leading technology areas that shape the development of MAMW and, consequently, the services offered through them.

Designing User Interfaces of MAMW

Librarians and IT professionals engaged in developing MAMW for libraries make the following recommendations.

Use two style sheets: CSS plays a key role in giving a uniform display to the user interfaces of all webpages. Studies recommend designing two style sheets—namely, mobile.css and iphone.css—when developing MAMW, since most of the time smartphones ignore mobile stylesheets.90 In that case, iphone.css could direct itself to browsers of a specific screen width, helping those mobile devices that are not directed to the mobile website by the mobile.css stylesheet.91

Minimize use of JavaScript: JavaScript is instrumental in detecting which mobile device a patron is using and then directing them to the appropriate webpage, with options including a full website, a simple text-based site, and a touch-mobile-optimized site. However, it is critical to minimize the use of JavaScript on library mobile websites because not every smartphone offers the minimum level of support required to run it.92

Handle images intelligently: To help patrons optimize their bandwidth use, image files on mobile sites should be incorporated with CSS rather than HTML code; also, to ensure consistency in the appearance of mobile-website user interfaces, images should be kept to the same absolute size.93

Key Steps for Implementing MAMW

Programming for MAMW

Programming is at the heart of developing MAMW. As shown in table 1 above, web programming enables developers to build MAMW with a number of value-added features for patrons.
For instance, a web-application server running on ColdFusion can process data communicated via web browsers on mobile devices; this feature allows MAMW users to access search engines on library websites via smartphones.94 Also, client-side processing of classes (with a widget library) allows patrons to use their mobile devices as thin clients, thereby optimizing the use of network bandwidth.95

Testing MAMW

Past studies recommend testing the content, display/design, and functionality of MAMW in a controlled environment (e.g., a usability lab) or in the real world (i.e., in libraries).

Content: Librarians are advised to set up testing databases for testing image presentation, traditional free-text search, location-based search, barcode scanning for ISBN search, QR encapsulation, and voice search.96

Display/design: Librarians can review and test MAMW on multiple devices to confirm that everything displays and functions as intended.97 They can also test a beta version of their mobile website with varying devices to provide guidance regarding image sizing;98 beta versions are also useful in testing mobile websites for their display on different browsers and devices.99

Functionality: Librarians can set up testing practices and environments for the most heavily used device platforms (e.g., HCI incubators such as eye-testing software, which combine virtual emulators with mobile devices not owned by libraries).100,101 They can also use the User Agent Switcher add-on for Firefox to test a mobile website, and use web-based services like Device Anywhere and Browser Cam, which offer mobile emulation, to test the functionality of MAMW.102

Training Patrons

Unless patrons realize the significance of a new information system for managing information resources, they will hardly use it. However, training patrons to use a newly developed MAMW is almost completely missing from the studies describing the process of developing MAMW for libraries.
Joe Murphy, a technology librarian at Yale University, identifies the significance of user training in managing the change from traditional to mobile search and advises librarians to explore the mobile literacy skills of their patrons and educate them on how to use new systems.103

Data Management

MAMW cannot function properly without clean data. Cleaning up data, curating data, and addressing other data-related issues are some of the least-mentioned activities in the literature on developing MAMW. However, it is necessary for librarians engaged in developing MAMW to identify and address common challenges in managing the data used by MAMW. For example, it might be a good strategy for librarians to study the best practices for managing data-related issues when offering reference services using SMS.104

Skills Needed for Maintaining MAMW

Documentation and Version Control of Software

Past studies recommend developing a mobile strategy for building a mobile-tracking device and evaluating mobile infrastructure to ensure the continued assessment and monitoring of mobile usage and trends among patrons.105 However, past studies do not report or provide many details about the maintenance of MAMW, which leads us to infer that maintenance of MAMW, involving documentation and version control, is a neglected aspect of their development. Open source software development is increasingly becoming a common practice for developing MAMW. Implementing version-control software (e.g., Subversion and GitHub) to accommodate the needs of developers distributed across the world is a necessity for developing MAMW.
Version-control software provides a code repository with a centralized database for developers to share their code, which minimizes the errors associated with overwriting or reverting code changes and maximizes collaboration in software development.106

CONCLUSION

Various forces drive change in the knowledge and skills required of information professionals: technologies, changing environments, and the changing role of IT in managing and providing services to patrons. These forces affect IT-based professionals at all levels, both those responsible for information processing and those responsible for information services. This paper has examined the key steps and precautions to be taken by libraries developing MAMW to better serve their patrons. After analyzing the existing guidance offered by librarians and IT professionals from a systems analysis and design perspective, we find that some of the most ignored activities in MAMW development are selecting appropriate software development methodologies, prototyping, communicating with stakeholders, software version control, data management, and training patrons to use newly developed or revamped MAMW. The lack of attention to these activities could hinder libraries’ ability to serve patrons well using MAMW. It is necessary for librarians and IT professionals to pay close attention to the above activities when developing MAMW. Our study also shows that web programming and HCI are the two most widely used technology areas for developing MAMW for libraries. To save scarce financial resources, which otherwise would be invested in partnering with external IT professionals, libraries could either train their existing staff or recruit LIS graduates equipped with the skills and knowledge identified in this paper to develop MAMW (see table 2).
# / Key Steps for Developing MAMW / Skills and Knowledge Required for Developing MAMW

A. Planning Phase
1. Forming and managing a team: human resource management
2. Making strategic choices: time management; cost management; quality management; human resource management (e.g., staff capacity)
3. Requirements gathering: research (empirical and secondary)
4. Managing scope (e.g., managing financial resources, prioritizing tasks, identifying mission-critical features of MAMW, etc.): scope management
5. Selecting an appropriate software development method: time management; cost management; quality management

B. Analysis Phase
6. Requirements analysis: research (empirical and secondary)
7. Communication/liaising with stakeholders: communications management

C. Design Phase
8. Prototyping: software development (HCI)
9. Selecting hardware, programming languages, and platforms: software development (web programming and HCI)
10. Designing user interfaces of MAMW: software development (HCI)

D. Implementation Phase
11. Programming for MAMW: software development (web programming—e.g., Android, iOS, Visual C++, Visual C#, Visual Basic, etc.)
12. Testing MAMW: software development (web programming and HCI)
13. Training patrons: human resource management
14. Data management (e.g., cleaning up data, curating data, etc.): data management

E. Maintenance Phase
15. Documentation and version control of software: software development (web programming and HCI)

Table 2.
Skills and knowledge necessary to develop MAMW

The management of the scope, time, cost, quality, human resources, and communication related to any project is known as project management.107 In addition to project-management skills and knowledge, librarians would need to be proficient in software development (with an emphasis on HCI and web programming), in data management, and in the proper methods for conducting the empirical and secondary research needed to develop MAMW. If LIS programs equip their graduate students with the skills and knowledge identified in this paper, the next generation of LIS graduates could develop MAMW for libraries without relying on external IT professionals, which would make libraries more self-reliant and better able to manage their financial resources.108

This paper assumes a very small number of scholarly publications to be reflective of the real-world scenarios of developing MAMW for all types of libraries. This assumption is one of the limitations of this study. Also, the sample of publications analyzed in this study is not statistically representative of the development of MAMW for libraries around the world. In the future, the authors plan to interview librarians and IT professionals engaged in developing and maintaining MAMW for their libraries to better understand the landscape of developing MAMW for libraries.

REFERENCES

1. Devendra Potnis, Ed Cortez, and Suzie Allard, “Educating LIS Students as Mobile Technology Consultants” (poster presented at the 2015 Association for Library and Information Science Education Annual Meeting, Chicago, January 25–27), http://f1000.com/posters/browse/summary/1097683.

2. Edwin Michael Cortez, “New and Emerging Technologies for Information Delivery,” Catholic Library World 54 (1982): 214–18.

3. Kimberly D. Pendell and Michael S. Bowman, “Usability Study of a Library’s Mobile Website: An Example from Portland State University,” Information Technology & Libraries 31, no.
2 (2012): 45–62, http://dx.doi.org/10.6017/ital.v31i2.1913.

4. Godmar Back and Annette Bailey, “Web Services and Widgets for Library Information Systems,” Information Technology & Libraries 29, no. 2 (2010): 76–86, http://dx.doi.org/10.6017/ital.v29i2.3146.

5. Hannah Gascho Rempel and Laurie Bridges, “That was Then, This is Now: Replacing the Mobile Optimized Site with Responsive Design,” Information Technology & Libraries 32, no. 4 (2013): 8–24, http://dx.doi.org/10.6017/ital.v32i4.4636.

6. June Jamrich Parsons and Dan Oja, New Perspectives on Computer Concepts 2014: Comprehensive, Course Technology (Boston: Cengage Learning, 2013).

7. Ibid.

8. Andrew Walsh, Using Mobile Technology to Deliver Library Services: A Handbook (London: Facet, 2012).

9. Matthew B. Miles and A. Michael Huberman, Qualitative Data Analysis (Thousand Oaks, CA: Sage, 1994).

10. Bohyun Kim, “Responsive Web Design, Discoverability and Mobile Challenge,” Library Technology Reports 49, no. 6 (2013): 29–39, https://journals.ala.org/ltr/article/view/4507.

11. James Elder, “How to Become the ‘Tech Guy’ and Make iPhone Apps for Your Library,” The Reference Librarian 53, no. 4 (2012): 448–55, http://dx.doi.org/10.1080/02763877.2012.707465.

12. Sarah Houghton, “Mobile Services for Broke Libraries: 10 Steps to Mobile Success,” The Reference Librarian 53, no. 3 (2012): 313–21, http://dx.doi.org/10.1080/02763877.2012.679195.

13. Pendell and Bowman, “Usability Study.”

14. Lisa Carlucci Thomas, “Libraries, Librarians and Mobile Services,” Bulletin of the American Society for Information Science & Technology 38, no. 1 (2011): 8–9, http://dx.doi.org/10.1002/bult.2011.1720380105.

15. Elder, “How to Become the ‘Tech Guy.’”

16. Kim, “Responsive Web Design.”

17.
Chad Mairn, “Three Things You Can Do Today to Get Your Library Ready for the Mobile Experience,” The Reference Librarian 53, no. 3 (2012): 263–69, http://dx.doi.org/10.1080/02763877.2012.678245.

18. Rempel and Bridges, “That was Then.”

19. Rachael Hu and Alison Meier, “Planning for a Mobile Future: A User Research Case Study from the California Digital Library,” Serials 24, no. 3 (2011): S17–25.

20. Kim, “Responsive Web Design.”

21. Lorraine Paterson and Boon Low, “Student Attitudes Towards Mobile Library Services for Smartphones,” Library Hi Tech 29, no. 3 (2011): 412–23, http://dx.doi.org/10.1108/07378831111174387.

22. Jim Hahn, Michael Twidale, Alejandro Gutierrez, and Reza Farivar, “Methods for Applied Mobile Digital Library Research: A Framework for Extensible Wayfinding Systems,” The Reference Librarian 52, no. 1-2 (2011): 106–16, http://dx.doi.org/10.1080/02763877.2011.527600.

23. Paterson and Low, “Student Attitudes.”

24. Gillian Nowlan, “Going Mobile: Creating a Mobile Presence for Your Library,” New Library World 114, no. 3/4 (2013): 142–50, http://dx.doi.org/10.1108/03074801311304050.

25. Elder, “How to Become the ‘Tech Guy.’”

26. Matthew Connolly, Tony Cosgrave, and Baseema B. Krkoska, “Mobilizing the Library’s Web Presence and Services: A Student-Library Collaboration to Create the Library’s Mobile Site and iPhone Application,” The Reference Librarian 52, no. 1-2 (2010): 27–35, http://dx.doi.org/10.1080/02763877.2011.520109.

27.
Stephan Spitzer, “Make That to Go: Re-Engineering a Web Portal for Mobile Access,” Computers in Libraries 3, no. 5 (2012): 10–14.

28. Houghton, “Mobile Services.”

29. Cody W. Hanson, “Mobile Solutions for Your Library,” Library Technology Reports 47, no. 2 (2011): 24–31, https://journals.ala.org/ltr/article/view/4475/5222.

30. Terence K. Huwe, “Using Apps to Extend the Library’s Brand,” Computers in Libraries 33, no. 2 (2013): 27–29.

31. Edward Iglesias and Wittawat Meesangnill, “Mobile Website Development: From Site to App,” Bulletin of the American Society for Information Science and Technology 38, no. 1 (2011): 18–23.

32. Jeff Wisniewski, “Mobile Usability,” Bulletin of the American Society for Information Science & Technology 38, no. 1 (2011): 30–32, http://dx.doi.org/10.1002/bult.2011.1720380108.

33. Jeff Wisniewski, “Mobile Websites with Minimal Effort,” Online 34, no. 1 (2010): 54–57.

34. Hahn et al., “Methods for Applied Mobile Digital Library Research.”

35. J. Michael DeMars, “Smarter Phones: Creating a Pocket Sized Academic Library,” The Reference Librarian 53, no. 3 (2012): 253–62, http://dx.doi.org/10.1080/02763877.2012.678236.

36. Kim Griggs, Laurie M. Bridges, and Hannah Gascho Rempel, “Library/Mobile: Tips on Designing and Developing Mobile Websites,” Code4Lib Journal, no. 8 (2009), http://journal.code4lib.org/articles/2055.

37. DeMars, “Smarter Phones.”

38. Hahn et al., “Methods for Applied Mobile Digital Library Research.”

39. Beth Stahr, “Text Message Reference Service: Five Years Later,” The Reference Librarian 52, no.
1-2 (2011): 9–19, http://dx.doi.org/10.1080/02763877.2011.524502. 40. Patterson and Low, “Student Attitudes.” 41. Ibid. 42. Ibid. 43. Hanson, “Mobile Solutions for Your Library.” 44. Stahr, “Text Message Reference Service.” 45. Spitzer, “Make That to Go.” 46. Allison Bolorizadeh et al., “Making Instruction Mobile,” The Reference Librarian 53, no. 4 (2012): 373–83, http://dx.doi.org/10.1080/02763877.2012.707488. 47. Maura Keating, “Will They Come? Get Out the Word About Going Mobile,” The Reference Librarian no. 52, no. 1-2 (2010): 20-26, http://dx.doi.org/10.1080/02763877.2010.520111. 48. Patterson and Low, “Student Attitudes.” 49. Hanson, “Mobile Solutions for Your Library.” 50. Patterson and Low, “Student Attitudes.” 51. Hanson, “Mobile Solutions for Your Library.” 52. Cody W. Hanson, “Why Worry About Mobile?,” Library Technology Reports no. 47, no. 2 (2011): 5–10, https://journals.ala.org/ltr/article/view/4476. 53. Keating, “Will They Come?” 54. Spitzer, “Make That to Go.” 55. Kim, “Responsive Web Design.” 56. Wisniewski, “Mobile Usability.” 57. Elder, “How to Become the ‘Tech Guy.’” http://journal.code4lib.org/articles/2055 http://dx.doi.org/10.1080/02763877.2011.524502 http://dx.doi.org/10.1080/02763877.2012.707488 http://dx.doi.org/10.1080/02763877.2010.520111 https://journals.ala.org/ltr/article/view/4476 IDENTIFYING KEY STEPS FOR DEVELOPING MOBILE APPLICATIONS & MOBILE WEBSITES FOR LIBRARIES | POTNIS, REGENSTREIF-HARMS, AND CORTEZ |doi:10.6017/ital.v35i2.8652 60 58. Sally Wilson and Graham McCarthy, “The Mobile University: From the Library to the Campus,” Reference Services Review 38, no. 2 (2010): 214–32, http://dx.doi.org/10.1108/00907321011044990. 59. Brendan Ryan, “Developing Library Websites Optimized for Mobile Devices,” The Reference Librarian 52, no. 1-2 (2010): 128–35, http://dx.doi.org/10.1080/02763877.2011.527792. 60. Kim, “Responsive Web Design.” 61. Connolly, Cosgrave, and Krkoska, “Mobilizing the Library’s Web presence and Services.” 62. 
DeMars, “Smarter Phones.” 63. Mark Andy West, Arthur W. Hafner, and Bradley D. Faust, “Expanding Access to Library Collections and Services Using Small-Screen Devices,” Information Technology & Libraries 25 (2006): 103–7. 64. Houghton, “Mobile Services.” 65. Rempel and Bridges, “That was Then.” 66. Elder, “How to Become the ‘Tech Guy.’” 67. Heather Williams and Anne Peters, “And That’s How I Connect to MY Library: How a 42- Second Promotional Video Helped to launch the UTSA Libraries’ New Summon Mobile Application,” The Reference Librarian 53, no. 3 (2012): 322–25, http://dx.doi.org/10.1080/02763877.2012.679845. 68. Hahn et al., “Methods for Applied Mobile Digital Library Research.” 69. Danielle Andre Becker, Ingrid Bonadie-Joseph, and Jonathan Cain, “Developing and Completing a Library Mobile Technology Survey to Create a User-Centered Mobile Presence,” Library Hi-Tech 31, no. 4 (2013): 688–99, http://dx.doi.org/10.1108/LHT-03-2013-0032. 70. Rempel and Bridges, “That was Then.” 71. Iglesias and Meesangnill, “Mobile Website Development.” 72. Elder, “How to Become the ‘Tech Guy.’” 73. Andrew Walsh, “Mobile Information Literacy: A Preliminary Outline of Information Behavior in a Mobile Environment,” Journal of Information Literacy 6, no. 2 (2012): 56–69, http://dx.doi.org/10.11645/6.2.1696. 74. Back and Bailey, “Web Services and Widgets.” 75. Ibid. 76. Ibid. 77. Spitzer, “Make That to Go.” http://dx.doi.org/10.1108/00907321011044990 http://dx.doi.org/10.1080/02763877.2011.527792 http://dx.doi.org/10.1080/02763877.2012.679845 http://dx.doi.org/10.1108/LHT-03-2013-0032 http://dx.doi.org/10.11645/6.2.1696 INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 61 78. Iglesias and Meesangnill, “Mobile Website Development.” 79. Bohyun Kim, “The Present and Future of the Library Mobile Experience,” Library Technology Reports 49, no. 6 (2013): 15–28, https://journals.ala.org/ltr/article/view/4506. 80. Pendell and Bowman, “Usability Study.” 81. 
Hahn et al., “Methods for Applied Mobile Digital Library Research.” 82. Andromeda Yelton, “Where to Go Next,” Library Technology Reports 48, no. 1 (2012): 25–34, https://journals.ala.org/ltr/article/view/4655/5511. 83. Ibid. 84. Hahn et al., “Methods for Applied Mobile Digital Library Research.” 85. Houghton, “Mobile Services.” 86. Ibid. 87. Mairn, “Three Things You Can Do Today.” 88. Ibid. 89. Tamara Pianos, “EconBiz to Go: Mobile Search Options for Business and Economics— Developing a Library App for Researchers,” Library Hi Tech 30, no. 3 (2012): 436–48, http://dx.doi.org/10.1108/07378831211266582. 90. DeMars, “Smarter Phones.” 91. Ryan, “Developing Library Websites.” 92. Pendell and Bowman, “Usability Study.” 93. Ryan, “Developing Library Websites.” 94. Michael J. Whitchurch, “QR Codes and Library Engagement,” Bulletin of the American Society for Information Science & Technology 38, no. 1 (2011): 14–17. 95. Back and Bailey, “Web Services and Widgets.” 96. Jingru Hoivik, “Global Village: Mobile Access to Library Resources,” Library Hi Tech 31, no. 3 (2013): 467–77, http://dx.doi.org/10.1108/LHT-12-2012-0132. 97. Elder, “How to Become the ‘Tech Guy.’” 98. Ryan, “Developing Library Websites.” 99. West, Hafner and Faust, “Expanding Access.” 100. Hu and Meier, “Planning for a Mobile Future.” 101. Iglesias and Meesangnill, “Mobile Website Development.” https://journals.ala.org/ltr/article/view/4506 https://journals.ala.org/ltr/article/view/4655/5511 http://dx.doi.org/10.1108/07378831211266582 http://dx.doi.org/10.1108/LHT-12-2012-0132 IDENTIFYING KEY STEPS FOR DEVELOPING MOBILE APPLICATIONS & MOBILE WEBSITES FOR LIBRARIES | POTNIS, REGENSTREIF-HARMS, AND CORTEZ |doi:10.6017/ital.v35i2.8652 62 102. Wisniewski, “Mobile Usability.” 103. Joe Murphy, “Using Mobile Devices for Research: Smartphones, Databases and Libraries,” Online 34, no. 3 (2010): 14–18. 104. 
Amy Vecchione and Margie Ruppel, “Reference is Neither Here nor There: A Snapshot of SMS Reference Services,” The Reference Librarian 53, no. 4 (2012): 355–72, http://dx.doi.org/10.1080/02763877.2012.704569. 105. Hu and Meier, “Planning for a Mobile Future.” 106. Wilson and McCarthy, “The Mobile University.” 107. Project Management Institute, A Guide to the Project Management Body of Knowledge (PMBOK Guide) (Newtown Square, PA: Project Management Institute, 2013). 108. Devendra Potnis et al., “Skills and Knowledge Needed to Serve as Mobile Technology Consultants in Information Organizations,” Journal of Education for Library & Information Science 57 (2016): 187–96. http://dx.doi.org/10.1080/02763877.2012.704569 ABSTRACT INTRODUCTION METHOD Forming and Managing a Team Key Steps in the Analysis Phase Key Steps for Designing MAMW Key Steps for Implementing MAMW Skills Needed for Maintaining MAMW CONCLUSION Forming and managing team This paper assumes a very small number of scholarly publications to be reflective of the real-world scenarios of developing MAMW for all types of libraries. This assumption is one of the limitations of this study. Also, the sample of publications anal... REFERENCES
In the Name of the Name: RDF Literals, ER Attributes, and the Potential to Rethink the Structures and Visualizations of Catalogs

Manolis Peponakis

ABSTRACT

The aim of this study is to contribute to the field of machine-processable bibliographic data that is suitable for the Semantic Web. We examine the Entity Relationship (ER) model, which has been selected by IFLA as a “conceptual framework” in order to model the FR family (FRBR, FRAD, and RDA), and the problems ER causes as we move towards the Semantic Web. Subsequently, while maintaining the semantics of the aforementioned standards but rejecting the ER as a conceptual framework for bibliographic data, this paper builds on the RDF (Resource Description Framework) potential and documents how both the RDF and Linked Data’s rationale can affect the way we model bibliographic data. In this way, a new approach to bibliographic data emerges where the distinction between description and authorities is obsolete. Instead, the integration of the authorities with descriptive information becomes fundamental so that a network of correlations can be established between the entities and the names by which the entities are known. Naming is a vital issue for human cultures because names are not random sequences of characters or sounds that stand just as identifiers for the entities—they also have socio-cultural meanings and interpretations. Thus, instead of describing indivisible resources, we could describe entities that appear in a variety of names on various resources. In this study, a method is proposed to connect the names with the entities they represent and, in this way, to document the provenance of these names by connecting specific resources with specific names.

INTRODUCTION

The basic aim of this study is to contribute to the field of machine-processable bibliographic data.
As to what constitutes “machine processable” we concur with the clarification of Antoniou and van Harmelen, who state, “In the literature the term machine-understandable is used quite often. We believe it is the wrong word because it gives the wrong impression. It is not necessary for intelligent agents to understand information; it is sufficient for them to process information effectively, which sometimes causes people to think the machine really understands.”1 Also, in the bibliography used, the term “computationally processable” is used as a synonym for “machine processable.”

Manolis Peponakis (epepo@ekt.gr) is an information scientist at the National Documentation Centre, National Hellenic Research Foundation, Athens, Greece.

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016

With regard to machine-processable bibliographic data, we have taken into consideration both the practice and theory of Library and Information Science (LIS) and Computer Science. From LIS we have chosen the Functional Requirements for Bibliographic Records (FRBR) and the Functional Requirements for Authority Data (FRAD) while making comparisons with the Resource Description and Access (RDA) standard. From the Computer Science domain we have chosen the Resource Description Framework (RDF) as a basic mechanism for the Semantic Web. We examine the Entity Relationship (ER) model (selected by IFLA as a “conceptual framework” for the development of FRBR),2 as well as the potential problems that may arise as we move towards the Semantic Web. Having rejected the ER model as a conceptual framework for bibliographic data, we have built on the potential of RDF and document how its rationale affects the modeling process. In the context of the Semantic Web and Uniform Resource Identifiers (URIs), the identification process has been transformed.
For this reason we have performed an analysis of appellations and names as identifiers and also explored how we could move on from an era where controlled names play the role of identifiers to one of the URI dominion: “While it is self-evident that labels and comments are important for constructing and using ontologies by humans, the OWL standard does not pay much attention to them. The standard focuses on the syntax, structure and reasoning capabilities. . . . If the Semantic Web is to be queried by humans, there will be no other way than dealing with the ambiguousness of human language.”3

It is essential to build on the “library's signature service, its catalog,”4 and use it to provide added-value services. But to get there, first there has to be “a shift in perspective, from locked-up databases of records to open data shared on the Web.”5 This requires a transition from descriptions aimed at human readers to descriptions that put the emphasis on computational processes, to escape the rationale of records being a condensed description in textual form and move towards more flexible and fruitful representations and visualizations.

BACKGROUND

FRBR and RDA

The FR family has been growing for more than a decade. The first member of the family was the Functional Requirements for Bibliographic Records (FRBR),6 the first version of which was published towards the end of the last century. Subsequently, IFLA decided to extend the model in order to cover authorities. During this process, the task of modeling the names was separated from the task of modeling the subjects. Thus two new members were added to the family: the “Functional Requirements for Authority Data: A Conceptual Model” (FRAD) and the “Functional Requirements for Subject Authority Data” (FRSAD).7,8 Around the same time, the “Resource Description and Access” (RDA) standard was established as a set of cataloging rules to replace the AACR standard.
IN THE NAME OF THE NAME: RDF LITERALS, ER ATTRIBUTES, AND THE POTENTIAL TO RETHINK THE STRUCTURES AND VISUALIZATIONS OF CATALOGS | PEPONAKIS | doi:10.6017/ital.v35i2.8749

According to its creators, the alignment with the FR family was crucial. As stated, “A key element in the design of RDA is its alignment with the conceptual models for bibliographic and authority data developed by the International Federation of Library Associations and Institutions (IFLA): Functional Requirements for Bibliographic Records [and] Functional Requirements for Authority Data.”9

This paper uses the FR family and RDA as a starting point but detects some problems and inconsistencies between these models. It retains the basic semantics of these standards but rejects their structural formalism, which is quite problematic and lacks effectiveness in expressing highly machine-processable data. The effective processability of the data will be discussed in detail in the section “The Impact of the Representation Scheme’s Selection: RDF versus ER.”

Within the FR family, the terminology is inconsistent and, as we pass from FRBR to FRAD and FRSAD, even the perspective of the general model changes. In FRBR (the first in order), there is no notion of the name as an entity. FRAD introduces this notion (FRAD also adds family as a new entity) and FRSAD takes a step further and introduces the concept of nomen instead of the concept of name.
Hence, despite the fact that each of the members of the FR family of models has been represented in RDF,10 there is no established consolidated edition yet that combines the different angles using a common model and terminology (vocabulary).11 These representations (one for each model) are available at IFLA’s website.12 On the other hand, in the context of RDA there may be more consistency regarding terminology, but, as is well established in the relevant literature, there are significant differences between the two models, i.e., the FR family and RDA.13,14,15 Due to these differences, no URIs, not even from the RDA registry, appear in the examples of our study.16 Given the above, the terms appearing in the figures are a selection from the three texts of the FR family. Thus, nomen (from FRSAD) is used instead of name (from FRAD) as a more abstract notion, and the attribute—property in the context of RDF—“has string” (from FRAD) is used to assign a specific literal to a nomen. In figures 2–5 we have used the “has appellation” (reversed “is appellation of”) relationship of FRAD.17

Notes about Terminology and Graphs: How to Read the Figures

Two different sorts of figures appear in this paper. This covers the need to compare two different models and to pinpoint the differences between them and the problems that arise from selecting the ER model to express FRBR. An explanation of the two major models follows in the next subsection.

The first figure type follows the diagrams of the Entity Relationship model and is used in figure 1. In this case:

• The rectangles represent entities.
• The oval shapes represent attributes.
• The diamond-shaped boxes represent relationships.

The second figure type has been created according to RDF graphical representations and is used in figures 2–5.
In these cases:

• The oval shapes represent nodes that are identified by a URI; they can serve as objects or subjects for further expansion of the network. In figures 3–5 all the names were derived from the FR entities.
• The line connectors between nodes represent the predicates (i.e., they are properties) and should also be identified by URIs.
• The rectangle shapes represent literals consisting of a lexical form, to which a language code may apply. With or without language codes, these are end points and cannot be subjects of new connections.

We follow the common modeling of language in RDF, in which the literal itself contains a language code, for example "example"@en in standard Turtle syntax or its equivalent in RDF/XML coding. We must note that this is quite a simplistic way of modeling language, because there is no mechanism to declare more information about the language, such as multiple scripts, which could apply in the context of the same language.

The Impact of the Representation Scheme’s Selection: RDF versus ER

Nowadays, all the information in library catalogs is created through and stored in computers. This technological infrastructure provides specific methods and dictates limitations for the catalog’s data management. Hence, every model must take into consideration the basic rationale of the technological infrastructure that will curate and process the data. Depending on the syntax capabilities of the representation model, expressing what we want to express becomes reasonably easy and accurate, since “semantics is always going to have a close relationship with the field of syntax.”18 This establishes a vital relationship between what we want to do and how computers can do it. In this section we emphasize the limitations of the Entity Relationship (ER) implementation, which FRBR proposes, and show how syntax affects expressiveness and, accordingly, functionality.
Finally, we demonstrate how the selection of one implementation or another (in our case ER vs. RDF) has serious implications, both for cataloging rules and for cataloging practice.

Why do we compare these two specific models? The ER model is the basis selected by IFLA as a “conceptual framework”19 for the development of FRBR, while FRBR is the conceptual model upon which RDA has been founded. Consequently, RDA is also affected by the choice of the ER model. On the other hand, RDF is the current conceptualization for resource description in the web of data. So, what kind of problems and conflicts arise from the implementations of each of these models?

The basic rationale of ER comprises three fundamental elements: there are entities; entities have attributes; and there are relationships between entities. It is also possible to declare cardinality constraints, upon which the FR family builds. Then again, RDF implies quite a different model. “The core structure of the abstract syntax is a set of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph. An RDF graph can be visualized as a node and directed-arc diagram, in which each triple is represented as a node-arc-node link. . . . There can be three kinds of nodes in an RDF graph: IRIs, literals, and blank nodes.”20 “Linking the object of one statement to the subject of another, via URIs, results in a chain of linked statements, or linked data. This avoids the ambiguity of using natural language strings as headings to match statements.
As a result, a literal object terminates a linked data chain, and literals are generally used for human-readable display data such as labels, notes, names, and so on.”21

As a representative example of the differences between the two models, let us consider “place of publication.” Peponakis counts nine attributes of place and notices that, because the ER model does not allow links between attributes, there is no way to define explicitly whether these attributes address the same place or not.22 Taking this problem into consideration, we demonstrate the transition from the ER attributes approach to RDF implementations in figures 1–2. Let us assume that there is Person (X), who was born in London, is named John Smith, and works at Publisher (Y). This publisher is located in London, where Book (1), entitled History of London, has been published. For this specific book, Person X was the lithographer. If we create a strict mapping to FRBR entities, attributes, and relations, then we have the situation illustrated in figure 1. Because there is no way to link the four occurrences of London (inasmuch as there is no option to define relations between attributes in the ER model), there is no way to be certain that London is the same in all cases. Judging only by the name, it could stand for London in England, in Ontario, in Ohio, or elsewhere.

Figure 1. Example of “Place” as an attribute of several entities

The IFLA working group faced the problem with place and noted the following:

The model does not, however, parallel entity relationships with attributes in all cases where such parallels could be drawn. For example, “place of publication/distribution” is defined as an attribute of the manifestation to reflect the statement appearing in the manifestation itself that indicates where it was published.
Inasmuch as the model also defines place as an entity it would have been possible to define an additional relationship linking the entity place either directly to the manifestation or indirectly through the entities person and corporate body which in turn are linked through the production relationship to the manifestation. To produce a fully developed data model further definition of that kind would be appropriate. But for the purposes of this study it was deemed unnecessary to have the conceptual model reflect all such possibilities.23

Finally, they seem to avoid the problem and repeat their position in FRAD as well:

In certain instances, the model treats an association between one entity and another simply as an attribute of the first entity. For example, the association between a person and the place in which the person was born could be expressed logically by defining a relationship (“born in”) between person and place. However, for the purposes of this study, it was deemed sufficient to treat place of birth simply as an attribute of person.24

For some reason the creators of the FR family have chosen not to “upgrade” the attributes of place into one and only one entity. Furthermore, the same problem exists for many attributes, not only for place. Thus, the problem has to do with the selection of ER as a “conceptual framework” and not with the specific entity of place. If we accept that “Place of Publication” must not be recorded as it appears on the resource, an RDF-based approach makes things clearer, as figure 2 shows. In this case, all attributes of place are promoted to the same RDF node and, instead of four repeats of the attribute with the value “London,” we reduce it to one and only one node with four connections to it.
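The consolidation described above can be sketched in Turtle. The namespace and the property names are hypothetical illustrations, not terms from the FR vocabularies, and the fourth occurrence of London (in the title History of London) is read here as a subject relation:

```turtle
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# One place node replaces the four separate "London" attribute values.
ex:placeLondon  rdfs:label "London"@en .

ex:personX      ex:placeOfBirth        ex:placeLondon ;
                ex:isEmployedBy        ex:publisherY .
ex:publisherY   ex:isLocatedIn         ex:placeLondon .
ex:book1        ex:placeOfPublication  ex:placeLondon ;
                ex:hasSubject          ex:placeLondon ;
                ex:hasLithographer     ex:personX .
```

Because all four statements point to the same URI, an agent can be certain that the birthplace, the publisher's location, the place of publication, and the subject are one and the same London, a guarantee the ER attribute approach cannot give.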
Then, as illustrated by figure 2, we can be sure that all instances refer to the same London.

Figure 2. RDF-based representation of figure 1

In figure 2, it is assumed that there is no need to transcribe the literal of “Place of Publication” from the resource; i.e., we did not follow rule 2.8.1.4 of RDA: “Transcribe places of publication and publishers' names as they appear on the source of information.” For cataloging rules that require recording the place as it appears on the resource, readers can consult the subsection “Place Names” in this study. Last but not least, RDF has another significant advantage compared to the ER model: data coded in RDF are packed ready for use in the Semantic Web. On the contrary, data coded in ER must undergo conversion—with all its implications—in order to be published in the Semantic Web.

NAMES, ENTITIES, AND IDENTITIES

In this section, the significance of names as carriers of meaning is outlined, and the importance of documenting the relations of names with the entities and identities they refer to is established. Additionally, the basic approaches to metadata generation for managing names are presented. These approaches resulted in the dissociation of authorities from the bibliographic records, which in turn deprived both FRBR/FRAD and RDA of the potential to link, in an explicit way, the entity with the names it goes by. This linking, as presented later in this text, is fundamental for the description and interpretation of the entity.

In everyday communication, the usage of a name in a sentence plays the role of the identifier for the entity that this specific name indicates. If the speakers share a common background, there is no need for qualifiers other than the name in order to disambiguate information such as whether Nick is Person X or Person Y, or whether the word “London” indicates the city in Ohio or in England, etc.
Thus, the common background leads to a very limited context in which the interpretation of the name and its assignment to the appropriate entity is sufficient and accurate. However, the context of the Internet extends into a variety of possibilities, so a more precise way to identify specific entities is needed. In this regard, an essential issue is the distinction between the properties of the name and the properties of the entity that is represented by the specific name. The word “John” could be recognized as an English name, but we commit a logical flaw if we assume that John knows English. A representative example of this kind of inference (syllogism) can be found in Rayside and Campbell.25 Statement: “Man is a species of animal. Socrates is a man. Therefore, Socrates is a species of animal. . . . ‘Man' is a three-lettered word. Socrates is a man. Therefore, Socrates is a three-lettered word.”

Therefore, the authorities of a catalog should embody a two-level modeling of the information they represent: the first level has to do with the entities and the second with the names of these entities. Consequently, there is the need to find a way to pass from names to the entities they indicate and, from entities, to the various appellations that these entities have.

In catalogs, it is somewhat vague whether a change of name signifies a new identity. Niu states: “For example: the maiden name and the married name of an agent are normally not considered two separate identities, yet one pseudonym used for writing fiction and another pseudonym used for writing scientific works are often considered two different identities of an agent.”26 Thus there can be one individual with many identities.
But there can also be one identity which incorporates many individuals: for example, a shared pseudonym for a group of authors. To deal with these problems, FRAD introduces the notion of persona, rejecting at the same time the idea that a person is equal to an individual. FRAD defines a person as an “individual or a persona or identity established or adopted by an individual or group.”27 The question that arises here is when the persona must be conceived as a new identity. Yet FRAD does not make a sufficient judgment; instead, it refers to cataloguing rules:

“Under some cataloguing rules, for example, authors are uniformly viewed as real individuals, and consequently specific instances of the bibliographic entity person always correspond to individuals. Under other cataloguing rules, however, authors may be viewed in certain circumstances as establishing more than one bibliographic identity, and in that case a specific instance of the bibliographic entity person may correspond to a persona adopted by an individual rather than to the individual per se.”28

So there is no specific guidance on whether, for example, in the case of a “religious relationship,”29 one identity must be created with two alternative names or two different identities. Rule 9.2.2.8 in RDA does not elaborate further. Still, even with the problem of identities solved, the matter of appellations itself can be extremely complicated, and this is widely addressed in the relevant literature.30,31,32 The VIAF project confirms this with an enormous data set.33 Assigning all appellations as attributes is an easy way to model the variants of a name, but it is very simplistic because it “does not allow these appellations to have attributes of their own and neither does it allow the establishing of relationships among the appellations. . . .
FRAD makes a big step forward: all appellations are defined as entities in their own right, thus allowing full modeling.”34 Of course, FRAD’s approach is not a novelty in the domain of LIS, since library catalogs have been modeling names since the era of MARC. In UNIMARC Authorities,35 the control subfield $5 contains a coded value to indicate the relations between the names, with values such as “k = name before the marriage,” “i = name in religion,” “d = acronym,” etc., and in MARC 21 there is the corresponding subfield $w.36 FRAD puts these values on a more consistent and abstract level. FRAD also defines “Relationships between Persons, Families, Corporate Bodies, and Works” in section 5.3 and “Relationships between their Various Names” in section 5.4.37

The Distinction between Authorities and Descriptive Information

Since the days of card catalogs, and for as long as MARC and AACR have been used, bibliographic records have been grounded in the dichotomy between descriptive information and controlled access points. The various types of headings stand for controlled access points. The original terminus of headings was alphabetical sorting. With the advent of computers, they were used as string identifiers to cluster and retrieve relevant bibliographic records. These bibliographic records had a body of descriptive information that was transcribed from the resource and remained unchanged. So the headings were the keys to the records, and the records were surrogates for documents. “The elements of a bibliographic record . . . were designed to be read and comprehended by human beings, not by machines”38; established headings are no exception. One of their basic characteristics was the precondition that they be unique in the context of a specific catalog, thereby avoiding ambiguity.
In every case of synonymy, qualifiers (such as date of birth or profession) were added to disambiguate, and the names also played the role of a unique identifier. From this process an issue emerges: the information that appears on the document has been changed, and the controlled name may be completely different from the name on the resource. This means that the cataloger performs a transformation of the information, and this transformation carries two dangers. First, by changing the name, there is the possibility of assigning the name to the wrong entity. Second, by disturbing the correspondence between the information on the resource and the information on the record of the resource, the record becomes a problematic surrogate of the resource. To surpass this obstacle, traditional catalogs split the information into two different areas: one with the established forms, i.e., the headings, and the second with the purely descriptive information, i.e., the information that must be transcribed from the resource. This is the reason why traditional library catalogs put much effort into transcribing information from resources, and very detailed guidelines have been developed. On the other hand, current approaches to metadata creation (such as Dublin Core) seem to underestimate the importance of descriptive information while concentrating on the established forms of names. But how can we be sure that different literals communicate the same meaning? Does this kind of simplification, perhaps, cause problems regarding the integrity of the information?

The names are not just sequences of characters (i.e., strings); they also carry latent information. It is known that there are women who wrote using male names (for example, Mary Ann Evans wrote as George Eliot) and men who wrote using female names. There are also nicknames for groups (e.g., “Richard Henry” is a pseudonym for the collaborative works of Richard Butler and Henry Chance Newton), etc.
Therefore, it is important not to ignore names and the forms in which they appear on resources, but to model them in such a way that integration between authorities and descriptive information is feasible and the names are efficiently machine-processable.

INTEGRATING AUTHORITIES WITH DESCRIPTIVE INFORMATION

As we have already stated, traditional library catalogs are built on the dichotomy between description and access points. This analysis aims to bring descriptive information and authorities closer, i.e., to connect the access points of catalogs with the description of the resource. The basic principle of the model presented in this section is to promote each verbal (lexical) representation of a name to a nomen, whether or not this form of the name derives from a controlled vocabulary.

IN THE NAME OF THE NAME: RDF LITERALS, ER ATTRIBUTES, AND THE POTENTIAL TO RETHINK THE STRUCTURES AND VISUALIZATIONS OF CATALOGS | PEPONAKIS | doi:10.6017/ital.v35i2.8749

In cases where this form appears in a specific vocabulary, appropriate properties can be used to indicate such a relation. In this section, some representative examples are presented. It is important to note, once again, that every node and relation in the following figures could (and, in the context of the Semantic Web, must) be identified by a URI, except for the values in rectangles, which are simple RDF literals and therefore cannot be the subjects of further expansion. Thus, the chain is as follows: every individual (instance of the relevant class) acquires a URI; every individual is connected through the "has appellation" property (which acquires a URI) to a nomen (which also acquires a URI); and these nomens end in a plain RDF literal, which is natural-language wording and cannot be subjected to further analysis.
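The chain just described (entity, "has appellation" property, nomen, plain literal) can be sketched with a minimal set of triples. This is an illustrative sketch only: the URIs and property names below are hypothetical placeholders, not drawn from any published vocabulary.

```python
# Minimal sketch of the entity -> nomen -> literal chain described above.
# All URIs ("ex:...") and property names are hypothetical placeholders.

triples = {
    # The person (entity) is connected to two nomens via "has appellation".
    ("ex:person/1", "ex:hasAppellation", "ex:nomen/1"),
    ("ex:person/1", "ex:hasAppellation", "ex:nomen/2"),
    # Each nomen ends in a plain literal, which cannot be expanded further.
    ("ex:nomen/1", "ex:literalForm", "Mary Ann Evans"),
    ("ex:nomen/2", "ex:literalForm", "George Eliot"),
}

def literal_forms(entity):
    """Collect every literal form reachable from an entity via its nomens."""
    nomens = {o for s, p, o in triples if s == entity and p == "ex:hasAppellation"}
    return {o for s, p, o in triples if s in nomens and p == "ex:literalForm"}
```

Both name strings resolve to the same entity URI, while each nomen remains an addressable node in its own right, so a resource can be linked either to the entity as a whole or to one specific appellation of it.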
Place Names

The problem of place as an attribute in FRBR and FRAD has already been analyzed in the Background section of the current paper, specifically in the subsection "The Impact of the Representation Scheme's Selection: RDF versus ER." Here, a solution to this problem that is compatible with the FRBR/RDA approach is proposed. By promoting every nomen of a place to an RDF node, there is the option of referring to the entity of the place as a whole or to a specific appellation of this entity. So the relation (a property in the context of RDF) between the subjects of a work could be indicated by connecting Work X with Place Z. On the other hand, according to rule 2.8.1.4 of RDA, the place of publication for the manifestation must be transcribed as it appears on the source of information. But following the connections presented in figure 3, it is easy to infer that this specific nomen corresponds to the same entity, i.e., to the same place.

Figure 3. Place

Personal names

In the section "Names, Entities and Identities," we analyzed many of the problems associated with personal names. Here, a model is presented in which the work (and expression) is connected directly with the author, whereas the manifestation is connected with a specific appellation, i.e., a nomen, of this author.

Figure 4. Statements of responsibility

RDA rule 2.4.1.4 states, "Transcribe a statement of responsibility as it appears on the source of information." But occasionally the statement of responsibility may contain phrases and not just names. In these cases, a solution similar to that of the Metadata Object Description Schema (MODS) could be implemented, where, if needed, the statement of responsibility is included in the note element using the attribute type="statement of responsibility".

Titles

The management of titles in FRBR and RDA indicates a different point of view between the two standards.
According to RDA there is no title for the expression,39 and, as Taniguchi states, this is a "significant difference between FRBR and RDA."40 BIBFRAME abides by the same principle of downgrading the expression, since it merges expression and work into an indivisible unit. In this regard, BIBFRAME is closer to RDA than to FRBR. The notion of work has nothing to do with specific languages, even when the work is a written text; therefore, assigning the title of the work to a specific appellation is an unnecessary limitation. On the contrary, the title of a manifestation is derived from a specific resource. We argue that between these two poles there is the title of the expression, which could stand as a uniform title per language.

Figure 5. Titles

VISUALIZATION OF BIBLIOGRAPHIC RECORDS AND CATALOGING RULES

Resource description in the domain of LIS—from Cutter's era to the present day—has emphasized static, linear, textual representations. According to RDA "0.1 Key Features," "In RDA, there is a clear line of separation between the guidelines and instructions on recording data and those on the presentation of data. This separation has been established in order to optimize flexibility in the storage and display of the data produced using RDA. Guidelines and instructions on recording data are covered in chapters 1 through 37; those on the presentation of data are covered in appendices D and E." But the tables in these appendices (D and E) contain guidelines mainly concentrated on punctuation issues, and they do not take into consideration the capabilities of current interactive user interfaces.
As Coyle and Hillmann comment, "there are instructions for highly structured strings that are clearly not compatible with what we think of today as machine-manipulable data."41 It is rather like producing high-tech catalog cards: RDA remains faithful to the classical text-centric approaches that produce bibliographic records as a linear enumeration of attributes; thus, RDA can be likened to a new suit that is nonetheless quite old-fashioned. Traditional catalogs (from card catalogs to OPACs and repository catalogs) were built upon the principle of creating autonomous records. FRBR called this principle, i.e., one record for each resource, into question, while Linked Data abolishes it. In this way, a gigantic graph of statements is created, in which a certain subset of statements (not always the same one) corresponds to or describes the desired information. Thus, a more sophisticated method for displaying results emerges, if it is not indeed imposed. The issue, therefore, is no longer to present a record that describes a specific resource, since this conceptualization is becoming obsolete altogether. Consequently, the visualization has to differ depending on the data structure as well as on the searcher's available interface. In this context, the analysis of this study tries to balance the machine-processable character of RDF, which builds on identifiers (URIs), with attention to the linguistic representation of entities. We argue that a balance between them will result in highly accurate and efficient representations for both humans and software agents. Let us consider the model for titles introduced in this study.
According to FRBR, "if the work has appeared under varying titles (differing in form, language, etc.), a bibliographic agency normally selects one of those titles as the basis of a 'uniform title' for purposes of consistency in naming and referencing the work."42 RDA treats the case in a very similar way: rule 5.1.3 states, "The term 'title of the work' refers to a word, character, or group of words and/or characters by which a work is known. The term 'preferred title for the work' refers to the title or form of title chosen to identify the work. The preferred title is also the basis for the authorized access point representing that work." In this study, we consider the aforementioned statements a projection of the days when records were static textual descriptions independent of interfaces. Nowadays we are moving towards a much clearer distinction between an entity and its names. This is reflected in figure 5, in which the connection between a work and its author has nothing to do with specific names (appellations) but is based on URIs. The selection of the appropriate name as a title for a specific work could then be based on criteria such as the language of the interface: in this case, the title of the work would be the title in the language of the user interface, and if this is not possible (i.e., there is no title label in this language), it could be the title in the catalog's default language. Following the kind of modeling proposed in the current study, the visualization of data becomes more flexible and efficient in a variety of dynamic ways. Hence, we can isolate and display nodes and their connections, correlate them with the interface language or screen size (i.e., mobile phone or PC), create levels relative to the desired depth of analysis, personalize them upon the user's request or according to the user's habits, and so on. It also becomes possible to display the data in forms other than text.
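The selection rule just described, i.e., prefer the title label in the interface language and fall back to the catalog's default language, can be sketched as follows. The data and function names are hypothetical, for illustration only.

```python
# Hypothetical title labels for one work, keyed by language tag.
work_titles = {"en": "War and Peace", "ru": "Война и мир", "fr": "Guerre et Paix"}

def display_title(titles, ui_lang, default_lang="en"):
    """Prefer the label in the interface language; fall back to the default."""
    if ui_lang in titles:
        return titles[ui_lang]
    return titles[default_lang]
```

A French interface would display "Guerre et Paix," while an interface in a language with no label (say, German) would fall back to the catalog's default-language title.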
"As a result, humans, with their great visual pattern recognition skills, can comprehend data tremendously faster and more effectively through visualization than by reading the numerical or textual representation of the data."43

As we have already mentioned, syntax and semantics will always have a close relationship, but it is clear that, now more than ever, current Semantic Web standards allow for greater flexibility. As Dunsire et al. put it:

The RDF approach is very different from the traditional library catalog record exemplified by MARC21, where descriptions of multiple aspects of a resource are bound together by a specific syntax of tags, indicators, and subfields as a single identifiable stream of data that is manipulated as a whole. In RDF, the data must be separated out into single statements that can then be processed independently from one another; processing includes the aggregation of statements into a record-based view, but is not confined to any specific record schema or source for the data. Statements or triples can be mixed and matched from many different sources to form many different kinds of user-friendly displays.44

In this framework, cataloging rules must reexamine their instructions in light of the new opportunities offered by technological advancements.

DISCUSSION

Naming is a vital issue for human cultures. Names are not random sequences of characters or sounds that stand merely as identifiers for entities; they also carry socio-cultural meanings and interpretations. Recently, out of "political correctness" and fear of triggering racism, Sweden changed the names of bird species that could potentially offend, such as "gypsy bird" and "negro."45 Therefore we cannot treat names as mere arbitrary identifiers.
In this study we examined how, instead of describing indivisible resources, we could describe entities that appear under a variety of names on various resources. We proposed a method for connecting names to the entities they represent while, at the same time, documenting the provenance of these names by connecting specific resources with specific names. We illustrated how to establish connections between entities, connections between an entity and a specific name of another entity, and connections between one name and another name concerning one or two entities. In the proposed framework, we maintain the linguistic character of naming while modeling the names in a machine-processable way. This formalism allows for a high level of expressiveness and for flexible descriptions that do not have a static, text-centric orientation, since the central point is not the establishment of text values (i.e., headings) but the meaning of our statements. This study has shown that it is important to be able to establish relationships both between entities and between specific appellations (nomens, in the context of this study) of these entities. To achieve this, we promoted every appellation to an RDF node. This is not unheard of in the domain of RDF, since the same approach has been adopted by the W3C in the development of SKOS-XL.46 FRBRoo, another interpretation of increasing influence in the wider context of the FR family, adopts the same perspective.47 FRBRoo also gives the option to connect a specific name with a resource through the property "R64 used name (was name used by)" or to connect a name with someone who uses this specific name through the property "R63 named (was named by)." Murray and Tillett state that "cataloging is a process of making observations on resources"48; hence, the production of records is the result of the judgments made during this process.
But in the context of traditional descriptive cataloging, the cataloger was not required to judge information in any way other than by its category, i.e., to characterize whether a given string of characters corresponded to the name of an author, a publisher, a place, and so on. There was no obligation to assign a particular name to a specific author, publisher, or place. In our approach, the cataloger interprets the information and thereby supports the catalog's potential to deliver added-value information. Moreover, the initial information remains unaltered; hence, there is always the option of going back in order to generate new interpretations or to validate existing ones. In recent years, there has been a significant increase in the attention given to multi-entity models of resource description.49 In this new environment, "the creation of one record per resource seems a deficient simplification."50 RDF allows the transformation of universal bibliographic control into a giant global graph.51 In this manner, current approaches to resource description "cannot be considered as simple metadata describing a specific resource but more like some kind of knowledge related to the resource."52 Indeed, this knowledge can be computationally processed and exploited. Yet, to achieve this, "catalogers can only begin to work in this way if they are not held bound by the traditional definitions and conceptualizations of bibliographic records."53 One critical issue is the isolation of parts (sets of statements) of this "giant graph" and the linking of these parts with something else; indeed, theory on this topic is starting to emerge.54 This is essential because it allows for the creation of ad hoc clusters (in our context, the use of a specific identity for an entity together with all the names that have been assigned to this identity), which could then be used as a set to link to some other entity. As a final remark, we could say that authorities manage controlled access points.
In the Semantic Web, every URI is a controlled access point; hence, the distinction between description and authorities acquires a new meaning. In the context of machine-processable bibliographic data, the aim is to connect the two, i.e., the authorities with the description, and to examine how one can support the other. Since the emphasis is no longer on their individual management, we are drawn away from a mentality of "descriptive information versus access points" and towards one of "descriptive information as an access point."

ACKNOWLEDGEMENT

The author wishes to thank Henry Scott, who assisted in the proofreading of the manuscript.

REFERENCES AND NOTES

1. Grigoris Antoniou and Frank van Harmelen, A Semantic Web Primer, 2nd ed. (Cambridge, MA: MIT Press, 2008), 3.
2. IFLA, Functional Requirements for Bibliographic Records: Final Report, as amended and corrected through February 2009, IFLA Series on Bibliographic Control, vol. 19 (Munich: K.G. Saur, 1998), 6.
3. Daniel Kless et al., "Interoperability of Knowledge Organization Systems with and through Ontologies," in Classification & Ontology: Formal Approaches and Access to Knowledge: Proceedings of the International UDC Seminar 19–20 September 2011, The Hague, the Netherlands, Organized by UDC Consortium, The Hague, edited by Aida Slavic and Edgardo Civallero (Würzburg: Ergon, 2011), 63–64.
4. Karen Coyle and Diane Hillmann, "Resource Description and Access (RDA): Cataloging Rules for the 20th Century," D-Lib Magazine 13, no. 1/2 (January 2007): para. 2, doi:10.1045/january2007-coyle.
5. Cory K. Lampert and Silvia B. Southwick, "Leading to Linking: Introducing Linked Data to Academic Library Digital Collections," Journal of Library Metadata 13, no. 2–3 (2013): 231, doi:10.1080/19386389.2013.826095.
6. IFLA, Functional Requirements for Bibliographic Records.
7. IFLA, Functional Requirements for Authority Data: A Conceptual Model, edited by Glenn E. Patton, IFLA Series on Bibliographic Control (Munich: K.G. Saur, 2009).
8. IFLA, "Functional Requirements for Subject Authority Data (FRSAD): A Conceptual Model" (IFLA, 2010), http://www.ifla.org/files/assets/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf.
9. ALA, "RDA Toolkit: Resource Description and Access," sec. 0.3.1, accessed June 18, 2014, http://access.rdatoolkit.org/.
10. Gordon Dunsire, "Representing the FR Family in the Semantic Web," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 724–41, doi:10.1080/01639374.2012.679881.
11. While this paper was under review, IFLA released the draft "FRBR-Library Reference Model" (FRBR-LRM), which is a consolidated edition of the FR family standards. It is developed from the respective individual standards following the principles of entity-relationship modeling, which is challenged in this paper. Taking into account the ER modeling and the statement (on p. 5 of the standard) that "the model is comprehensive at the conceptual level, but only indicative in terms of the attributes and relationships that are defined," this consolidated edition cannot be perceived as a standard that could be implemented directly as a property vocabulary qualifying for use in the RDF environment.
12. Main page (for all FR) at http://iflastandards.info/ns/fr/; "FRBR Model" available at http://iflastandards.info/ns/fr/frbr/frbrer/; "FRAD Model" available at http://iflastandards.info/ns/fr/frad/; "FRSAD Model" available at http://iflastandards.info/ns/fr/frsad/. In addition, the FRBRoo element set is available at http://iflastandards.info/ns/fr/frbr/frbroo/.
13. Manolis Peponakis, "Conceptualizations of the Cataloging Object: A Critique on Current Perceptions of FRBR Group 1 Entities," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 587–602, doi:10.1080/01639374.2012.681275.
14. Pat Riva and Chris Oliver, "Evaluation of RDA as an Implementation of FRBR and FRAD," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 564–86, doi:10.1080/01639374.2012.680848.
15. Shoichi Taniguchi, "Viewing RDA from FRBR and FRAD: Does RDA Represent a Different Conceptual Model?," Cataloging & Classification Quarterly 50, no. 8 (2012): 929–43, doi:10.1080/01639374.2012.712631.
16. The RDA registry is available at http://www.rdaregistry.info/.
17. The nomen entity and the "has appellation" (reversed: "is appellation of") property are also used by the FRBR-LRM.
18. Paul H. Portner, What Is Meaning?: Fundamentals of Formal Semantics (Malden, MA: Blackwell, 2005), 34.
19. IFLA, Functional Requirements for Bibliographic Records, 19:6.
20. W3C, "RDF 1.1 Concepts and Abstract Syntax: W3C Recommendation," February 25, 2014, http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
21. Gordon Dunsire, Diane Hillmann, and Jon Phipps, "Reconsidering Universal Bibliographic Control in Light of the Semantic Web," Journal of Library Metadata 12, no. 2–3 (2012): 166, doi:10.1080/19386389.2012.699831.
22. Manolis Peponakis, "Libraries' Metadata as Data in the Era of the Semantic Web: Modeling a Repository of Master Theses and PhD Dissertations for the Web of Data," Journal of Library Metadata 13, no. 4 (2013): 333, doi:10.1080/19386389.2013.846618.
23. IFLA, Functional Requirements for Bibliographic Records, 19:32.
24. IFLA, Functional Requirements for Authority Data: A Conceptual Model, 36–37.
25. Derek Rayside and Gerard T. Campbell, "An Aristotelian Understanding of Object-Oriented Programming," in Proceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '00 (New York: ACM, 2000), 350, doi:10.1145/353171.353194.
26. Jinfang Niu, "Evolving Landscape in Name Authority Control," Cataloging & Classification Quarterly 51, no. 4 (2013): 405, doi:10.1080/01639374.2012.756843.
27. IFLA, Functional Requirements for Authority Data: A Conceptual Model, 24.
28. Ibid., 20.
29. "Religious relationship" is the "relationship between a person and an identity that person assumes in a religious capacity"; for example, the "relationship between the person known as Thomas Merton and that person's name in religion, Father Louis" (IFLA, 2009, 61–62).
30. Junli Diao, "'Fu hao,' 'fu hao,' 'fuHao,' or 'fu Hao'? A Cataloger's Navigation of an Ancient Chinese Woman's Name," Cataloging & Classification Quarterly 53, no. 1 (2015): 71–87, doi:10.1080/01639374.2014.935543.
31. On Byung-Won, Sang Choi Gyu, and Jung Soo-Mok, "A Case Study for Understanding the Nature of Redundant Entities in Bibliographic Digital Libraries," Program: Electronic Library and Information Systems 48, no. 3 (July 1, 2014): 246–71, doi:10.1108/PROG-07-2012-0037.
32. Neil R. Smalheiser and Vetle I. Torvik, "Author Name Disambiguation," Annual Review of Information Science and Technology 43, no. 1 (2009): 1–43, doi:10.1002/aris.2009.1440430113.
33. Thomas B. Hickey and Jenny A. Toves, "Managing Ambiguity in VIAF," D-Lib Magazine 20, no. 7/8 (2014), doi:10.1045/july2014-hickey.
34. Martin Doerr, Pat Riva, and Maja Žumer, "FRBR Entities: Identity and Identification," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 524, doi:10.1080/01639374.2012.681252.
35. IFLA, UNIMARC Manual: Authorities Format, 2nd revised and enlarged edition, UBCIM Publications—New Series, vol. 22 (Munich: K.G. Saur, 2001).
36. Library of Congress, "MARC 21 Format for Authority Data" (Library of Congress, April 18, 1999), http://www.loc.gov/marc/authority/.
37. IFLA, Functional Requirements for Authority Data: A Conceptual Model.
38. Martha M. Yee, "FRBRization: A Method for Turning Online Public Findings Lists into Online Public Catalogs," Information Technology and Libraries 24, no. 2 (2005): 81, doi:10.6017/ital.v24i2.3368.
39. See the FRBR-RDA mapping from the Joint Steering Committee for Development of RDA, available at http://www.rda-jsc.org/docs/5rda-frbrrdamappingrev.pdf.
40. Taniguchi, "Viewing RDA from FRBR and FRAD," 934.
41. Coyle and Hillmann, "Resource Description and Access (RDA): Cataloging Rules for the 20th Century," sec. 8.
42. IFLA, Functional Requirements for Bibliographic Records, 19:33.
43. Leonidas Deligiannidis, Amit P. Sheth, and Boanerges Aleman-Meza, "Semantic Analytics Visualization," in Intelligence and Security Informatics, edited by Sharad Mehrotra et al., Lecture Notes in Computer Science 3975 (Berlin: Springer, 2006), 49, http://link.springer.com/chapter/10.1007/11760146_5.
44. Dunsire, Hillmann, and Phipps, "Reconsidering Universal Bibliographic Control in Light of the Semantic Web," 166.
45. Rick Noack, "Out of Fear of Racism, Sweden Changes the Names of Bird Species," Washington Post, February 24, 2015, http://www.washingtonpost.com/blogs/worldviews/wp/2015/02/24/out-of-fear-of-racism-sweden-changes-the-names-of-bird-species/.
46. W3C, "SKOS eXtension for Labels (SKOS-XL) Namespace Document—HTML Variant," 2009, http://www.w3.org/TR/2009/REC-skos-reference-20090818/skos-xl.html.
47. Chryssoula Bekiari et al., FRBR Object-Oriented Definition and Mapping from FRBRER, FRAD and FRSAD, version 2.0 (draft), 2013, http://www.cidoc-crm.org/docs/frbr_oo/frbr_docs/FRBRoo_V2.0_draft_2013May.pdf.
48. Robert J. Murray and Barbara B. Tillett, "Cataloging Theory in Search of Graph Theory and Other Ivory Towers," Information Technology and Libraries 30, no. 4 (2011): 171, doi:10.6017/ital.v30i4.1868.
49. Thomas Baker, Karen Coyle, and Sean Petiya, "Multi-Entity Models of Resource Description in the Semantic Web," Library Hi Tech 32, no. 4 (2014): 562–82, doi:10.1108/LHT-08-2014-0081.
50. Peponakis, "Libraries' Metadata as Data in the Era of the Semantic Web," 343.
51. Kim Tallerås, "From Many Records to One Graph: Heterogeneity Conflicts in the Linked Data Restructuring Cycle," Information Research 18, no. 3 (2013), http://informationr.net/ir/18-3/colis/paperC18.html.
52. Peponakis, "Conceptualizations of the Cataloging Object," 599.
53. Rachel Ivy Clarke, "Breaking Records: The History of Bibliographic Records and Their Influence in Conceptualizing Bibliographic Data," Cataloging & Classification Quarterly 53, no. 3–4 (2015): 286–302, doi:10.1080/01639374.2014.960988.
54. Gianmaria Silvello, "A Methodology for Citing Linked Open Data Subsets," D-Lib Magazine 21, no. 1/2 (2015), doi:10.1045/january2015-silvello.
Facilitating Research Consultations Using Cloud Services: Experiences, Preferences, and Best Practices

Rebecca Zuege Kuglitsch, Natalia Tingle, and Alexander Watkins

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017

ABSTRACT

The increasing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. Yet librarians struggle to balance escalating demands on their time. How can we embrace this expanded role and maintain accessibility to users while balancing competing demands on our time? One tool that allows us to better navigate this balance is Google Appointment Calendar, part of Google Apps for Education. It makes it easier than ever for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule. Our experience suggests that both students and librarians found it a useful, efficient system.

INTRODUCTION

The growing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. Although reference interactions in academic libraries have declined overall, in-depth research consultations have not followed that trend.1 These research consultations represent an increasingly large proportion of academic librarians' reference interactions and offer important opportunities to follow up on information literacy instruction, support student academic success, and relieve library anxiety. The library literature has demonstrated a need for and appreciation of these services.2 Moreover, students value face-to-face consultations because they provide an opportunity to talk through complex problems and questions while offering affective benefits such as relationship building and reassurance.3 It is evident that students seek out and value these services.
But even as these services become increasingly important, librarians struggle to balance escalating demands on their time. How can we embrace this expanded role and maintain accessibility to users while managing competing priorities? We found little guidance in the literature to identify the most efficient technological tools for offering these services to undergraduates, so we began to explore options. One tool that allows us to better navigate this shifting landscape is Google Appointment Calendar, part of Google Apps for Education. It makes it easier for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule; consequently, it is being adopted by many librarians at the University of Colorado Boulder. There are several other options available for librarians interested in calendar applications, such as YouCanBook.me.4 However, on campuses using Google Apps for Education, it may be easier to use a tool students are already familiar with and commonly use as part of their daily academic routines. Moreover, the integration with Apps for Education solves some of the problems Hess noted in the public version of Google Calendar Appointments (which is no longer available), such as appointments booked without identifying information and the extra step of logging in just for an appointment.

Rebecca Zuege Kuglitsch (rebecca.kuglitsch@colorado.edu) is Head, Gemmill Library of Engineering, Mathematics & Physics, University of Colorado Boulder. Natalia Tingle (natalia.tingle@colorado.edu) is Business Collections & Reference Librarian, University of Colorado Boulder. Alexander Watkins (alexander.watkins@colorado.edu) is Art & Architecture Librarian, University of Colorado Boulder.

FACILITATING RESEARCH CONSULTATIONS USING CLOUD SERVICES: EXPERIENCES, PREFERENCES, AND BEST PRACTICES | KUGLITSCH, TINGLE, AND WATKINS | https://doi.org/10.6017/ital.v36i1.8923
Because students are often already logged in due to using Google Apps for word processing, group work, and more, there is no extra step to log in for a simple appointment.5 Our exploration of this tool suggests that it is helpful to librarians, but it can benefit students as well. Research has proposed that students may hesitate to ask questions due to library anxiety. Would scheduling an appointment using a calendaring system be less intimidating than emailing a librarian directly, for example? We set out to apply this technology in an environment of changing student preferences and expectations, explore how students received it, and establish effective practices for using it in an academic setting. Since we are liaisons to science, social science, and humanities subject areas, we were able to work with a wide range of undergraduate students to see what might be most effective for us, and also for students from a variety of backgrounds.

Why Google Calendar

We selected appointment booking via Google Calendar because of its ease of use and because the University of Colorado Boulder has Google Apps for Education. This means that every student has a Google ID and the option of using Google Calendar as part of their normal routine. In December 2012, Google discontinued appointment calendars for general users and limited claimable appointment slots to Google Apps for Education. For institutions that do not subscribe, it may be worth investigating third-party Google Calendar apps, some of which are free or freemium, such as Calendly (https://calendly.com/), or SpringShare's similar subscription service, LibCal (https://www.springshare.com/libcal/).

Setting up Google Calendar

One of the benefits of Google Calendar is its ease of use. Setting up the calendar for appointment slots is as simple as creating a new Google Calendar event and selecting appointment slots as the type of event.
Next, you can give your appointment slots a name that corresponds with the language your institution uses for research consultations, and schedule them for the desired length of time. It is possible to schedule blocks of appointments that Google will automatically break into shorter appointments of a predetermined length. The authors created appointments lasting 30 minutes, 60 minutes, or a mix of both, depending on the expectations of our disciplines. It is also possible to create several simultaneous appointment slots if you would like to accommodate small groups. As well as indicating the time, each appointment has a space to indicate a location, which is particularly useful for librarians who might work in several branches or combine office hours in academic buildings with in-library consultations. Once the events are named and saved, the calendar can be shared.

Figure 1. Create a new event, selecting ‘Appointment slots’.

Appointment calendars are given a unique shareable URL to direct users to available appointments; however, these URLs are necessarily long and complicated, so we recommend using a link shortener. To obtain the very long URL for an appointment calendar, click on ‘edit details’ in an appointment event. From there, it is possible to copy the link and use a link shortener to make a brief, understandable link.

Figure 2. Obtain the shareable link.

When a student uses the link to make an appointment, both the librarian and the student receive an email with the student’s login name, email, appointment time, and other details. The slot immediately appears as taken on the calendar, so it is no longer available to other students, reducing confusion and double booking. Receiving the student’s email allows the librarian to initiate the reference interview and establish expectations.
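Google performs the block-to-slot splitting described above automatically; the underlying logic is easy to picture. The following sketch is illustrative only (the function name and the 30-minute default are our own, not part of any Google API) and shows how a two-hour availability block divides into fixed-length slots:

```python
from datetime import datetime, timedelta

def split_block(start, end, minutes=30):
    """Divide an availability block into fixed-length appointment slots."""
    slots = []
    slot = timedelta(minutes=minutes)
    while start + slot <= end:
        slots.append((start, start + slot))
        start += slot
    return slots

# A two-hour afternoon block yields four 30-minute slots.
block = split_block(datetime(2017, 3, 1, 13, 0), datetime(2017, 3, 1, 15, 0))
print(len(block))          # 4
print(block[0][1].time())  # 13:30:00
```

Booking a slot then amounts to removing one `(start, end)` pair from the list, which is why a claimed slot immediately disappears from the calendar view.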
Figure 3. Google calendar showing a variety of available appointments.

Student Impressions

We received positive feedback about the appointment calendars from students. Students commented:

● “I like the ability to see all of the possible openings.”
● “I already bookmarked that bit.ly, so you’ll probably hear from me” (which we did, shortly thereafter).
● “I like to be able to ‘schedule’ a consultation, not request one. It seems more useful and immediate.”

Over two semesters, we tracked whether students who made calendar appointments kept them, and we sent a short, informal survey to students who made appointments. No student who made a calendar appointment failed to attend their consultation. Though our survey does not permit large-scale generalizations due to a very low response rate (4 responses) and a small sample size (15 students), all of the students who responded and used the calendar found booking an appointment that way to be easy, convenient, and unintimidating. Everyone who used the calendar indicated that they would prefer to use it again, and about half of the respondents who set up their appointments via email told us that they would prefer to book a consultation through an appointment calendar in the future. Our anecdotal evidence in succeeding semesters aligns with this perception. We found that using appointment calendars can have many benefits for students:

● They can reduce the anxiety of having to compose and send an email.
● Booking appointments can take less of their time: students book immediately, without back-and-forth emailing. This also means there is no time to rethink the appointment and either never send the email or back out later.
● The appointment is placed on their calendar, meaning they automatically have a built-in reminder and don’t need to search through their email to find the date and time of their appointment.
● Since appointment calendars eliminate back-and-forth scheduling and reduce email fatigue, students may be more willing to use email to discuss their topic and/or question with the librarian.

Librarian Impressions

Our experience has been equally positive. We found that using the calendars radically streamlines the typical back-and-forth email exchanges for setting appointments. We emailed each student to confirm the appointment, but this single email is still a significant reduction in the claim on the librarian’s attention: from a minimum of three emails to schedule an appointment (which often realistically becomes five or more when negotiating a time) down to two. Additionally, librarians can put appointment slots between meetings and at other times when they might have only a spare hour, which are often too tedious to list when emailing. Using appointment calendars lets librarians use their time efficiently even when it is fragmented. As well as facilitating efficient use of small amounts of time, appointment calendars also allow librarians to gently create boundaries. Rather than having to deny appointments requested for late nights or weekends, students are guided to viable times. While the use of Google Calendar is entirely voluntary at the University of Colorado Boulder, we presented the tool at several reference librarian meetings with success, and several other librarians have happily adopted it. One librarian who adopted the tool said: “Sending a student a calendar that they can use to request a meeting eliminates the twelve messages back and forth on when to schedule a meeting.
I also like that it puts the meeting on both our calendars, reducing the number of no-shows.”

BEST PRACTICES

Our experiences and verbal feedback from students and librarians provided a foundation for developing best practices that minimize both librarian and student confusion. For students, confusion often centered on accessing the calendar, identifying which time slots were available, and identifying acceptable locations for appointments. The following best practices can help resolve these difficulties.

Use a link shortener and a consistent naming convention so the links are similar for multiple librarians. Using a link shortener makes it easy for students to jot down the calendar URL, either to manually enter into a browser later or to quickly reach and bookmark the link. This makes it easy for students to file the link and return to it at the point of need. Using a consistent naming convention makes it intuitive for students to transfer the booking method to other librarians for future research needs.

If your link shortener is case-sensitive, create capitalized and lowercase versions of the link. Many link shorteners are case-sensitive, unlike most URLs, which can confuse students and lead to frustration when they try to access a link later. While this could be solved to some extent by using only lowercase letters for the shortened link, that solution can create a cumbersome, difficult-to-read short URL. Simply creating two forms of the link efficiently solves this.

Develop a naming convention so available appointment slots are obvious. We found that when time slots were named simply “Consultation,” students sometimes assumed that all appointments were booked when, in fact, every appointment was open.
Using a term like “Available consultation” made it clear to students that the appointments were not already booked. Google Calendar automatically makes booked appointments unavailable, eliminating the opposite frustration.

Carefully consider the location in the bookable appointment form. Google Calendar allows librarians to enter a location or leave the field empty. If the field is left empty, users can specify a location, and students often filled in a location when none was indicated. If a librarian is not mobile, or is available in certain places only at certain times, it is key to identify a location. For example, in our study, one librarian held weekly office hours in two academic buildings; it was particularly important to identify which times the librarian was available in the library versus the academic buildings. On the other hand, it may also make sense not to designate a location. Another of the authors, serving a population that used the main library, one branch library, and a research area of the campus with no onsite library services, chose not to enter any location in order to accommodate the extremely dispersed population. Users frequently indicated the location in which they would be willing to meet, an option the librarian wanted to support in order to underscore the availability of services wherever users were located on campus.

Schedule two weeks of availability. We found that students could almost always find a time that worked for them with two weeks of available appointments. Moreover, other than recurring office hours, it was difficult for librarians to predict their schedules more than a few weeks into the future.

Librarian concerns centered on keeping calendars synchronized, providing enough lead time for users to book appointments, and publicizing the service. We found several best practices that eased these concerns.

Designate a day each week to update hours and clear conflicts on the calendar.
If Google Calendar is not the primary calendaring software for the library, it can be challenging to keep calendars synchronized. Google Calendar sends a calendar invitation to the librarian when an appointment is claimed, which they can accept on their primary calendaring system, but conflicts that arise on the primary calendaring system are not automatically sent to Google Calendar. By selecting a day and habitually updating the Google Calendar and quickly checking for conflicts with unclaimed slots, librarians can avoid forgetting to add slots or to remove those that conflict with other late-arising obligations.

Advertise the link on the library website, give out the calendar link during class sessions, and give it to professors to embed in course management systems. While appointment calendars benefit librarian workflows even without advertising, students need easy access to the calendar. For maximum user uptake, it is important to put the calendar link anywhere a librarian’s contact information can be found. We found it helpful to promote the link in classes, and it was particularly effective when professors agreed to place the link on the class website. This positions library research assistance next to assignments when they are given out and drafts when they are returned, hopefully reminding students that the library is available for assistance at the moments they are most likely to seek it.

REFLECTIONS AND CONCLUSIONS

Our experiences support the idea that online appointment calendars are appreciated by students, streamline work for librarians, and are easily adopted by both parties. Wider use of this technology, whether via Google Apps for Education or another service, can be mutually beneficial to librarians and students.
Students using the calendar indicated that it was not more intimidating than emailing a librarian, and by removing the waiting period for a response, a calendar can prevent students from becoming distracted or persuading themselves in the interim that they do not actually need help. By providing a calendar where students can quickly and simply book an appointment with a librarian for research assistance, librarians can support students seeking assistance, and thus ultimately bolster student success and increase the library’s relevance.

REFERENCES

1. Naomi Lederer and Louise Mort Feldmann, “Interactions: A Study of Office Reference Statistics,” Evidence Based Library and Information Practice 7, no. 2 (2012): 5–19.

2. Ramirose Attebury, Nancy Sprague, and Nancy J. Young, “A Decade of Personalized Research Assistance,” Reference Services Review 37, no. 2 (2009): 207–20, https://doi.org/10.1108/00907320910957233; Trina J. Magi and Patricia E. Mardeusz, “What Students Need from Reference Librarians: Exploring the Complexity of the Individual Consultation,” College & Research Libraries News 74, no. 6 (2013): 288–91.

3. Trina J. Magi and Patricia E. Mardeusz, “Why Some Students Continue to Value Individual, Face-to-Face Research Consultations in a Technology-Rich World,” College & Research Libraries 74, no. 6 (November 1, 2013): 605–18, https://doi.org/10.5860/crl12-363.

4. Amanda Nichols Hess, “Scheduling Research Consultations with YouCanBook.Me: Low Effort, High Yield,” College & Research Libraries News 75, no. 9 (October 1, 2014): 510–13.

5. Hess, “Scheduling Research Consultations with YouCanBook.Me: Low Effort, High Yield,” 511.
Bibliographic Classification in the Digital Age: Current Trends and Future Directions

Asim Ullah, Shah Khusro, and Irfan Ullah

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 48

ABSTRACT

Bibliographic classification is among the core activities of Library & Information Science, bringing order and proper management to the holdings of a library. Compared to printed media, digital collections present numerous challenges regarding their preservation, curation, organization, and resource discovery and access. Therefore, a truly native perspective needs to be adopted for bibliographic classification in digital environments. In this research article, we investigate and report different approaches to the bibliographic classification of digital collections. The article also contributes two evaluation frameworks for evaluating existing classification schemes and systems. The article presents a bird's-eye view to help researchers reach a generalized and holistic approach to bibliographic classification research, and new research avenues are identified.

INTRODUCTION

Classification is a primary instinct of human beings in arranging, understanding, and relating knowledge artifacts. Bibliographic classification provides a framework for arranging and organizing knowledge artifacts preserved in the form of books, magazines, newspapers, and other holdings to explore new avenues of knowledge management. Today several classification schemes are in use, ranging from conventional schemes including Library of Congress Classification (LCC), Dewey Decimal Classification (DDC), Colon Classification (CC), and Universal Decimal Classification (UDC) to classification for digital environments including the Association for Computing Machinery (ACM) digital library1, the Institute of Electrical and Electronics Engineers (IEEE) digital library2, and the Online Computer Library Center (OCLC) cooperative catalogue3.
Besides the difficulties that lie in devising a classification scheme (it is time-consuming and resource-consuming), either the existing schemes should be revised and extended or a new classification scheme should be devised that could act as a common platform for representing knowledge artifacts belonging to different contexts. Such a classification scheme should also resolve the challenges in digital preservation and curation and support the precise and accurate search and retrieval of digital collections. The first step, in this connection, is to properly analyze and evaluate the existing bibliographic classification schemes and to identify their strengths and limitations in classifying digital collections accurately and appropriately. Therefore, the objectives of this research article include:

• To investigate and evaluate the available approaches to bibliographic classification from the perspective of devising a classification scheme that can act as a common platform for classifying any type of digital collection.
• To devise evaluation frameworks that compare the available bibliographic classification schemes and approaches.
• To present issues, challenges, and research opportunities in state-of-the-art bibliographic classification research.

The rest of the paper is organized as follows: Section 2 presents current trends in the classification of digital collections. Section 3 presents two evaluation frameworks for comparing and evaluating the existing solutions.

Asim Ullah (asimullah@upesh.edu.pk), Shah Khusro (khusro@upesh.edu.pk), and Irfan Ullah (cs.irfan@upesh.edu.pk) are researchers at the Department of Computer Science, University of Peshawar, Peshawar, Pakistan.

1 http://dl.acm.org/
2 http://ieeexplore.ieee.org/Xplore/home.jsp
3 https://www.oclc.org/

BIBLIOGRAPHIC CLASSIFICATION IN THE DIGITAL AGE | ULLAH, KHUSRO, AND ULLAH | doi:10.6017/ital.v36i3.8930 49
Section 4 presents research challenges and opportunities in bibliographic classification research. Finally, Section 5 concludes our discussion. References are presented at the end of the paper.

Classifying Digital Collections – A Mixed Trend

Bibliographic classification has been the focus of several researchers seeking to properly classify, catalogue, and describe digital collections. In this regard, two approaches have been adopted: the former supports the use of conventional classification schemes, including CC, DDC, and LCC, in describing and classifying digital documents, while the latter recommends devising new ways of classification, such as the ACM4 computing classification. However, in most digital environments a mixed trend has been observed, where along with new classification schemes, categorization is also used as a complementary solution. For example, ACM presents its own classification system as a poly-hierarchical ontology for describing Computer Science literature and for use in Semantic Web applications. It has replaced the 2008 ACM classification system, which served as the de facto model for the classification of Computer Science literature, and provides a visual topic display along with searching services. It serves as a semantic vocabulary for categorizing concepts and a foundation of computing disciplines ("The 2012 ACM Computing Classification System"). Similarly, the IEEE digital library categorizes its holdings into directories per its own rules of cataloguing and categorization. It categorizes articles and standards into several subject areas and clusters documents by year of publication, author names, content type, affiliation, publication title, publisher, country of publication, letters, numerals, and alphanumeric values5. The document collection can be navigated through collection names, number of documents, topic, and the International Classification for Standards (ICS).
4 http://dl.acm.org
5 http://ieeexplore.ieee.org/browse/standards/ics/ieee/

The DMOZ6 directory is the largest human-made directory of web pages. Since its inception in 1998, it has categorized 3,861,137 websites available in 90 languages into 1,031,719 categories and sub-categories through the work of 91,928 editors and volunteers. In addition, its DMOZ RDF dumps are available on the Linked Open Data (LOD) cloud. According to the World Wide Web Consortium (W3C), LOD enables data integration and reasoning at a large scale ("Linked data"). It establishes links among data, enabling machines and users to explore the web of data rather than the web of documents and to find related data (Berners-Lee, 2006; Bizer, Heath, & Berners-Lee, 2009). However, DMOZ lacks semantic (meaningful) search, which affects precision and accuracy in exploring the required resources. Also, the categories under which the websites are kept need to be revised, because there can be faceted and intra-hierarchical links among web pages. In addition, the content management needs to be upgraded with respect to updating the directory with new entries and the way it reviews and categorizes websites (Boykin, 2016).

Institutional repositories use a mixed approach to creating, collecting, and managing metadata for printed and digital collections using several sources, both conventional and digital. This mixed trend introduces challenges for metadata managers (Chapman, Reynolds, & Shreeves, 2009). To deal with these challenges, subject classification systems can be very beneficial in providing Web-oriented services, including searching content through search patterns, browsing, and content filtering by subject area. However, at the same time, a cognitive overload arises for the authors and depositors of the institutional repository (Cliff, 2008) that needs further attention.
To handle the information overload in retrieving digital collections, several controlled methods have been proposed in the literature, ranging from manual techniques (e.g., web directories) to automatic techniques including clustering and classification. Several classification schemes, including sentiment and subject classification, have been developed for classifying (and categorizing) web pages. Classification is used in focused crawling, in searching and ranking results, and in classifying queries. Clustering also organizes web resources, but it differs from classification, which is based on a rigid predefined taxonomy and rules for interpreting the meaning of the classification order; clustering, by contrast, shows flexibility in the classification (categorization) of web documents (Zhu, 2011). However, a mixed trend has been observed, where classification and categorization are intermingled to facilitate the organization, description, exploration, and retrieval of digital collections.

The Semantic Web brings meaningful connections to the web of data so that not only humans but also machines can understand the content of documents and retrieve the most relevant documents. In this way, other related documents can also be easily connected and retrieved (Berners-Lee, 2006). To understand, describe, and relate concepts within documents, ontologies are used. Therefore, researchers have been working on bringing semantics through the Semantic Web and related technologies to automatically classify digital collections. For example, Beghtol (1986) argues that a semantic axis makes a syntactical classification structure more meaningful and provides the platform for developing relationships among knowledge artifacts through several warrants in classification systems.

6 http://www.dmoz.org
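The contrast drawn above between rigid classification and flexible clustering can be pictured in a few lines of Python. This is a toy sketch with invented class codes, keyword sets, and documents (none of it drawn from any cited system): classification forces each document into a predefined taxonomy, while clustering only measures mutual similarity.

```python
# Classification: documents are forced into a rigid, predefined taxonomy.
TAXONOMY = {"QA76": {"computer", "software"}, "Z696": {"classification", "catalogue"}}

def classify(words):
    """Assign the class whose keyword set overlaps the document most."""
    return max(TAXONOMY, key=lambda c: len(TAXONOMY[c] & words))

# Clustering: documents are grouped only by mutual similarity, with no taxonomy.
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    return len(a & b) / len(a | b)

docs = [{"computer", "software", "code"},
        {"software", "code", "bug"},
        {"classification", "catalogue", "shelf"}]

print(classify(docs[0]))          # QA76
print(jaccard(docs[0], docs[1]))  # 0.5
print(jaccard(docs[0], docs[2]))  # 0.0
```

A clustering algorithm would group the first two documents together purely from their 0.5 similarity, whereas the classifier can only ever answer with one of the taxonomy's fixed codes.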
Similarly, a classification ontology is used in automatic classification (Wijewickrema & Gamage, 2013) to minimize ambiguity in vocabulary. To obtain a single subject for the input document, several weight functions, including term frequency-inverse document frequency (TF-IDF), and filtering methods are applied. Semantic Web and LOD technologies have also been used for bibliographic data. For example, BibBase7, a bibliographic data publishing and management tool (Xin, Hassanzadeh, Fritz, Sohrabi, & Miller, 2013), publishes bibliographic data on the user's website according to LOD principles. However, these approaches are limited by the lack of interoperability among native languages when translating classification records from a source language to a target language (Kwaśnik & Rubin, 2003). Classification schemes are also being converted into ontologies. Giunchiglia, Marchese, and Zaihrayeu (2007) have applied the reasoning capabilities of OWL ontologies to classification schemes. These ontologies are used as interfaces to human knowledge for machines, whereas classification schemes are interfaces to knowledge for humans. However, there is limited support for cross-disciplinary searching and for accommodating more views and interpretations of knowledge (Albrechtsen, 2000).

Supervised and unsupervised machine learning techniques are used for automatic text classification. Supervised machine learning techniques use models including the multinomial Naïve Bayes model and the Bernoulli model (Manning, Raghavan, & Schütze, 2008). Yelton (2011) applies probabilistic classification of important words (and therefore of documents), especially by considering Amazon's Statistically Improbable Phrases (SIPs)8 and Google phrase search inside a book. For subject analysis, he mentions simplistic, content-based, and requirements-based methods in terms of understanding text classification and manipulation of books.
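To make the TF-IDF weighting mentioned above concrete, the highest-weighted term of a document can serve as a candidate single subject. This is a minimal stdlib-only sketch under our own assumptions (a three-document toy corpus and an unsmoothed `tf * log(N/df)` formula), not the pipeline of any cited system:

```python
import math
from collections import Counter

def tf_idf_subject(doc, corpus):
    """Return the term of `doc` with the highest TF-IDF weight."""
    n = len(corpus)
    tf = Counter(doc)
    def weight(term):
        df = sum(term in other for other in corpus)       # document frequency
        return (tf[term] / len(doc)) * math.log(n / df)   # tf * idf
    return max(tf, key=weight)

corpus = [["library", "catalogue", "classification"],
          ["library", "web", "ontology"],
          ["library", "classification", "classification", "classification", "scheme"]]

# "library" occurs in every document (idf = 0), so a rarer, repeated term wins.
print(tf_idf_subject(corpus[2], corpus))  # classification
```

Terms that appear in every document receive an IDF of zero, which is exactly why stop-word-like vocabulary never surfaces as a subject candidate.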
The Wikipedia page structural hierarchy has been exploited in automatic harvesting, classification, categorization, clustering, and metadata enrichment (Yelton, 2011). Information Extraction (IE) has also been applied to classifying books automatically. For example, Betts, Milosavljevic, and Oberlander (2007) use IE methods for the automatic labeling of books using the LCC classification. They used the bag-of-words (BOW) model, the bag-of-named-entity recognition (NER) model, and the generalizing named entities (GAZ) model in automatic text classification. To achieve better accuracy, they also combined the results of these models. However, automatic classification may lead to limited search and retrieval because of the missing semantics associated with phrases or keywords. To overcome this issue, a fundamental and practical theoretical model of classification is required (Jones, 1970).

7 https://bibbase.org/
8 http://www.amazon.com/gp/search-inside/sipshelp.html

Table 1 categorizes the bibliographic classification approaches into three broader categories, namely theoretical approaches, practical approaches, and approaches used in digital environments. Theoretically, researchers have discussed different viewpoints on classification, whereas we get a different view when these schemes are applied. In practice, the syntactic structure is valued through faceted and enumerative techniques. In digital environments like the Web and digital libraries, strict boundaries of classification are often compromised by categorization.

Theoretical Approaches:
1. Biasness (Mai, 2009) (Mai, 2010)
2. Subjectivity and objectivity (Hjørland, 2016)
3. Epistemological and semiotic approaches (Hjørland, 2013) (Lee, 2012; Mai, 2011) (Tennis, 2008)
4. Empiricism, rationalism, historicism, and pragmatism (Hjørland, 2013)
5. Multidisciplinarity approach (Beghtol, 1998)
6.
Scientific approaches (Hjørland, 2008)
7. Positivistic and pragmatic approaches (Dousa, 2009) (Mai, 2011)
8. Interdisciplinary and evidence-based practice classification (Hjørland, 2016)
9. Social and cultural context (J.-E. Mai, 2004)
10. By tracking the universe of knowledge
11. Universal order (Smiraglia & Van den Heuvel, 2011)
12. Integrative levels in classification (Dousa, 2009)
13. Literary warrant (Rodriguez, 1984)
14. Education warrant (Hjørland, 2007) (Beghtol, 1986)
15. Semantic warrant (Beghtol, 1986)
16. Syntactic warrant (Beghtol, 1986)
17. Domain and user requirements (Mai, 2005)
18. Pluralism and human interpretations

Practical Approaches:
1. Enumerative and faceted (Batley, 2014)
2. General-purpose approach (Mai, 2003) and special-purpose approach (Mancuso, 1994), e.g., classification schemes for general classes of knowledge areas or for a special class of knowledge area
3. Syntactic axis (Beghtol, 1986) (Beghtol, 2001)
4. Semantic axis (Beghtol, 1986) (Beghtol, 2001)

Classification in Digital Environments:
1. Document similarity (Hamming distance and Euclidean geometric approaches) (Losee, 1993)
2. Fuzzy approach (Jacob, 2004)
3. Clustering (Nizamani, Memon, & Wiil, 2011)
4. Categorization (Koshman, 1993)
5. TF-IDF weighting (Dorji et al., 2011)
6. Unsupervised machine learning techniques (Joorabchi & Mahdi, 2011) (k-means clustering, hierarchical clustering)
7. Supervised machine learning techniques (Wang, 2009) (multinomial Naïve Bayes, Bernoulli model, Support Vector Machine, Random Forest, k-NN)
8. Information Extraction methods (Gilchrist, 2015)
9. Probabilistic text and document classification (Maron, Kuhns, & Ray, 1959)
10. Ontologies (Campbell, 2002)

Table 1.
Categorization of approaches towards bibliographic classification

Evaluating Classification Schemes & Approaches

In this section, we present two evaluation frameworks to compare and evaluate the existing classification and categorization systems and well-known bibliographic classification ontologies. We have chosen CC, DDC, LCC, and Universal Decimal Classification (UDC) on the basis of their structural properties and wide usage in both conventional and digital libraries ("Subject classification schemes," 2015) ("Library of Congress Classification," 2014) ("About Universal Decimal Classification (UDC)") (Press, 2002) (Encyclopedia, 1 August 2014). Some of these properties include: citation and filing order; notational expressiveness; flexibility in classification principles, rules, and notations; coverage of knowledge areas; structure of classification schedules and notations; notational brevity and simplicity; notational mnemonics; notational hospitality; schedules with an updateable and comprehensive subject order; and knowledge coverage (Batley, 2014). UDC, LCC, and DDC are universal, multidisciplinary, and widely used systems (Koch & Day, 1997), whereas CC has seminal and inspirational value for the faceted structure of bibliographic classification. Therefore, the evaluation framework mainly targets these classification schemes as our natural choice for evaluation and comparison. Similarly, we evaluate ACM9, IEEE10, and DMOZ11 using the evaluation framework, as these are well-known and widely used document classification and categorization systems for digital libraries. Table 2 presents the 22 metrics used in the evaluation framework.
These evaluation metrics are extracted from the existing literature (Kaosar, 2008) (Painter, 1974) (Encyclopedia, 1 August 2014) (Buchanan, 1979) (Koch et al., 1997) (Reiner, 2008) (Gnoli, Merli, Pavan, Bernuzzi, & Priano, 2008) (Francu, 2007) (Chan, Intner, & Weihs, 2016). The metrics are: (i) structural complexity; (ii) notational brevity; (iii) predefined structure; (iv) rules complexity; (v) theoretical laws; (vi) mnemonics; (vii) hospitality; (viii) search complexity; (ix) usability; (x) precision and accuracy; (xi) multilinguality; (xii) interoperability; (xiii) semantic search; (xiv) bias in subject representation; (xv) enumerative structure; (xvi) faceted structure; (xvii) faceted search; (xviii) consistency; (xix) LOD datasets; (xx) Linked Open Vocabularies (LOV) support; (xxi) platform; and (xxii) warrants of classification. These metrics, the need for them, and their use in rating classification systems are discussed in the following paragraphs. In Table 2, the bibliographic systems are evaluated against these metrics: the indicator ✓ shows support for a metric, ✗ indicates that the system has no or minimal support for the metric, and N/A means not applicable. In addition, each classification system has been rated against these metrics (Table 3), and Figure 1 graphically demonstrates the resulting rankings and ratings of these classification systems.

9 http://www.acm.org/about/class
10 http://www.ieee.org/about/today/at_a_glance.html
11 https://www.dmoz.org/docs/en/about.html

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017
| Metric | CC | UDC | DDC | LCC | ACM | IEEE | DMOZ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Structural Complexity | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Notational Brevity | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | N/A |
| Predefined Structure | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Rules Complexity | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Theoretical Laws | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mnemonics | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Hospitality | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Search Complexity | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Usability | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Accuracy and Precision | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Multilinguality | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Interoperability | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Semantic Search | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Bias in Representation | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Enumerative Structure | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Faceted Structure | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Faceted Search | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Consistency | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| LOD Datasets | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| LOV Support | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Platform | N/A | UDC Consortium | OCLC | Library of Congress | ACM digital library | IEEE Xplore digital library | Open Directory Project |
| Warrants of Classification | Literary warrant (Giess, Wild, & McMahon, 2007) | Literary warrant (Perles, 1995) | Literary and scientific warrant (Giess et al., 2007) | Literary and scientific warrant (Giess et al., 2007) | Scientific research warrant | Scientific research warrant | N/A |

Table 2. Evaluation of Classification Schemes

Structural complexity means the difficulty of using a scheme's structure and notations to classify and describe a specific subject area. This metric helps in selecting a classification scheme that is easy to use for classifying a document collection, requiring short notations and simple rules. The notations and rules are complex in CC and UDC (Ranganathan, 1968); this complexity is due to the faceted structure of these classification schemes (Sukhmaneva, 1970). The structural complexity of CC is greater than that of UDC, so UDC comes second in complexity. Because of its enumerative structure, LCC stands third, being less complex than CC and UDC.
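The ✓/✗/N/A matrix of Table 2 lends itself to a compact machine-readable encoding. The sketch below is an invented encoding, not part of the article, and covers only a subset of the table's rows; it shows one way to hold the matrix and query which metrics a scheme supports.

```python
SCHEMES = ["CC", "UDC", "DDC", "LCC", "ACM", "IEEE", "DMOZ"]

# A subset of Table 2's rows, encoded per scheme in the column order
# above: Y for ✓, N for ✗, and - for N/A.
TABLE2 = {
    "Structural Complexity": "YYNNNNN",
    "Notational Brevity":    "NNYYYY-",
    "Predefined Structure":  "YNYYYYY",
    "Theoretical Laws":      "YYYYNNN",
    "Hospitality":           "YYYYYYY",
    "LOV Support":           "NNNNYNN",
}

def supported_metrics(scheme):
    """Metrics marked ✓ for the given scheme in the encoded subset."""
    col = SCHEMES.index(scheme)
    return [m for m, row in TABLE2.items() if row[col] == "Y"]
```

For example, `supported_metrics("ACM")` includes "LOV Support", which the four library schemes lack; such an encoding makes Table 2 queryable rather than merely readable.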
DDC is the simplest in this list because it is based on an enumerative classification structure and on the principle of dividing the universe of knowledge into defined classes. IEEE is more complex than ACM, whereas DMOZ is the least complex system. A classification system with greater structural complexity is ranked lower; therefore, based on this metric, the classification systems can be ranked as DMOZ, ACM, IEEE, DDC, LCC, UDC, and CC. Notational brevity means how briefly the notations describe the holdings, using a minimum number of symbols with minimal cognitive load. DDC uses well-organized short notations whose mnemonic value is also high (Comaromi & Satija, 1983) (Hyman, 1980). LCC has notational brevity (Chan et al., 2016). UDC uses lengthy notations compared to DDC (Kaosar, 2008), and CC also uses lengthy and complex notations (Chatterjee, 2016). ACM notations are shorter than IEEE's, whereas DMOZ does not use any notations at all. Using this metric, these classification systems can be ranked as ACM, IEEE, DDC, LCC, UDC, and CC, with DMOZ last because it uses no notational symbols at all. A predefined structure means that the classification scheme follows a rigid, pre-assumed subject categorization along with classification class marks. In this regard, UDC and LCC are enumerative and impose a subjective viewpoint of classification by following a predefined structure (Goh, Giess, McMahon, & Liu, 2009). Being faceted, CC arranges basic concepts into a few predefined categories (Satija & Martínez-Ávila, 2015). DDC also has a predefined hierarchical structure of classification (Press, 2002) (Jonassen, 2004). Among these schemes, CC has the least predefined structure because of its use of facets; UDC is both enumerative and analytico-synthetic. LCC is enumerative but possesses weaker predefined rules for its structural design.
Because of its rigid enumerative hierarchies and predefined class structure, DDC ranks high in this respect, and DMOZ has the most rigid predefined structure compared to IEEE and ACM. The classification system with the most rigid and predefined structure is ranked lower; therefore, the ranking could be CC, ACM, IEEE, UDC, DDC, LCC, and DMOZ. Rules complexity determines the difficulty of applying classification rules to knowledge artifacts. CC presents a complex set of rules and classification theory, which is comparatively difficult to implement and understand (Tennis, 2011). LCC is also complex ("Library of Congress Subject Headings: Pre- vs. Post-Coordination and Related Issues," March 15, 2007) in implementing Library of Congress Subject Headings (LCSH) in pre-coordinated subject strings. DDC's rules and principles are comprehensive and complete (Press, 2002) and easier than those of CC and LCC. UDC is also easy to understand and implement (Piros, 2014). ACM, IEEE, and DMOZ are simple to use and understand and therefore bear no such complexity. A classification system with greater complexity is ranked lower; based on this metric, ACM, IEEE, and DMOZ are on top with similar rankings, followed by UDC, DDC, LCC, and CC. Theoretical laws are considered as a metric to analyze the foundations of classification systems: whether or not they are based on theoretical laws and principles of classification. UDC combines the enumerative and faceted approaches gathered from DDC and CC (Kaosar, 2008). The synthetic principle of UDC contributes to its widespread use, but it is not enough at the intellectual level for making relations between subject facets (Kyle & Vickery, 1961). UDC lacks standard rules for forming facets, but there are rules for its structural representation (McIlwaine, 1997).
Therefore, the structural and synthetic rules are good enough for its applicability, but they should be refined further at the intellectual level. The theoretical laws of CC are based on the faceted approach to managing knowledge artifacts. CC has sound rules and principles, which include different postulates, laws, principles, and canons (Batley, 2014) (Arashanapalai Neelameghan & Parthasarathy, 1997). On the other hand, LCC has weaker theoretical foundations; there also exist some intellectual and structural limitations due to its enumerative structure (San Segundo Manuel, 2008). DDC has a hierarchical and enumerative structure based on the knowledge philosophy of hierarchical division (Hjorland, 1999). Because of its strong theoretical foundations, CC is at the top of this list; DDC is second because of its universal theory of knowledge division; UDC is third for exploiting the theories of DDC and CC; LCC is fourth for its comparatively weak theory of classification; whereas ACM, IEEE, and DMOZ present no or very limited theoretical laws or philosophical rules of classification. Support for mnemonics enables human classifiers to easily memorize the symbols and notations of a classification scheme. Systematic and literal mnemonics are used in UDC (Satija, 2013) (Kaosar, 2008); its mnemonics are increased through mnemonic devices, which are described through the canons of mnemonics (Kaula, 1965). LCC uses literal mnemonics (Satija, 2013), whereas DDC uses systematic and literal mnemonics, although its systematic mnemonics are not consistent (Satija, 2013). There are several seminal mnemonics in CC (Rahman & Ranganathan, 1962); these mnemonic devices increase mnemonics in CC, but the formation and length of its notations affect this mnemonic quality.
ACM has greater support for mnemonics than IEEE, whereas DMOZ is a collection of web pages under specific categories. Based on this metric, the rankings could be DDC, UDC, LCC, ACM, IEEE, and CC, whereas DMOZ uses no mnemonic devices or notations at all. Hospitality means the ability of a classification scheme to incorporate new knowledge areas expressed in different multilingual contexts. Hospitality is present in UDC (Kaosar, 2008). CC is also hospitable to new subjects (De Grolier, 1962). LCC is hospitable for expressing new subjects and knowledge areas (Satija, 2013), and DDC is hospitable to new subject areas (Satija, 2013). By this metric, a classification scheme with a faceted approach is naturally more hospitable than others. Therefore, CC is the most hospitable and tops this list, followed by UDC. DDC is third for following the enumerative approach, and LCC is fourth because of its pure enumerative structure. IEEE and ACM are fifth, covering a short span of knowledge areas with a faceted structure and efficient search. DMOZ covers only web pages in already-specified categories and is therefore seventh. Search complexity measures the difficulty of searching for artifacts using a classification scheme; it indicates how well a classification scheme supports searching for a specific document. Search complexity is minimal in UDC because of its analytico-synthetic and enumerative nature (Kaosar, 2008), which can contribute to search applications both on the Web and in-house, e.g., the Online Public Access Catalog (OPAC). The theory and philosophy of CC is the trendsetter for knowledge management and resource discovery and access; however, according to Raghavan (2016), searching through CC is comparatively weaker than through other bibliographic classification schemes.
According to Chan (2000), LCC and LCSH have the potential to provide ease in searching because of a richer vocabulary with greater subject coverage, synonym and homograph capabilities, a pre-coordinated system, browsing capability in a multi-faceted structure, multilingual support, and MARC format support with semantic interoperability. However, they are limited in easing the search and retrieval process by the complexity of their syntax and application rules, a lack of training for personnel, and overly lengthy and complex search strings. DDC and LCC are aggregated in the Classify12 project initiated by OCLC; with the Classify application, the search experience of catalogers and patrons becomes much easier. Using this metric, DDC stands at the top with less complexity than LCC, UDC, and CC, and IEEE is more complex than ACM and DMOZ. The classification scheme with less search complexity is ranked higher. Therefore, ACM and IEEE, DDC, and LCC stand first with the least search complexity, followed by UDC and CC. DMOZ stands last with greater search complexity, having loose boundaries of categorization. Usability analyzes the difficulty of using a classification scheme for classifying and searching documents. This metric reflects ease of learning and effective usage; usability measures user satisfaction, user understanding of the system, and precision with minimal recall in a shorter amount of time (Singapore, 2016). OCLC has introduced structural changes to improve usability and simplify classification tasks ("Dewey Services: Dewey Decimal Classification System,"). The Classify13 project aims at finding books through a web interface that is easy to use and understand, using DDC and LCC. UDC is extensively used in web-based search and retrieval applications (Kaosar, 2008).

12 http://www.oclc.org/research/themes/data-science/classify.html
This classification scheme is used in several institutions' OPAC systems ("Library OPACs containing UDC codes,"). The UDC notations support usability (Slavic-Overfield, 2005); however, the user interface of these OPAC search systems could be further improved (Slavic, 2006) (Pollitt, 1998) (Schallier, 2005). CC is the source of inspiration and a standardized model for the usability of the faceted structure of bibliographic classification in electronic and web-based environments (Thelwall, 2009). In (Rosenfeld & Morville, 2002), the philosophy and methodology of CC are considered at an abstract and theoretical level. This assessment of CC leads us to the argument that the faceted structure supports precise retrieval, but at a considerably higher cognitive cost at the user end compared to DDC and LCC with their simple enumerative structures. The Library of Congress uses LCC in its catalog14 and Classification Web15 applications, which exploit LCSH and LCC in a user-friendly manner. Looking at the usability of these classification schemes, DDC ranks at the top for its easy enumerative structure and notational simplicity, along with easy-to-use web applications. LCC is second because of its enumerative structure and adoption in web applications. Being both enumerative and faceted, UDC stands third. CC, being a purely faceted scheme with complex notations and rules, is ranked fourth. IEEE and ACM are faceted and easy to use, and therefore share the first position with DDC. DMOZ, with its loose boundaries of categorization, is the least usable, with limited browsing and search. The accuracy and precision metric measures how accurately and precisely a classification system can identify the exact locations of holdings in the given knowledge space. UDC shows accuracy and precision in finding a required knowledge artifact (Kaosar, 2008).
The accuracy and precision of CC are compromised, as its lengthy notations introduce complexity in searching for and discovering documents (Satija, 2015). LCC and DDC were studied for accuracy and precision using a prototype model (Gnoli, Pusterla, Bendiscioli, & Recinella, 2016) for automatic text classification of electronic documents using classification metadata of library holdings from LCC and DDC datasets. It was observed that, for precision, there is a need to increase DDC and LCC bibliographic data on the Web, introduce search capabilities for bibliographic data at the micro level of a document, and increase the efficiency of user interfaces for navigation using a DDC-based browsing structure (Joorabchi & Mahdi, 2009) (Joorabchi & Mahdi, 2011). Therefore, CC, because of its pure faceted approach, has high-level precision in search and resource discovery. UDC stands second for being both enumerative and analytico-synthetic. DDC is third, as OCLC maintains and updates its structure regularly along with state-of-the-art search applications; LCC shares the third position with DDC, being regularly updated and maintained by the Library of Congress for precision in its search application. IEEE and ACM also show great precision in their search and retrieval, and therefore share the third position with DDC and LCC. DMOZ consists of manually created and updated categories of web pages, with limited keyword search and very low precision.

13 http://classify.oclc.org/classify2/
14 https://catalog.loc.gov/vwebv/searchBasic
15 https://www.loc.gov/cds/classweb/classwebfeatures.html
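The accuracy and precision measurements discussed above reduce to simple ratios over predicted versus true class assignments. A minimal sketch, with invented labels standing in for an automatic classifier's output:

```python
def accuracy(true, pred):
    """Fraction of all documents assigned their correct class."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def precision(true, pred, cls):
    """Fraction of documents assigned `cls` that really belong to it."""
    assigned = [t for t, p in zip(true, pred) if p == cls]
    return sum(t == cls for t in assigned) / len(assigned) if assigned else 0.0

# Toy ground truth and classifier output (invented for illustration)
true = ["DDC", "DDC", "LCC", "LCC", "DDC"]
pred = ["DDC", "LCC", "LCC", "LCC", "DDC"]
acc = accuracy(true, pred)           # 4 of 5 assignments correct
p_lcc = precision(true, pred, "LCC") # 2 of the 3 LCC assignments correct
```

The prototype studies cited above report exactly these kinds of ratios when judging how well LCC- and DDC-trained classifiers place electronic documents.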
In the evaluation framework, multilinguality means the ability to classify and describe knowledge artifacts written and expressed in a variety of natural languages, and the availability of a classification scheme in different natural languages. DMOZ supports 72 different languages and therefore stays at the top. UDC is multilingual, supporting French, Portuguese, Spanish, and Russian (Slavic, 2008) (Koch & Day, 1997), and has been translated into multiple languages ("Universal Decimal Classification summary," 2017). LCC supports works in 19 language subclasses ("Library of Congress Classification Outline: Class P - Language and Literature,"), including German, Slavic, Oriental, and Romance languages. Translations of DDC help localize the scheme for different languages of the world (Vizine-Goetz, 2009); DDC is translated into 30 different languages but covers different languages in only seven classes, i.e., class numbers 420 to 490 ("Dewey Decimal Classification summaries,"). CC shows minimal multilingual support because of its subcontinental origin (A Neelameghan & Lalitha, 2013; Raghavan, 2016). ACM and IEEE are available in English only and therefore show no multilinguality at all. Using this metric, DMOZ is first, followed by UDC, DDC, LCC, and then CC. Consistency measures the level of uniformity with which a classification system classifies subjects. According to Batty (1967), CC showed no consistency in its earlier stages, but with the addition of canons of consistency it has gradually become consistent. LCC seems less consistent in expressing different subject areas (Madge, 2011). DDC and LCC were found wanting in defining and classifying religious holdings, especially Jewish content; these schemes also show bias towards different religious and regional content (Maddaford & Briefing). Although DDC is somewhat inconsistent, it can still classify complex subjects (Gnoli et al., 2016).
UDC also shows inconsistency, which can be sorted out by introducing specific UDC classes into databases in online systems (Kaosar, 2008). DDC shows comparatively great consistency in classifying new subjects with constant uniformity; CC is ranked second because of its introduction of canons of consistency. LCC and UDC are ranked third. Being limited to scientific research articles, IEEE and ACM are fourth, and DMOZ stands fifth due to its loose boundaries of categorization. Interoperability determines how well a given classification scheme can express its classification artifacts in combination with other schemes. UDC is interoperable (Koch & Day, 1997) and supports integration with other systems. CC, because of its subcontinental origin, shows limited interoperability (A Neelameghan & Lalitha, 2013) (Raghavan, 2016). LCC shows interoperability by being mappable to DDC (Vizine-Goetz, 2009), and the interoperability and multilinguality of DDC enable it to be mapped to other classification schemes (Vizine-Goetz, 2009). The IEEE, ACM, and DMOZ datasets are interoperable with other web applications. Based on this metric, DDC, LCC, UDC, ACM, and IEEE stand first because of their interoperability, data-harvesting protocols, and ontologies in the digital environment. DMOZ stands second because of its limited interoperability. CC provides only a philosophical and theoretical model, and we found no practical web-based application, so it is not included in this list. By enabling semantic search, a classification scheme can proactively respond to information seekers using its faceted structure. UDC, because of its semantic structure (Slavic, 2008), has semantic search capability.
The classification theory and philosophy of CC provide the basis for classification ontology development (Panigrahi & Prasad, 2005), which demonstrates its capability for semantic search and inference. LCC supports semantic search through LOD support, semantically enabled LCSH, and authority control files ("LC Linked Data Service: Authorities and Vocabularies,") (Harper & Tillett, 2007). DDC also contains semantic features (Green, 2015) that can be utilized in semantic search applications; therefore, it can be concluded that semantic search is also supported by DDC. This metric can best be analyzed in the digital environment, especially by examining these bibliographic classifications' ontologies. LCC can be ranked first because of its expressive ontology and efficient semantic search application. DDC is second because of its efficient search but limited usage of its ontology. ACM is third because of its expressive ontology and efficient search but limited coverage of the scientific domain. IEEE is fourth because of its faceted semantic search. UDC comes fifth because of its ontological presence but limited usage. CC has no application in the digital environment that could demonstrate its capability for semantic search, although it provides the semantic-level basis for all bibliographic classification systems. DMOZ lacks semantic search, being based only on keywords. Bias in subject representation means an inclination for or against certain subjects, resulting in unfair treatment, partial neglect, or complete omission of a subject. DDC and LCC are biased in representing certain knowledge and regional information, e.g., an Anglo-American bias (Tomren, 2003), while UDC is biased towards European culture (Fandino, 2008). CC is biased towards certain knowledge areas (Satija & Singh, 2010). A classification system with the least bias is ranked higher.
Therefore, DMOZ is ranked highest for showing no or minimal bias; CC is ranked second with less acute bias, followed by DDC, which shows comparatively less bias towards religious and regional subjects. LCC comes fourth, followed by IEEE and ACM, which show greater bias towards certain domains. Enumerative structure exhibits rigid hierarchies. LCC is enumerative (Goh et al., 2009; Perles, 1995) (Bryant, October 4, 1993). UDC is nearly enumerative as well as faceted (Kaosar, 2008) (Bryant, October 4, 1993), and DDC is both analytico-synthetic and enumerative (Hallows, 2014). CC is faceted (Chatterjee, 2016; Dawson, Brown, & Broughton, 2006). Comparing these systems, LCC fully supports an enumerative structure, followed by DDC, whereas UDC is nearly enumerative and CC shows no enumerative structure at all. The trend is towards semantic and faceted structures, so an enumerative structure is not a desirable characteristic of a classification system, and a system with an enumerative nature is ranked lower. Based on this metric, CC and DMOZ are the least enumerative and therefore ranked highest, followed by IEEE and ACM in second position, then UDC third, with DDC and LCC last. The faceted structure means a semantically interlinked structure of categories that can be merged and combined to generate an expression for existing or new concepts (Svenonius, 2000). CC is faceted (Chatterjee, 2016; Dawson et al., 2006). UDC is analytico-synthetic (Kaosar, 2008) and follows the faceted method of CC, using different connecting symbols in mixed notations and subject facets including time and space (Chatterjee, 2016). IEEE and ACM possess faceted structures. DMOZ has only a hierarchical structure with predefined categories.
Based on this metric, we rank CC first, UDC second, and ACM and IEEE third, while DDC and LCC have enumerative structures and therefore cannot be included in this list. Faceted search means navigating or browsing through the faceted structure of a faceted classification scheme. Faceted search is also applied by selecting different ranges and choices from the facets a faceted system offers to find the required content. It differs from search complexity in that it looks at the search patterns and criteria a classification scheme supports, whether in OPACs or web applications. The theory and philosophy of CC support faceted search and browsing economically (Kong, 2016); however, to the best of our knowledge, no real-world application demonstrates its usefulness. UDC is based on the faceted approach, which supports faceted search (Tunkelang, 2009). LCC supports faceted search with the help of LCSH (McGrath, 2007) and also provides faceted search through the Faceted Application of Subject Terminology (FAST) application ("Faceted Application of Subject Terminology," 2017). DDC provides faceted search through the OCLC Classify16 application. Using this metric, DDC is ranked first, because it adopts the faceted approach alongside its native enumerative nature and has state-of-the-art web-based search applications developed by OCLC. LCC is second because of its web-based search applications and its adoption of a comparatively restricted faceted approach. IEEE, providing an extensive choice of search patterns, stands third. ACM has a poly-hierarchical and multi-faceted classification structure along with a robust search mechanism, and is therefore fourth in this list.

16 http://classify.oclc.org/classify2/
There are very few faceted search applications of UDC, so it stands fifth. DMOZ has a hierarchical structure in which the required element can be accessed through keyword search; it therefore provides no faceted search. CC has no search applications that could confirm its support for faceted search. The LOD datasets metric means the availability of a classification system's datasets on the LOD cloud. Among our chosen systems, UDC, LCC, DDC, IEEE, ACM, and DMOZ have datasets in the LOD cloud, whereas CC has no such datasets. The definitions of classes and properties are gathered in Linked Open Vocabularies (LOV), which are used for describing different types of objects in the LOD cloud; these definitions provide vocabularies for linking the linked data (Foundation, 2017). CC, UDC, DDC, LCC, IEEE, and DMOZ have no LOV, whereas ACM has LOV vocabularies. The "platform" metric in the evaluation framework considers the applicability of a given classification system in real-world web applications and other digital environments. In this regard, UDC is supported by the UDC Consortium, DDC by OCLC, LCC by the Library of Congress, ACM by the ACM digital library, IEEE by the IEEE Xplore digital library, and DMOZ by the Open Directory Project. To the best of our knowledge, CC has not been used by any online application. The warrants of classification work as authoritative acts for classificationists performing the cognitive practice of designing the classes and concepts in a classification system and their structural properties, and then placing subjects in the specified classes (Beghtol, 1986).

Table 3. Ranking and Average Ranking of Classification Schemes
Metric order (M1–M20): structural complexity; notational brevity; predefined structure; rules complexity; theoretical laws; mnemonics; hospitality; search complexity; usability; precision and accuracy; multilinguality; interoperability; semantic search; bias; enumerative structure; faceted structure; faceted search; consistency; LOD datasets; LOV support.

| Scheme | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 | M11 | M12 | M13 | M14 | M15 | M16 | M17 | M18 | M19 | M20 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CC | 1 | 2 | 7 | 1 | 5 | 2 | 6 | 2 | 2 | 4 | 2 | 1 | 7 | 5 | 4 | 4 | 1 | 4 | 1 | 1 | 3.1 |
| UDC | 2 | 3 | 4 | 4 | 3 | 6 | 5 | 3 | 3 | 3 | 5 | 3 | 3 | 3 | 2 | 3 | 2 | 5 | 2 | 1 | 3.25 |
| DDC | 4 | 5 | 3 | 3 | 4 | 7 | 4 | 4 | 5 | 2 | 4 | 3 | 6 | 4 | 1 | 1 | 6 | 3 | 2 | 1 | 3.6 |
| LCC | 3 | 4 | 2 | 2 | 2 | 5 | 3 | 4 | 4 | 2 | 3 | 3 | 2 | 1 | 1 | 1 | 5 | 3 | 2 | 1 | 2.65 |
| ACM | 6 | 7 | 6 | 5 | 1 | 4 | 2 | 4 | 5 | 2 | 1 | 3 | 5 | 2 | 3 | 2 | 3 | 2 | 2 | 1 | 3.3 |
| IEEE | 5 | 6 | 5 | 5 | 1 | 3 | 2 | 4 | 5 | 2 | 1 | 3 | 4 | 2 | 3 | 2 | 4 | 2 | 2 | 2 | 3.15 |
| DMOZ | 7 | 1 | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 2 | 1 | 6 | 4 | 1 | 1 | 1 | 2 | 1 | 2.25 |

CC and UDC use literary warrant; DDC and LCC use literary and scientific warrants. ACM and IEEE use a scientific research warrant, while DMOZ exhibits no warrant of classification. In the paragraphs above, we compared and evaluated the selected classification systems using the evaluation metrics (shown in Table 2) and discussed how these systems can be ranked on each metric. To give a holistic view of this comparison and evaluation, we introduce ranking levels ranging from 1 (low ranking, not applicable, or not available) to 7 (high ranking) indicating how a classification scheme stands among its counterparts; for a given metric, multiple systems may share the same ranking level. By assigning these ranking levels, Table 3 compares the systems on 20 metrics, excluding platform and warrants of classification.
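The average-ranking column of Table 3 is simply the mean of the 20 per-metric ranking levels. As a cross-check, the sketch below (not code from the article) recomputes the averages from the ranking levels of Table 3, in the table's metric order:

```python
# Per-metric ranking levels (1 = low, 7 = high) transcribed from Table 3.
RANKS = {
    "CC":   [1, 2, 7, 1, 5, 2, 6, 2, 2, 4, 2, 1, 7, 5, 4, 4, 1, 4, 1, 1],
    "UDC":  [2, 3, 4, 4, 3, 6, 5, 3, 3, 3, 5, 3, 3, 3, 2, 3, 2, 5, 2, 1],
    "DDC":  [4, 5, 3, 3, 4, 7, 4, 4, 5, 2, 4, 3, 6, 4, 1, 1, 6, 3, 2, 1],
    "LCC":  [3, 4, 2, 2, 2, 5, 3, 4, 4, 2, 3, 3, 2, 1, 1, 1, 5, 3, 2, 1],
    "ACM":  [6, 7, 6, 5, 1, 4, 2, 4, 5, 2, 1, 3, 5, 2, 3, 2, 3, 2, 2, 1],
    "IEEE": [5, 6, 5, 5, 1, 3, 2, 4, 5, 2, 1, 3, 4, 2, 3, 2, 4, 2, 2, 2],
    "DMOZ": [7, 1, 1, 5, 1, 1, 1, 1, 1, 1, 6, 2, 1, 6, 4, 1, 1, 1, 2, 1],
}

def average_ranking(scheme):
    """Mean of a scheme's 20 ranking levels, rounded to two decimals."""
    levels = RANKS[scheme]
    return round(sum(levels) / len(levels), 2)
```

Running `average_ranking` over all seven schemes reproduces the table's average column (e.g., 3.6 for DDC and 3.25 for UDC), confirming that the reported averages follow directly from the per-metric levels.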
Table 3 also reports the average ranking of these classification systems, showing DDC at the top with an average ranking of 3.6, followed by ACM (3.3) and UDC (3.25). It can be concluded that DDC and UDC are among the best classification schemes for describing printed as well as digital collections, whereas ACM is best for classifying digital collections in the computer science domain. However, the ACM classification system can be extended to include other domains as well. Figure 1 illustrates this comparison and evaluation graphically.

Figure 1. Comparison and Ranking of Classification Systems

Table 4 presents the state-of-the-art bibliographic classification ontologies, including the Bibliographic ontology, LCC ontology, DDC ontology, UDC ontology, and DMOZ ontology. Some of these ontologies were designed for specific target applications (e.g., the ACM ontology for the ACM digital library and the LCC ontology for the Library of Congress), whereas others have multiple usage scenarios and have been used by several applications.
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 64
An example of such a general-purpose bibliographic classification ontology is the Bibliographic Ontology [17], which is used by several bibliographic services and digital libraries, e.g., the digital object identifier (DOI) system, Zotero, and the Library of Congress Classification Number (LCCN) permalink service (Giasson, 2012). This evaluation framework compares these ontologies based on their size (in terms of number of classes), usage in state-of-the-art applications, LOD support, the availability of datasets on datahub [18], and LOV support. Looking at Table 4, the ACM ontology shows the most comprehensiveness in terms of number of classes, triples, and LOV support.

Classification and Categorization Ontologies | No. of classes | Applications | LOD datasets | LOD dataset triples | LOV support
Bibliographic ontology [19] | 69 | Library of Congress and BibBase | Yes | 200,000 | Yes
LCC ontology [20] | 40+ | Library of Congress | Yes | Not given | No
DDC ontology [21] | 20+ | OCLC | Yes | 402,288 | No
UDC ontology [22] | 2,600 | UDC [23] | Yes | 69,000 | No
ACM ontology [24] | 1,469 | ACM | Yes | 12,402,336 | Yes
IEEE LOM metadata ontology (Casali, Deco, Romano, & Tomé, 2013) | 9 | IEEE Xplore digital library [25] | Yes | 91,564 | Yes
DMOZ ontology [26] | Not given | Open Directory Project | Yes | Not given | No
Table 4. Comparison of classification and categorization ontologies

18 https://datahub.io
19 http://purl.org/ontology/bibo
20 http://id.loc.gov/
21 http://dewey.info/
22 http://udcdata.info/
23 http://udcdata.info/
24 http://dl.acm.org/ccs/ccs.cfm
25 http://ieee.rkbexplorer.com/id/
26 https://www.dmoz.org/rdf.html

Issues & Challenges in Classification Research

Although bibliographic classification has been practiced since the advent of books and the inception of library and information science, further research and development efforts are required to meet the classification needs of the digital age. In particular, with the arrival of digital holdings, researchers face several issues and challenges. For example, automatic text classification categorizes resources using ordinary metrics such as TF-IDF, and classification in its true sense is yet to be achieved (Yi, 2006). To handle this issue, text classification has also been carried out through semantic indexing, but the required accuracy and precision have yet to be achieved. Research on semantic and structural relationships among different parts of a text corpus is still in its infancy, and these relationships have not been exploited to their fullest so that they can be used in text classification in more meaningful ways.
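As an illustration of the "ordinary metrics" mentioned above, the following stdlib-only sketch computes TF-IDF weights: a term's weight grows with its frequency in a document and shrinks with the number of documents it appears in. The three toy documents are invented for the example. Note that TF-IDF captures term distribution, not meaning, which is why classification "in its true sense" remains out of reach for such metrics.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return per-document TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    "classification of library resources".split(),
    "automatic classification of text".split(),
    "semantic indexing of text corpora".split(),
]
w = tfidf(docs)
# "library" is unique to the first document, so it outweighs the shared term
# "classification"; "of" occurs everywhere, so its weight is zero.
```

The weights say nothing about what "library" means, only how it is distributed, which is precisely the limitation the paragraph above describes.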
Other challenges in text classification include handling the huge volumes of data that result from applying a classification scheme, dynamism in classification, and structural dissimilarity among classification schemes even though they agree on subject as the primary characteristic. The bias in DDC and LCC also needs to be resolved. Several revisions and proposals have been put forward to address the problem of systematic knowledge organization and searching through natural-language terms (Miksa, 2007). There are also various issues regarding structural updates, search and retrieval criteria, and visualization (Slavic-Overfield, 2005). Two main challenges face the application of bibliographic classification principles to classifying the Web. First, the principles of bibliographic classification were formulated for printed documents, yet they should also be applicable to digital collections; addressing this challenge requires applying and modifying bibliographic classification principles in digital environments. Second, hidden hierarchies and concepts need to be exploited so that resources can be better classified by the principles of bibliographic classification for precise discovery, search, and retrieval (J. Mai, 2004). The classification of any object depends on predefined criteria and principles, and this dependence must be addressed if classification is to find a place in this age of search engines. It can be addressed by modifying the conventional principles of classification to consider the purpose of classification and the domain of the objects. Here the Semantic Web and ontologies can play a vital role in bibliographic classification by providing classification that is independent of the predefined theories of bibliographic classification (Hjørland, 2012). Heterogeneity conflicts, which arise from inconsistencies and structural divergences, are a further challenge for semantic interoperability.
Semantic interoperability can be brought into bibliographic records, both within a bibliographic system and across systems, through phases of interlinking, evaluation, analysis, remodeling and conversion, and restructuring of the bibliographic data (Tallerås, 2013). Bibliographic data is multi-format, multi-topical, multilingual, and multi-targeted. To tackle these issues, bibliographic data must be made mutually interoperable so that it can be interlinked, searched, and presented in a harmonized way across the boundaries of datasets and data silos. The interoperability problem arises at the syntactic level in making character sets, notations, data formats, and records consistent across different systems. It also arises at the semantic level because of differences in data interpretation, vocabularies, and precision levels in data encoding. Bibliographic data is published, collected, and maintained by multiple organizations, each following its own established standards and best practices in Web 2.0 (Hyvönen, 2012). Given these problems, the transition of this data from the syntactic Web to the Semantic Web is a challenge: it requires bringing uniformity to records generated by diverse sources and encoded in multiple bibliographic systems, achieving interoperability across bibliographic systems, and visualizing bibliographic data as needed in different contexts. Addressing these problems requires coordination and collaboration between bibliographic data publishers and the technical developers of web applications (Hyvönen, 2012). There is a variety of metadata standards and schemas for defining and managing metadata and bibliographic data and for supporting resource discovery, search and retrieval, preservation, mapping, crosswalking, integrity, accuracy, and authenticity.
But for these tasks to be handled with simplicity, semantic richness, and accuracy, a universal, all-in-one metadata format and schema is needed (Ramesh, Vivekavardhan, & Bharathi, 2015) to get out of this jungle of standards (Gartner, 2016). This would relieve metadata publishers and managers and make the work more economical in terms of time, management, and search and retrieval. Three main tasks were set in the Semantic Publishing Challenge 2015: (i) extracting data on workshops' quality indicators; (ii) extracting data on affiliations, citations, and funding; and (iii) interlinking. Several challenges were faced while fulfilling these tasks, which were addressed through a proposed solution composed of a text-mining pipeline, LODeXporter, and Named Entity Recognition (NER) for extracting named entities from text and linking them to resources on the LOD cloud (Sateli & Witte, 2015). Peroni (2012) addresses three main issues of semantic publishing: the lack of universal metadata schemas for document publishing according to a publishing vocabulary, the lack of efficient user interfaces based on the models and theories of semantic publishing, and the need for a tool that semantically links and describes document text. These issues point to an urgent need for comprehensive ontologies for the document publishing domain. Ferrara and Salini (2012) posed ten challenges concerning the multiple dimensions of bibliographic data analysis.
These challenges are: (i) analyzing bibliographic data in a multidimensional pattern; (ii) discovering and integrating data coming from diverse sources; (iii) detecting multiple references to the same item and cleaning, normalizing, and disambiguating bibliographic data records; (iv) analyzing the multidimensional nature of bibliographic data through multivariate analysis for aggregating the data; (v) comparing different elements of bibliographic data and ranking them accordingly; (vi) aggregating indexes of different natures with respect to different parameters, dimensions, and elements of bibliographic data; (vii) dealing with multiple indexes for the same item with different values coming from different sources; (viii) extracting and indexing textual information from a text corpus in support of text mining; (ix) analyzing textual data topic-wise, describing these topics for research and learning, and tracing different trends; and (x) combining multidimensional information to find trends in bibliographic data collections. Bibliographic classification systems are also being incorporated into LOD. Dewey.info [27] is a prototype platform for DDC data on the Web, designed to link its dataset into the linked data cloud. It provides summaries of the top three levels of the DDC 22nd edition classification in 11 languages, encoded in RDF/SKOS, with actionable URIs for every class, RDF representations for machines and XHTML+RDFa for humans, serializations available in RDF/XML, Turtle, and JSON, and a SPARQL endpoint (OCLC, 2011; Mitchell & Panzer, 2013). However, this version of DDC on the LOD cloud is still at an early stage, both in its subject coverage and in its adoption for generating document metadata.
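The Dewey.info design just described (SKOS concepts with actionable URIs, multilingual labels, and multiple RDF serializations) can be illustrated with a small sketch that emits N-Triples by hand. The class URI, scheme URI, and labels below are hypothetical stand-ins, not the actual dewey.info identifiers; only the RDF and SKOS property names are real.

```python
# Illustrative sketch: one DDC-style class expressed as SKOS data in N-Triples.
# URIs under example.org are hypothetical, not real dewey.info URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

def triple(s, p, o):
    """Serialize one triple; tuple objects are literals: (value, lang) or (value, None)."""
    if isinstance(o, tuple):
        value, lang = o
        obj = f'"{value}"@{lang}' if lang else f'"{value}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."

cls = "http://example.org/ddc/class/020"            # hypothetical actionable class URI
scheme = "http://example.org/ddc/scheme/edition22"  # hypothetical concept scheme URI
triples = [
    triple(cls, RDF_TYPE, SKOS + "Concept"),
    triple(cls, SKOS + "notation", ("020", None)),
    triple(cls, SKOS + "prefLabel", ("Library and information sciences", "en")),
    triple(cls, SKOS + "prefLabel", ("Bibliotheks- und Informationswissenschaft", "de")),
    triple(cls, SKOS + "inScheme", scheme),
]
print("\n".join(triples))
```

A dump of such triples can be loaded into any RDF store and queried over a SPARQL endpoint, which is the access pattern Dewey.info exposes.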
The Library of Congress Linked Data service provides access to commonly used standards and vocabularies developed by the Library of Congress, including data values, controlled vocabularies, and preservation vocabularies. The service provides access to LCSH, LC name authority files, LCC [28], LC children's subject headings, LC genre/form terms, the thesaurus for graphic materials, MARC relators, MARC countries, MARC geographic areas, MARC languages, ISO 639-1, ISO 639-2, and ISO 639-5 languages, the extended date/time format, preservation events, preservation level roles, and cryptographic hash functions. The authorities and vocabularies currently included are listed on the Linked Data service (Library of Congress, 2014). However, it lacks vocabularies supporting PREMIS, MARC, MODS, METS, and MIX. As presented in Section 2, several ontologies have been developed for describing and sharing knowledge about bibliographic classification. However, the available ontologies are limited in several ways: they are not complete clones of the classification schemes they represent, and they are not mature enough in terms of metadata collection. In addition, these ontologies still cannot break the cross-scheme metadata collection barriers, i.e., they are not interoperable enough to harvest metadata across bibliographic ontology systems. Therefore, further initiatives are required to develop mature bibliographic ontologies that fully clone bibliographic schemes in practical use and have strong theoretical grounding. These ontologies must be interoperable and must share metadata collections with other bibliographic ontologies. In this way, a general ontology-based bibliographic classification system could be built in the future by fusing new and existing bibliographic ontologies for better management of knowledge artifacts.
27 https://datahub.io/dataset/dewey_decimal_classification

CONCLUSIONS

With the arrival of digital collections, new challenges of preservation, curation, and resource discovery and access (retrieval) have emerged that need proper attention, and classification schemes and ontologies can play a significant role in meeting them. By comparing and evaluating the available bibliographic classification and categorization systems, we conclude that DDC is currently the best classification system, followed by ACM and UDC. The bibliographic classification ontologies are limited in one way or another: some, like the UDC and ACM ontologies, are comprehensive but lack support for LOD and LOV, while others support these latter aspects but lack comprehensiveness. In view of the available bibliographic classification ontologies and their limitations, we recommend that a universal bibliographic classification ontology be developed by drawing classes from the available ontologies and providing support in terms of the availability of datasets, interoperability, LOD, and linked data vocabularies. To develop a more meaningful classification system that is equally applicable to digital environments, it is necessary to consider book structural semantics such as the table of contents, headings, chapters, sections, subsections, figures, algorithms, mathematical equations, and quotations, as well as the logical connections in the contents (Khusro & Ullah, 2016; I. Ullah & Khusro, 2016), and information about the book itself, i.e., the bibliographic details of the holdings. To meet the former requirement, a comprehensive ontology like BookOnt (A. Ullah, Ullah, Khusro, & Ali, 2016) could be used, which can be mapped to any bibliographic ontology such as the Bibliographic Ontology [29].
However, as the evaluation frameworks suggest, DDC, UDC, and the ACM Classification System should be exploited in designing such a general-purpose classification system.

REFERENCES

The 2012 ACM Computing Classification System. Retrieved March 20, 2017, from http://www.acm.org/about/class/2012
About Universal Decimal Classification (UDC). Retrieved March 21, 2017, from http://www.udcc.org/index.php/site/page?view=about
Albrechtsen, H. (2000). Who wants yesterday's classifications? Information science perspectives on classification schemes in common information spaces. In K. Schmidt (Ed.), Papers. Technical University of Denmark, Center for Tele-Information.
Batley, S. (2014). Classification in theory and practice. Oxford: Chandos Publishing.
Batty, C. D. (1967). An introduction to colon classification. Archon Books.
Beghtol, C. (1986). Semantic validity: Concepts of warrant in bibliographic classification systems. Library Resources & Technical Services, 30(2), 109-125.
29 http://bibliontology.com/#
Beghtol, C. (1998). Knowledge domains: Multidisciplinarity and bibliographic classification systems. Knowledge Organization, 25(1-2), 1-12.
Beghtol, C. (2001). Relationships in classificatory structure and meaning. In Relationships in the organization of knowledge (pp. 99-113). Springer.
Berners-Lee, T. (2006). Linked data. Design Issues. Retrieved March 21, 2017, from https://www.w3.org/DesignIssues/LinkedData.html
Betts, T., Milosavljevic, M., & Oberlander, J. (2007). The utility of information extraction in the classification of books. In Advances in information retrieval (pp. 295-306). Springer.
Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data: The story so far. Semantic Services, Interoperability and Web Applications: Emerging Concepts, 205-227.
Boykin, J. (2016). Assessing DMOZ: A quality review.
Retrieved March 14, 2016, from https://www.seochat.com/c/a/search-engine-news/assessing-dmoz-a-quality-review/
Bryant, B. (1993, October 4). 'Numbers you can count on': Dewey Decimal Classification is maintained at LC. Library of Congress Information Bulletin, 52(18). http://www.loc.gov/loc/lcib/93/9318/count.html
Buchanan, B. (1979). Theory of library classification.
Campbell, D. G. (2002). Centripetal and centrifugal forces in bibliographic classification research. Paper presented at the ASIS SIG/CR Classification Research Workshop.
Casali, A., Deco, C., Romano, A., & Tomé, G. (2013). An assistant for loading learning object metadata: An ontology based approach.
Chan, L. M. (2000). Exploiting LCSH, LCC, and DDC to retrieve networked resources: Issues and challenges.
Chan, L. M., Intner, S. S., & Weihs, J. (2016). Guide to the Library of Congress classification. ABC-CLIO.
Chapman, J. W., Reynolds, D., & Shreeves, S. A. (2009). Repository metadata: Approaches and challenges. Cataloging & Classification Quarterly, 47(3-4), 309-325.
Chatterjee, A. (2016). Universal Decimal Classification and Colon Classification: Their mutual impact. Annals of Library and Information Studies (ALIS), 62(4), 226-230.
Cliff, P. (2008). JISC-Repositories: Subject classification thread summary.
Comaromi, J. P., & Satija, M. P. (1983). Brevity of notation in Dewey decimal classification. Metropolitan.
Dawson, A., Brown, D., & Broughton, V. (2006). The need for a faceted classification as the basis of all methods of information retrieval. Paper presented at the Aslib Proceedings.
De Grolier, E. (1962). A study of general categories applicable to classification and coding in documentation.
Dewey Decimal Classification summaries. Retrieved March 21, 2017, from https://www.oclc.org/en/dewey/features/summaries.html
Dewey Services: Dewey Decimal Classification System.
Retrieved March 20, 2017, from https://www.oclc.org/content/dam/oclc/services/brochures/211422usb_dewey_services.pdf
Dorji, T. C., Atlam, E.-s., Yata, S., Fuketa, M., Morita, K., & Aoe, J.-i. (2011). Extraction, selection and ranking of Field Association (FA) terms from domain-specific corpora for building a comprehensive FA terms dictionary. Knowledge and Information Systems, 27(1), 141-161. doi:10.1007/s10115-010-0296-x
Dousa, T. M. (2009). Evolutionary order in the classification theories of C. A. Cutter & E. C. Richardson: Its nature and limits.
Encyclopedia, N. W. (2014, August 1). Library classification. Retrieved 2017, from http://www.newworldencyclopedia.org/entry/Library_classification
Faceted Application of Subject Terminology. (2017). Retrieved March 21, 2017, from http://www.oclc.org/research/themes/data-science/fast.html
Fandino, M. (2008). UDC or DDC: A note about the suitable choice for the National Library of Liechtenstein. Extensions and Corrections to the UDC.
Ferrara, A., & Salini, S. (2012). Ten challenges in modeling bibliographic data for bibliometric analysis. Scientometrics, 93(3), 765-785.
Foundation, O. K. (2017). About LOV. Retrieved from http://lov.okfn.org/dataset/lov/about
Francu, V. (2007). Multilingual access to information using an intermediate language (Doctoral dissertation in Language and Literature, University of Antwerp).
Gartner, R. (2016). Metadata. Springer.
Giasson, B. D. A. F. (2012). Projects using BIBO. Retrieved from http://www.bibliontology.com/projects.html
Giess, M. D., Wild, P., & McMahon, C. (2007). The use of faceted classification in the organisation of engineering design documents. Paper presented at the Proceedings of the International Conference on Engineering Design 2007.
Gilchrist, A. (2015). Reflections on knowledge, communication and knowledge organization. Knowledge Organization, 42(6), 456-469.
Giunchiglia, F., Marchese, M., & Zaihrayeu, I. (2007).
Encoding classifications into lightweight ontologies. Journal on Data Semantics VIII (pp. 57-81). Springer.
Gnoli, C., Merli, G., Pavan, G., Bernuzzi, E., & Priano, M. (2008). Freely faceted classification for a web-based bibliographic archive: The BioAcoustic Reference Database.
Gnoli, C., Pusterla, L., Bendiscioli, A., & Recinella, C. (2016). Classification for collections mapping and query expansion.
Goh, Y. M., Giess, M., McMahon, C., & Liu, Y. (2009). From faceted classification to knowledge discovery of semi-structured text records. In Foundations of computational intelligence, Volume 6 (pp. 151-169). Springer.
Green, R. (2015, October 29-30). Relational aspects of subject authority control: The contributions of classificatory structure. Paper presented at the Proceedings of the International UDC Seminar 2015: Classification & Authority Control: Expanding Resource Discovery, Lisbon.
Hallows, K. M. (2014). It's all enumerative: Reconsidering Library of Congress Classification in US law libraries. Law Library Journal, 106, 85.
Harper, C. A., & Tillett, B. B. (2007). Library of Congress controlled vocabularies and their application to the Semantic Web. Cataloging & Classification Quarterly, 43(3-4), 47-68.
Hjørland, B. (1999). The DDC, the universe of knowledge, and the post-modern library. Journal of the Association for Information Science and Technology, 50(5), 475.
Hjørland, B. (2007). Semantics and knowledge organization. Annual Review of Information Science and Technology, 41(1), 367-405.
Hjørland, B. (2008). Core classification theory: A reply to Szostak. Journal of Documentation, 64(3), 333-342.
Hjørland, B. (2012). Is classification necessary after Google? Journal of Documentation, 68(3), 299-317.
Hjørland, B. (2013). Theories of knowledge organization—theories of knowledge: Keynote March 19, 2013, 13th Meeting of the German ISKO in Potsdam.
Knowledge Organization, 40(3), 169-181.
Hjørland, B. (2016). Subject (of documents). Knowledge Organization, 44(1), 55-64.
Hyman, R. J. (1980). Shelf classification research: Past, present--future? Occasional Papers (University of Illinois at Urbana-Champaign, Graduate School of Library Science), no. 146.
Hyvönen, E. (2012). Publishing and using cultural heritage linked data on the semantic web. Synthesis Lectures on the Semantic Web: Theory and Technology, 2(1), 1-159.
Jacob, E. K. (2004). Classification and categorization: A difference that makes a difference. Library Trends, 52(3), 515.
Jonassen, D. H. (2004). Handbook of research on educational communications and technology. Taylor & Francis.
Jones, K. S. (1970). Some thoughts on classification for retrieval. Journal of Documentation, 26(2), 89-101.
Joorabchi, A., & Mahdi, A. E. (2009). Leveraging the legacy of conventional libraries for organizing digital libraries. Paper presented at the International Conference on Theory and Practice of Digital Libraries.
Joorabchi, A., & Mahdi, A. E. (2011). An unsupervised approach to automatic classification of scientific literature utilizing bibliographic metadata. Journal of Information Science, 37(5), 499-514. doi:10.1177/0165551511417785
Kaosar, A. (2008). Merit & demerit of using Universal Decimal Classification on the Internet.
Kaula, P. (1965). Colon Classification: Genesis and development. Library Science Today. Ranganathan's Festschrift, 1, 87-93.
Khusro, S., & Ullah, I. (2016). Towards a semantic book search engine. Paper presented at the 2016 International Conference on Open Source Systems & Technologies (ICOSST'16), Lahore, Pakistan.
Koch, T., & Day, M. (1997). DESIRE - Development of a European Service for Information on Research and Education.
Koch, T., Day, M., Brümmer, A., Hiom, D., Peereboom, M., Poulter, A., & Worsfold, E. (1997).
The role of classification schemes in Internet resource description and discovery. Work Package, 3.
Kong, W. (2016). Extending faceted search to the open-domain web. University of Massachusetts Amherst.
Koshman, S. (1993). Categorization and classification revisited: A review of concept in library science and cognitive psychology. Current Studies in Librarianship, Spring/Fall, 26.
Kwaśnik, B. H., & Rubin, V. L. (2003). Stretching conceptual structures in classifications across languages and cultures. Cataloging & Classification Quarterly, 37(1-2), 33-47.
Kyle, B., & Vickery, B. C. (1961). The Universal Decimal Classification: Present position and future developments. Unesco.
LC Linked Data Service: Authorities and vocabularies. Retrieved February 28, 2017, from http://id.loc.gov
Lee, H.-L. (2012). Epistemic foundation of bibliographic classification in early China: A Ru classicist perspective. Journal of Documentation, 68(3), 378-401.
Library of Congress Classification. (2014, October 1). Retrieved March 20, 2017, from https://www.loc.gov/catdir/cpso/lcc.html
Library of Congress Classification Outline: Class P - Language and Literature [Press release]. Retrieved from https://www.loc.gov/aba/cataloging/classification/lcco/lcco_p.pdf
Library of Congress Subject Headings: Pre- vs. post-coordination and related issues. (2007, March 15). Report for Beacher Wiggins, Director, Acquisitions & Bibliographic Access Directorate, Library Services, Library of Congress (pp. 49). Cataloging Policy and Support Office.
Library OPACs containing UDC codes. Retrieved March 21, 2017, from http://www.udcc.org/index.php/site/page?view=opacs
Linked data. Retrieved from https://www.w3.org/standards/semanticweb/data
Losee, R. M. (1993). Seven fundamental questions for the science of library classification. Knowledge Organization, 20, 65-65.
Maddaford, S., & Briefing, C.
Library of Congress Classification System.
Madge, O.-L. (2011). Evidence based library and information practice. Studii de Biblioteconomie şi Ştiinţa Informării (15), 107-112.
Mai, J.-E. (2003). The future of general classification. Cataloging & Classification Quarterly, 37(1-2), 3-12.
Mai, J.-E. (2004). Classification in context: Relativity, reality, and representation. Knowledge Organization, 31(1), 39-48.
Mai, J.-E. (2005). Analysis in indexing: Document and domain centered approaches. Information Processing & Management, 41(3), 599-611. doi:10.1016/j.ipm.2003.12.004
Mai, J.-E. (2009). The boundaries of classification.
Mai, J.-E. (2010). Classification in a social world: Bias and trust. Journal of Documentation, 66(5), 627-642.
Mai, J.-E. (2011). The modernity of classification. Journal of Documentation, 67(4), 710-730.
Mai, J. (2004). Classification of the Web: Challenges and inquiries. Knowledge Organization, 31(2), 92.
Mancuso, J. (1994). General purpose vs special purpose couplings. Paper presented at the 23rd Turbomachinery Symposium, Dallas, TX.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1). Cambridge University Press.
Maron, M. E., Kuhns, J. L., & Ray, L. C. (1959). Probabilistic indexing: A statistical approach to the library problem. Paper presented at the 14th National Meeting of the Association for Computing Machinery, Cambridge, Massachusetts.
McGrath, K. (2007). Facet-based search and navigation with LCSH: Problems and opportunities. Code4Lib Journal, 1.
McIlwaine, I. C. (1997). The Universal Decimal Classification: Some factors concerning its origins, development, and influence. Journal of the American Society for Information Science (1986-1998), 48(4), 331.
Miksa, S. D. (2007). The challenges of change: A review of cataloging and classification literature, 2003-2004.
Library Resources & Technical Services, 51(1), 51.
Neelameghan, A., & Lalitha, S. (2013). Multilingual thesaurus and interoperability. DESIDOC Journal of Library & Information Technology, 33(4).
Neelameghan, A., & Parthasarathy, S. (1997). S. R. Ranganathan's postulates and normative principles: Applications in specialized databases design, indexing and retrieval. Sarada Ranganathan Endowment for Library Science.
Nizamani, S., Memon, N., & Wiil, U. K. (2011). Cluster based text classification model. In Counterterrorism and open source intelligence (pp. 265-283). Springer.
Painter, A. F. (1974). Classification: Theory and practice. Drexel Library Quarterly, 10(4), n4.
Panigrahi, P., & Prasad, A. (2005). Inference engine for devices of Colon Classification in AI-based automated classification system.
Perles, B. (1995). Faceted classifications and thesauri. Retrieved from Howard Besser's Web website: http://besser.tsoa.nyu.edu/impact/f95/Papers-projects/Papers/perles.html
Peroni, S. (2012). Semantic publishing: Issues, solutions and new trends in scholarly publishing within the Semantic Web era. alma.
Piros, A. (2014). A different approach to Universal Decimal Classification in a mechanized retrieval system. Paper presented at the Proceedings of the 9th International Conference on Applied Informatics, Eger, Hungary.
Pollitt, A. S. (1998). The key role of classification and indexing in view-based searching. Technical report, University of Huddersfield, UK. http://www.ifla.org/IV/ifla63/63polst.pdf
Press, O. F. (2002). Introduction to the Dewey decimal classification.
Raghavan, K. (2016). The Colon Classification: A few considerations on its future. Annals of Library and Information Studies (ALIS), 62(4), 231-238.
Rahman, A., & Ranganathan, T. (1962). Seminal mnemonics. Annals of Library Science, 9, 53-67.
Ramesh, P., Vivekavardhan, J., & Bharathi, K. (2015). Metadata diversity, interoperability and resource discovery issues and challenges. DESIDOC Journal of Library & Information Technology, 35(3).
Ranganathan, S. R. (1968). Choice of scheme for classification. Library Science with a Slant to Documentation, 5(1), 1-69.
Reiner, U. (2008). Automatic analysis of Dewey decimal classification notations. In Data analysis, machine learning and applications (pp. 697-704). Springer.
Rodriguez, R. D. (1984). Hulme's concept of literary warrant. Cataloging & Classification Quarterly, 5(1), 17-26.
Rosenfeld, L., & Morville, P. (2002). Information architecture for the world wide web. O'Reilly Media, Inc.
San Segundo Manuel, R. (2008). Some arguments against the suitability of Library of Congress Classification for Spanish libraries. Extensions and Corrections to the UDC.
Sateli, B., & Witte, R. (2015). Automatic construction of a semantic knowledge base from CEUR workshop proceedings. Paper presented at the Semantic Web Evaluation Challenge.
Satija, M. P. (2013). The theory and practice of the Dewey decimal classification system. Elsevier.
Satija, M. P. (2015). Save the national heritage: Revise the Colon Classification.
Satija, M. P., & Martínez-Ávila, D. (2015). Features, functions and components of a library classification system in the LIS tradition for the e-environment. Journal of Information Science Theory and Practice, 3(4), 62-77.
Satija, M. P., & Singh, J. (2010). Colon Classification (CC). In Encyclopedia of library and information sciences (Vol. 2, pp. 1158-1168).
Schallier, W. (2005). Subject retrieval in OPACs: A study of three interfaces. Paper presented at the 7th ISKO-Spain Conference: The Human Dimension of Knowledge Organization, Barcelona.
Singapore, N. L. o. (2016). Usability on the web.
Retrieved from http://www.nlb.gov.sg/resourceguides/usability-on-the-web/
Slavic-Overfield, A. (2005). Classification management and use in a networked environment: The case of the Universal Decimal Classification. University of London.
Slavic, A. (2006). Interface to classification: Some objectives and options.
Slavic, A. (2008). Use of the Universal Decimal Classification: A world-wide survey. Journal of Documentation, 64(2), 211-228.
Smiraglia, R. P., & Van den Heuvel, C. (2011). Idea collider: From a theory of knowledge organization to a theory of knowledge interaction. Bulletin of the American Society for Information Science and Technology, 37(4), 43-47.
Subject classification schemes. (2015). Retrieved from http://www.ifla.org/best-practice-for-national-bibliographic-agencies-in-a-digital-age/node/9042
Sukhmaneva, E. (1970). The problems of notation and faceted classification. 17(3-4), 112-116.
Svenonius, E. (2000). The intellectual foundation of information organization. MIT Press.
Tallerås, K. (2013). From many records to one graph: Heterogeneity conflicts in the linked data restructuring cycle. Information Research: An International Electronic Journal, 18(3), n3.
Tennis, J. T. (2008). Epistemology, theory, and methodology in knowledge organization: Toward a classification, metatheory, and research framework.
Tennis, J. T. (2011). Ranganathan's layers of classification theory and the FASDA model of classification.
Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. Synthesis Lectures on Information Concepts, Retrieval, and Services.
Tomren, H. (2003). Classification, bias, and American Indian materials. Unpublished work, San Jose State University, San Jose, California.
Tunkelang, D. (2009). Faceted search. Synthesis Lectures on Information Concepts, Retrieval, and Services, 1(1), 1-80.
Ullah, A., Ullah, I., Khusro, S., & Ali, S. (2016, December 19-21).
BookOnt: A Comprehensive Book Structural Ontology for Book Search and Retrieval. Paper presented at the 2016 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan. Ullah, I., & Khusro, S. (2016). In Search of a Semantic Book Search Engine on the Web: Are We There Yet? Artificial Intelligence Perspectives in Intelligent Systems (pp. 347-357): Springer. Universal Decimal Classification summary. (2017). from http://www.udcsummary.info/php/index.php?id=67277&lang=en# Vizine-Goetz, J. S. M. D. (2009). The Dewey Decimal Classification. Encyclopedia of Library and Information Science. Wang, J. (2009). An extensive study on automated Dewey Decimal Classification. Journal of the American Society for Information Science and Technology, 60(11), 2269-2286. Wijewickrema, C. M., & Gamage, R. (2013). An ontology based fully automatic document classification system using an existing semi-automatic system. Xin, R. S., Hassanzadeh, O., Fritz, C., Sohrabi, S., & Miller, R. J. (2013). Publishing bibliographic data on the Semantic Web using BibBase. Semantic Web, 4(1), 15-22. Yelton, A. (2011). A Simple Scheme for Book Classification Using Wikipedia. Information Technology and Libraries, 30(1), 7-15. BIBLIOGRAPHIC CLASSIFICATION IN THE DIGITAL AGE | ULLAH, KHUSRO, AND ULLAH | doi:10.6017/ital.v36i3.8930 77 Yi, K. (2006). Challenges in automated classification using library classification schemes. Paper presented at the Proceedings of world library and information congress: 72nd ifla general conference and council. Zhu, Z. (2011). Improving Search Engines via Classification. University of London.
Lessons Learned: A Primo Usability Study

Kelsey Brett, Ashley Lierman, and Cherie Turner

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016

ABSTRACT

The University of Houston Libraries implemented Primo as the primary search option on the library website in May 2014. In May 2015, the Libraries released a redesigned interface to improve user experience with the tool. The Libraries took a user-centered approach to redesigning the Primo interface, conducting a "think-aloud" usability test to gather user feedback and identify needed improvements. This article describes the method and findings from the usability study, the changes that were made to the Primo interface as a result, and implications for discovery-system vendor relations and library instruction.

INTRODUCTION

Index-based discovery systems have become commonplace in academic libraries over the past several years, and academic libraries have invested a great deal of time and money into implementing them. Frequently, discovery platforms serve as the primary access point to library resources, and in some libraries they have even replaced traditional online public access catalogs. Because of the prominence of these systems in academic libraries and the important function that they serve, libraries have a vested interest in presenting users with a positive and seamless experience while using a discovery system to find and access library information. Libraries commonly conduct user testing on their discovery systems, make local customizations when possible, and sometimes even change products to present the most user-friendly experience possible. University of Houston Libraries has adopted new discovery technologies as they became available in an effort to provide simplified discovery and access to library resources. As a first step, the Libraries implemented Innovative Interfaces' Encore, a federated search tool, in 2007.
When index-based discovery systems became available, the Libraries saw them as a way to provide an improved and intuitive search experience. In 2010, the Libraries implemented Serials Solutions' Summon. After three years and a thorough process of evaluating priorities and investigating alternatives, the Libraries decided to move to Ex Libris' Primo, which was implemented in May 2014. The Libraries' intention was to continually assess and customize Primo to improve functionality and user experience. The Libraries conducted research and performed user testing, and in May 2015 a redesigned Primo search results page was released. One of the activities that informed the Primo redesign was a "think-aloud" usability test that required users to complete a set of two tasks using Primo. This article will present the method and results of the testing as well as the customizations that were made to the discovery system as a result. It will also discuss some broader implications for library discovery and its effect on information literacy instruction.

Kelsey Brett (krbrett@ua.edu) is Discovery Systems Librarian, Ashley Lierman (arlierman@uh.edu) is Instructional Design Librarian, and Cherie Turner (ckturner2@uh.edu) is Chemical Sciences Librarian, University of Houston Libraries, Houston, Texas.

LESSONS LEARNED: A PRIMO USABILITY STUDY | BRETT, LIERMAN, AND TURNER doi: 10.6017/ital.v35i1.8965

LITERATURE REVIEW

There is a substantial body of literature discussing usability testing of discovery systems. In the interest of brevity, we will focus solely on studies and overviews involving Primo implementations, from which several patterns have emerged.
Multiple studies have indicated that users' responses to the system are generally positive; even in testing of very early versions by a development partner, users responded positively overall.1 Interestingly, some studies found that in many cases users rated Primo positively in post-testing surveys even when their task completion rate in the testing had been low.2 Multiple studies also found evidence that, although users may struggle with Primo initially, the system is learnable over time. Comeaux found that the time it took users to use facets or locate resources decreased significantly with each task they performed,3 while other studies saw the use of facets per task increase for each user over the course of the testing.4 User reactions to facets and other post-limiting functions in Primo were divided. In one of the earliest studies, Sadeh found that users responded positively to facets,5 and some authors found users came to use them heavily while searching,6 while others found that facets were generally underused.7 Multiple studies found that users tended to repeat their searches with slightly different terms rather than use post-limiting options.8 Thomsett-Scott and Reese, in a survey of the literature on discovery tools, reported evidence of a trend that users reacted more positively to post-limiting in earlier discovery studies,9 while the broader literature shows more negative reactions in more recent studies. This could indicate that shifts in the software, user expectations, or both may have decreased users' interest in these options. A few specific types of usability problems seem common across tests of Primo and other discovery systems. Across a large number of studies, it has been found that users—especially undergraduate students—struggle to understand library and academic terminology used in discovery.
Some terminology changes were made after users had difficulty in the earliest usability tests of Primo,10 but users continued to struggle with terms like hold and recall in item records.11 Users also failed to understand the labels of limiters12 and to recognize the internal names of repositories and collections.13 Literature reviews on discovery systems have found terminology to be a common stumbling block for searchers across a wide number of individual studies.14 Similarly, users often struggle to understand the scope of options available to them when searching and the holdings information in item records. Users failed in multiple tests to distinguish between the article level and the journal level,15 could not interpret bibliographic information sufficiently to determine that they had found the desired item,16 and chose incorrect options for scoping their searches.17 Many studies found that users were unable to distinguish between multiple editions of a held item when all item types or editions were listed in the record.18 In other cases, users had difficulty interpreting locations and holdings information for physical items.19 Among the needs and desires expressed by and for Primo users in the literature, two in particular stand out. First, many users expressed a desire for more advanced search options; some wanted more complexity in certain facets and the ability to search within results,20 while other users simply wanted an advanced search option to be available.21 Secondly, a large number of studies indicated that instruction on Primo or other discovery systems was needed for users to search effectively.
In some cases, this was the conclusion of the researchers conducting the study,22 while in other cases users themselves either suggested or requested instruction on the system.23 It is also worth noting that it has been questioned whether usability testing as a whole is a sufficient mechanism for evaluating discovery-system functionality. Prommann and Zhang found that usability testing has focused almost exclusively on the technical functioning of the software and has not adequately revealed the ability of discovery systems like Primo to successfully complete users' desired tasks.24 They proposed hierarchical task analysis (HTA) as an alternative, to examine users' most frequent desires and the capacity of discovery systems to meet them. Prommann and Zhang acknowledged, however, that because HTA is completed by an expert on the system rather than by an actual user, some of the valuable information derived from usability testing (including terms and functions that users do not understand, however well-designed) is lost in the process; they concluded that a combination of the two methods of testing is ideal to retain the best of both.

BACKGROUND

At the University of Houston Libraries, the Resource Discovery Systems department (RDS) is responsible for the maintenance and development of Primo. However, it is important to RDS to gather feedback and foster buy-in from stakeholders in the Library before making changes to the system. To that end, RDS works with two committees to assess the system and make recommendations for its improvement. The Discovery Usability Group and the Discovery Advisory Group include members from public services, technical services, and systems; each member brings a unique perspective on discovery. The Discovery Usability Group is charged with assessing the discovery system through a variety of methods including usability testing, focus groups, and user interviews.
The Discovery Advisory Group reviews results of user testing and makes recommendations for improvement. All changes to the discovery system are reviewed by both groups before they are released for public use.

In fall 2014, several months after the Primo implementation, the Discovery Usability Group conducted a focus group with student workers from the library's information desk (a dual reference and circulation desk) to solicit feedback about the functionality of Primo and suggestions for its improvement. In the meantime, the Discovery Advisory Group was testing Primo and evaluating Primo sites at peer and aspirational institutions. The groups used the information collected through the focus group and research on Primo to make recommendations for improvement. RDS has access to a Primo development sandbox, and many of the recommended changes were made in the sandbox environment and reviewed by the two groups prior to public release.

Changes to the search box can be seen in figure 1. Rarely used tabs were replaced with a drop-down menu to the right of the search box to allow users to limit to "everything," "books+," or "digital library." To increase visibility, links to "Advanced Search" and "Browse Search" were made larger and more spacing was added.

Figure 1. Search Box in Live Site (Above) and Development Sandbox (Below) at Time of Testing

Changes were also made to create a cleaner and less cluttered search results page (see figure 2). More white space was added, and the links (or tabs) to "View Online," "Request," "Details," etc., were redesigned and renamed for clarity. For example, the "View Online" link was renamed "Preview Online" because it opens a box within the search results page that displays the item. The groups believed "Preview Online" more accurately represents what the link does.
Figure 2. Search Results in Live Site (Above) and Development Sandbox (Below) at Time of Testing

The facets were also redesigned to look cleaner and larger to attract users' attention (see figure 3).

Figure 3. Facets in Live Site and Development Sandbox at Time of Testing

Both groups were happy with the changes to the Primo development sandbox but wanted to test the effect of the changes on user search behavior before updating the live site. The Discovery Usability Group conducted a usability test within the development sandbox. The goal of the test was to find out if users could effectively complete common research tasks using Primo. With that goal in mind, the group developed a usability test and conducted it during the spring semester of 2015.

METHODOLOGY

The Discovery Usability Group developed a usability test using a "think-aloud" methodology, where users were asked to verbalize their thought process as they completed research tasks through Primo. Four tasks were designed to mirror tasks that users are likely to complete for class assignments or for general research. To minimize the testing time, each participant completed two tasks, with the facilitators alternating between two sets of tasks from one participant to the next.

Test 1

Task 1: You are trying to find an article that was cited in a paper you read recently. You have the following citation: Clapp, E., & Edwards, L. (2013). Expanding our vision for the arts in education. Harvard Educational Review, 83(1), 5–14. Please find this article using OneSearch [the public-facing name given to the Libraries' Primo implementation].

Task 2: You are doing a research project on the effects of video games on early childhood development.
Find a peer-reviewed article on this topic, using OneSearch.

Test 2

Task 1: Recently your friend recommended the book The Lighthouse by P. D. James. Use OneSearch to find out if you can check this book out from the library.

Task 2: You are writing a paper about the drug cartels' influence on Mexico's relationship with the United States. Find a newspaper article on this topic, using OneSearch.

Two facilitators set up a table with a laptop in the front entrance of the library. They alternated between the facilitator and note-taker roles. Another group member took on the role of "caller" and recruited library patrons to participate in the study. The caller set up a table visible to those passing by with library-branded T-shirts and umbrellas to incentivize participation. The caller explained what would be expected of the potential participant and went over the informed-consent document. After signing the form, the participant performed two tasks. After the test the participant received a library T-shirt or umbrella, and snacks. The facilitators used Morae Usability Software to record the screen and audio of each test. Participants were asked for permission to record their sessions, but could opt out. During the three-hour testing period, fifteen library patrons participated in the study, and fourteen sessions were recorded. Of the fifteen participants, thirteen were undergraduate students (four freshmen, one sophomore, seven juniors, and two seniors), one was a graduate student, and one was a post-baccalaureate student. The majority of the participants were from the sciences, along with two students from the College of Business and two from the School of Communications. There were no participants from the humanities. The facilitators took notes on a rubric (see table 1) that simplified the processes of coding and reviewing the recordings.
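Session notes of this kind are straightforward to tally programmatically. As an illustration only (the study's facilitators coded their notes by hand; this script is a hypothetical sketch, not part of their workflow), time-on-task values recorded in the "1m 54s" format used in the results tables below can be parsed and aggregated like so, here using the eight times recorded for Test 1, Task 1:

```python
import re
from statistics import mean

def to_seconds(stamp: str) -> int:
    """Convert a time-on-task string such as '1m 54s' or '59s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", stamp.strip())
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return minutes * 60 + seconds

# Times on task recorded for Test 1, Task 1; all eight participants completed the task
times = ["1m 54s", "4m 13s", "1m 26s", "1m 17s", "1m 26s", "1m 43s", "1m 27s", "1m 5s"]
completed = [True] * 8

secs = [to_seconds(t) for t in times]
print(f"mean time on task: {mean(secs):.0f}s")                    # prints "mean time on task: 109s"
print(f"completion rate: {sum(completed) / len(completed):.0%}")  # prints "completion rate: 100%"
```

Summaries like these make it easy to compare tasks (for example, the roughly one-minute median for the known-item search against the longer times for the peer-review task) without replaying every recording.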
After the usability testing, the facilitators reviewed the notes and recordings, coded them for common themes and breakdowns, and prepared a report of their findings and design recommendations. The facilitators sent the report, along with audio and screen recordings, to the Discovery Advisory Group, who reviewed them along with RDS. The Discovery Advisory Group made additional design recommendations, and RDS used the information and recommendations to implement additional customizations to the Primo development sandbox.

Preliminary Questions
ASK: What is your affiliation with the University of Houston? Year? Major?
ASK: How often do you use the library website? For what purpose(s)?

Task 1
Describe the steps the participant took to complete the task (S/U)
ASK: How did you feel about this task? What was simple? What was difficult?
ASK: Is there anything that would make completing this task easier?

Task 2
Describe the steps the participant took to complete the task (S/U)
ASK: How did you feel about this task? What was simple? What was difficult?
ASK: Is there anything that would make completing this task easier?

Follow-up Question
ASK: What can we do to improve the overall experience using OneSearch?

Table 1. Task Completion Rubric for Test 1

RESULTS

Test 1, Task 1

You are trying to find an article that was cited in a paper you read recently. You have the following citation: Clapp, E., & Edwards, L. (2013). Expanding our vision for the arts in education. Harvard Educational Review, 83(1), 5–14. Please find this article using OneSearch.

Participant  Time on Task  Task Completion
1            1m 54s        Y
2            4m 13s        Y
3            1m 26s        Y
4            1m 17s        Y
5            1m 26s        Y (required assistance)
6            1m 43s        Y
7            1m 27s        Y
8            1m 5s         Y

Table 2. Results for Test 1, Task 1

All eight participants successfully completed this task, although sophistication and efficiency varied between participants.
Some searched by the authors' last names, which was not specific enough to return the item in question. Four participants attempted to use advanced search or the drop-down menu to the right of the search box to pre-filter their results. Two participants viewed the options in the drop-down menu, which were "everything," "books+," and "digital library," and left it on the default "everything" search. When prompted, the participants explained that they were expecting the drop-down to contain title and/or author limiters. Similarly, participants expected an author limiter in the advanced search. The citation format seemed to confuse participants, and they tended to search for the piece of information that was listed first—the authors—rather than the most unique piece of information—the title. If the first search did not return the correct item in the first few results, the participant would modify their search by searching for a different element of the citation or adding another element of the citation to the initial search until the item they were looking for appeared as one of the first few results. Participant 5 thought they had successfully completed the task, but the facilitator had to point out that the item they chose did not meet the citation exactly, and on the second try they found the correct item. Participant 2 worked on the task for more than four minutes, significantly longer than the other seven participants. They immediately navigated to advanced search and filled out several fields in the advanced search form with the elements of the citation. If the search did not return their item, they added more elements until they finally found it. Simply searching the title in the citation would have returned the item as the first search result.
Filling out the advanced search form with all of the information from the citation does not necessarily increase a user's chances of finding the item in a discovery system, though it might do so when searching in an online catalog or subject database. The Discovery Advisory and Usability Groups made two recommendations to address some of the identified issues: include an author search option in the advanced search, and add an "articles+" option to the drop-down menu on the basic search. RDS implemented both recommendations. The Discovery Usability Group identified confusion around citations as a common breakdown during this task. The groups recommended providing instructional information about searching for known items to address this breakdown; however, RDS is still working on an effective method to provide this information in a simple and visible way.

Test 1, Task 2

You are doing a research project on the effects of video games on early childhood development. Find a peer-reviewed article on this topic, using OneSearch.

Participant  Time on Task  Task Completion
1            3m 44s        Y
2            2m 21s        Y
3            5m 23s        Y (required assistance)
4            2m 5s         Y
5            3m 32s        Y
6            2m 45s        Y
7            3m 8s         Y
8            3m 1s         Y (required assistance)

Table 3. Results for Test 1, Task 2

All eight participants successfully found an article on this topic, but were less successful in determining whether the article was peer-reviewed. Only one participant used the "Peer-reviewed Journals" facet without being prompted. Three users noticed the "[Peer-reviewed Journal]" note in the record information for search results, and used it to determine if the article was peer-reviewed. One participant went to the full text of an article, said it "seemed" like it was peer-reviewed, and considered the task complete. The resource type facets were more heavily used during this task than the "Peer-reviewed Journals" facet, despite its being promoted to the top of the list of facets.
Two participants used the "Articles" facet, and two participants used the "Reviews" facet, thinking it limited to peer-reviewed articles. Participants 3 and 8 needed help from the facilitator to determine whether a source was peer-reviewed. There was an overall misunderstanding of what peer-reviewed means, which affected participants' confidence in completing the task. The design recommendations based on this task included changing the "Peer-reviewed Journals" facet to "Peer-reviewed Articles" or simply "Peer-reviewed." RDS changed the facet to "Peer-reviewed Articles" to help alleviate confusion. Additionally, the groups recommended emphasizing the "[Peer-reviewed Journal]" designations within the search results and providing a method for limiting to peer-reviewed materials before conducting a search. Customization limitations of the system have prevented RDS from implementing these design recommendations yet. A way to address the breakdowns caused by misunderstanding terminology also has yet to be identified. It was disheartening that participants did not use the "Peer-reviewed Journals" facet despite its being purposefully emphasized on the search results page.

Test 2, Task 1

Recently your friend recommended the book The Lighthouse by P. D. James. Use OneSearch to find out if you can check this book out from the library.

Participant  Time on Task  Task Completion
1            1m 7s         Y
2            56s           Y
3            No recording  Y
4            2m 21s        Y
5            1m 8s         Y
6            2m 14s        Y
7            1m 15s        Y

Table 4. Results for Test 2, Task 1

All seven participants were able to find this book using Primo, but had difficulty determining what to do once they found it. For this task every participant searched by title and found the book as the first search result. Four users limited to "books+" before searching using the drop-down menu, while the other three remained in the default "everything" search.
Only one participant used the locations tab within the search results to determine availability; the others clicked the title and went to the item's catalog record. All participants were able to determine that the book was available in the library, but there was an overall lack of understanding about how to use the information in the catalog to check out a book. Participant 1 said that they would write down the call number, take it to the information desk, and ask how to find it, which was the most sophisticated response of all seven participants. Participant 4 spent nearly two minutes clicking through links in the OPAC expecting to find a "Check Out" button and only stopped when the facilitator stepped in. A recommended design change based on this task was to have call numbers in Primo and the online catalog link to a stacks guide or map. This is a feature that may be developed in the future, but technical limitations prevented RDS from implementing it in time for the release of the redesigned search interface. Like the previous tasks, some of the breakdowns occurred because of a lack of understanding of library services. Users easily figured out that there was a copy of the book in the library, but had little sense of what to do next. None of the participants successfully located the stacks guide or the request feature that would put the item on hold for them. Steps should be taken to direct users to these features more effectively.

Test 2, Task 2

You are writing a paper about the drug cartels' influence on Mexico's relationship with the United States. Find a newspaper article on this topic, using OneSearch.

Participant  Time on Task  Task Completion
1            4m 45s        Y (required assistance)
2            59s           Y
3            No recording  N
4            7m 47s        Y
5            2m 52s        Y
6            1m 33s        Y
7            1m 30s        Y

Table 5. Results for Test 2, Task 2

This task was difficult for participants.
Two users limited their search initially to "digital library" using the drop-down menu, thinking it would be a place to find newspaper articles; their searches returned zero results. Only two users used the "Newspaper Articles" facet without being prompted, and users did not seem to readily distinguish newspaper articles as a resource type. Participants did not notice the resource type icons without being prompted. Several participants needed to be reminded that the task was to find a newspaper article, and not any other type of article. With guidance, most participants were able to complete the task. Participant 4 remained on the task for almost eight minutes because of their dissatisfaction with the relevancy of the results to the prompt. Interestingly, they found the "Newspaper Articles" facet and reapplied it after each modified search, suggesting that they learned to use system features as they went. One of the recommendations based on this task was to remove "digital library" as an option in the drop-down menu on the basic search. It was evident that "digital library" did not have the same meaning to end users as it does to internal users. This recommendation was easily implemented. Another recommendation was to emphasize the resource type icons within the search results, but we have not determined a way to do so effectively. One suggestion from the Discovery Usability Group was to exclude newspaper articles from the search results as a default, but no consensus was reached on this issue.

LIMITATIONS

The Discovery Usability Group identified limitations to the usability test that should be noted. Testing was done in a high-traffic portion of the library's lobby, which is used as study space by a broad range of students. Participants were recruited from this study space, and we chose not to screen participants. The fifteen participants in the study did not constitute a representative sample.
Almost all participants were undergraduate students, and no humanities majors participated. The outcomes might have been different if our participants had included more experienced researchers or students from a broader range of disciplines. However, adding screening questions or choosing a more neutral location would have limited the number of participants who could complete our testing. Another limitation was that the participants started the usability test within the Primo interface. Because Primo is integrated into the Libraries' website, users would typically begin searching the system from within the library homepage. The goals of the study required testing of our Primo development sandbox, which was not yet available to the public and therefore could not be accessed in the same way. This gave participants some additional options from the initial search pages that are not usually available through the main search interface. While testing an active version of the interface would be preferable, one of our goals was to understand how our modifications affected user behavior, so testing the unmodified version was not an acceptable substitute. Additionally, the usability study presented tasks out of context and did not replicate a true user searching experience. Despite the limitations, we learned valuable lessons from the participants in this study.

DISCUSSION

Users successfully completed the tasks in this usability study. Unfortunately, they did not take advantage of many of the features that can make such tasks easier—particularly facets. This was especially apparent when we asked users to find a peer-reviewed journal article (Test 1, Task 2). Primo has a facet that will limit a search to only peer-reviewed journal articles, and only one out of eight participants used this facet during this task.
Participants appreciated the pre-search filtering options, and requested more of them (such as an author search), while post-search facets were underutilized. Similarly, participants almost uniformly ignored the links, or tabs, within the search results, which would provide users with more information, a preview of the full-text, and additional features such as an email function. Users bypassed these options and clicked on the title instead. The Discovery Usability Group theorized that users clicked on the title of the item because that behavior would be successful in a more familiar search interface like Google. The team customized the configuration so that a title click would open either the full-text of electronic items or the catalog record for physical items to accommodate users’ instinctive search behaviors. The tabs, though a prominent feature of the discovery system, have proved to have little value for users. Throughout the implementation of discovery systems in academic libraries, both research studies and anecdotal evidence have suggested that users do not find end-user features like facets valuable; however, discovery system vendors have made no apparent attempt to reimagine the possibilities for search refinements. Indeed, most of the findings in this study will present few surprises to anyone familiar with the discovery usability literature, which is itself concerning. As our literature review has shown, many of the same general usability issues have repeated throughout studies of Primo since 2008, and most are very similar to usability issues in other, competitor discovery systems. This raises some concerns about the pace of innovation in the discovery field, and whether discovery vendors are genuinely taking into account the research findings about the needs of our users as they refine their products. 
In a recent article, David Nelson and Linda Turney identified many issues with discovery facets in their current form that may be barriers to usage, particularly labeling and library jargon; we join them in urging vendors and libraries to collaborate more closely for deep analysis of actual facet usage by users, and to address those factors that have negatively affected facets’ value.25

During our usability study, a common barrier to the successful completion of a task was not the technology itself but a lack of understanding of the task. Participants had difficulty deciphering a citation, which may have led to their tendency to search for a journal article by author rather than by title. Many participants struggled with using call numbers and with how to find and check out books in the library. Peer review also proved to be a difficult or unfamiliar concept for many; when looking for peer-reviewed articles, some participants clicked on the “Reviews” facet, which limited their searches to an inappropriate resource type. Additionally, participants did not differentiate between journal articles and newspaper articles, which may indicate a broader inability to differentiate between scholarly and nonscholarly resources. This effect may be exaggerated by the high percentage of science students who participated, as these students may not have frequent need for newspaper articles.

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016 21

All of these challenges, however, are indicative of a deeper problem with terminology. Regardless of how simple it is to limit a search to peer-reviewed articles, a user who does not understand what peer review means cannot complete the task with confidence or certainty. Librarians struggle with presenting understandable language and avoiding library terminology; as we discovered, academic language, like “peer-reviewed” and “citation,” presents a similar problem. These are not issues that can be resolved with a technological solution.
Rather, we join previous authors in suggesting that instruction may be a reasonable way to address many usability issues in Primo. From our findings and from those in the wider literature, we conclude that general instruction in information literacy is a prerequisite for effective use of this or any research tool, particularly for undergraduates. Nichols et al. “recommend studying how to effectively provide instruction on Primo searching and results interpretation,”26 but instruction on the use of a single tool is of limited utility to students in their academic lives. Instead, libraries could bolster information literacy instruction on key concepts around the production and storage of information, scholarly communications, and differences in information types. Teaching these concepts effectively should help to alleviate the most common user issues, including understanding terminology and different types of information, as well as helping students to understand key elements of research in general. This is a particularly important point for librarians working as advocates for information literacy instruction, especially in cases where administrators or faculty may feel that more advanced tools, like discovery systems, should make instruction obsolete.

CONCLUSION

Several changes were made to the Primo interface in response to breakdowns identified during the usability study. Resource Discovery Systems (RDS) first implemented the changes in the Primo development sandbox. After the Discovery Usability and Advisory Groups agreed on the changes, they were made available on the live site (see figure 4). The redesigned search results page became available to the general public between the spring and summer academic sessions of 2015. In addition to the changes that were made because of the usability study, RDS made changes to the look and feel to make the search results interface more aesthetically pleasing and more in line with the University of Houston brand.
Before (live site): Figure 4. Primo Interface before Usability Testing
During (development sandbox): Figure 5. Primo Interface during Usability Testing
After (live site): Figure 6. Primo Interface after Usability Testing

Many larger assertions of this study, encompassing implications for instruction and our needs from discovery vendors, will require further study to address. The authors intend to continue to investigate these issues as additional usability testing is conducted and to use the data to support future vendor relations and instructional curriculum development discussions.

REFERENCES

1. Tamar Sadeh, “User Experience in the Library: A Case Study,” New Library World 109, no. 1/2 (2008): 7–24, doi:10.1108/03074800810845976.

2. Aaron Nichols et al., “Kicking the Tires: A Usability Study of the Primo Discovery Tool,” Journal of Web Librarianship 8, no. 2 (2014): 172–95, doi:10.1080/19322909.2014.903133; Scott Hanrath and Miloche Kottman, “Use and Usability of a Discovery Tool in an Academic Library,” Journal of Web Librarianship 9, no. 1 (2015): 1–21, doi:10.1080/19322909.2014.983259.

3. David J. Comeaux, “Usability Testing of a Web-Scale Discovery System at an Academic Library,” College & Undergraduate Libraries 19, no. 2–4 (2012): 189–206, doi:10.1080/10691316.2012.695671.

4. Kylie Jarrett, “FindIt@Flinders: User Experiences of the Primo Discovery Search Solution,” Australian Academic & Research Libraries 43, no. 4 (2012): 278–300; Nichols et al., “Kicking the Tires.”

5. Sadeh, “User Experience in the Library.”

6.
Jarrett, “FindIt@Flinders”; Nichols et al., “Kicking the Tires.”

7. Xi Niu, Tao Zhang, and Hsin-liang Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library,” Libraries Faculty and Staff Scholarship and Research 30, no. 5 (2014), doi:10.1080/10447318.2013.873281; Hanrath and Kottman, “Use and Usability of a Discovery Tool in an Academic Library.”

8. Rice Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces,” Library Trends 61, no. 1 (2012): 186–207, doi:10.1353/lib.2012.0029; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.”

9. Beth Thomsett-Scott and Patricia E. Reese, “Academic Libraries and Discovery Tools: A Survey of the Literature,” College & Undergraduate Libraries 19, no. 2–4 (2012): 123–43, doi:10.1080/10691316.2012.697009.

10. Sadeh, “User Experience in the Library.”

11. Comeaux, “Usability Testing of a Web-Scale Discovery System at an Academic Library.”

12. Jessica Mahoney and Susan Leach-Murray, “Implementation of a Discovery Layer: The Franklin College Experience,” College & Undergraduate Libraries 19, no. 2–4 (2012): 327–43, doi:10.1080/10691316.2012.693435.

13. Joy Marie Perrin et al., “Usability Testing for Greater Impact: A Primo Case Study,” Information Technology & Libraries 33, no. 4 (2014): 57–67.

14. Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces”; Thomsett-Scott and Reese, “Academic Libraries and Discovery Tools.”

15. Jarrett, “FindIt@Flinders”; Mahoney and Leach-Murray, “Implementation of a Discovery Layer.”

16. Jarrett, “FindIt@Flinders”; Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Nichols et al., “Kicking the Tires.”

17. Jarrett, “FindIt@Flinders”; Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Perrin et al., “Usability Testing for Greater Impact: A Primo Case Study.”

18.
Jarrett, “FindIt@Flinders”; Nichols et al., “Kicking the Tires”; Hanrath and Kottman, “Use and Usability of a Discovery Tool in an Academic Library”; Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces.”

19. Comeaux, “Usability Testing of a Web-Scale Discovery System at an Academic Library”; Thomsett-Scott and Reese, “Academic Libraries and Discovery Tools.”

20. Jarrett, “FindIt@Flinders.”

21. Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Perrin et al., “Usability Testing for Greater Impact.”

22. Mahoney and Leach-Murray, “Implementation of a Discovery Layer”; Nichols et al., “Kicking the Tires”; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.”

23. Thomsett-Scott and Reese, “Academic Libraries and Discovery Tools.”

24. Tao Zhang and Merlen Prommann, “Applying Hierarchical Task Analysis Method to Discovery Layer Evaluation,” Information Technology & Libraries 34, no. 1 (2015): 77–105, doi:10.6017/ital.v34i1.5600.

25. David Nelson and Linda Turney, “What’s in a Word? Rethinking Facet Headings in a Discovery Service,” Information Technology & Libraries 34, no. 2 (2015): 76–91, doi:10.6017/ital.v34i2.5629.

26. Nichols et al., “Kicking the Tires,” 184.
Hitting the Road Towards a Greater Digital Destination: Evaluating and Testing DAMS at University of Houston Libraries

Annie Wu, Santi Thompson, Rachel Vacek, Sean Watkins, and Andrew Weidner

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016 5

ABSTRACT

Since 2009, tens of thousands of rare and unique items have been made available online for research through the University of Houston (UH) Digital Library. Six years later, the UH Libraries’ new digital initiatives call for a more dynamic digital repository infrastructure that is extensible, scalable, and interoperable. The UH Libraries’ mission and the mandate of its strategic directions drive the pursuit of seamless access and expanded digital collections. To answer the calls for technological change, the UH Libraries administration appointed a Digital Asset Management System (DAMS) Implementation Task Force to explore, evaluate, test, recommend, and implement a more robust digital asset management system. This article focuses on the task force’s DAMS selection activities: needs assessment, systems evaluation, and systems testing. The authors also describe the task force’s DAMS recommendation based on analysis of the evaluation and testing data, a comparison of the advantages and disadvantages of each system, and system cost. Finally, the authors outline their DAMS implementation strategy, comprised of a phased rollout with the following stages: system installation, data migration, and interface development.

INTRODUCTION

Since the launch of the University of Houston Digital Library (UHDL) in 2009, the UH Libraries have made tens of thousands of rare and unique items available online for research using CONTENTdm. As we began to explore and expand into new digital initiatives, we realized that the UH Libraries’ digital aspirations require a more dynamic, flexible, scalable, and interoperable digital asset management system that can manage larger amounts of materials in a variety of formats.
We plan to implement a new digital repository infrastructure that accommodates creative workflows and allows for the configuration of additional functionalities such as digital exhibits, data mining, cross-linking, geospatial visualization, and multimedia presentation. The new system will be designed with linked data in mind and will allow us to publish our digital collections as linked open data within the larger semantic web environment.

Annie Wu (awu@uh.edu) is Head of Metadata and Digitization Services, Santi Thompson (sathompson3@uh.edu) is Head of Repository Services, Rachel Vacek (evacek@uh.edu) is Head of Web Services, Sean Watkins (slwatkins@uh.edu) is Web Projects Manager, and Andrew Weidner (ajweidner@uh.edu) is Metadata Services Coordinator, University of Houston Libraries.

HITTING THE ROAD TOWARDS A GREATER DIGITAL DESTINATION: EVALUATING AND TESTING DAMS AT UNIVERSITY OF HOUSTON LIBRARIES | WU ET AL. | doi:10.6017/ital.v35i2.9152 6

The UH Libraries Strategic Directions set forth a mandate for us to “work assiduously to expand our unique and comprehensive collections that support curricula and spotlight research. We will pursue seamless access and expand digital collections to increase national recognition.”1 To fulfill the UH Libraries’ mission and the mandate of our Strategic Directions, the UH Libraries administration appointed a Digital Asset Management System (DAMS) Implementation Task Force to explore, evaluate, test, recommend, and implement a more robust digital asset management system that would provide multiple modes of access to the UH Libraries’ unique collections and accommodate digital object production at a larger scale. The collaborative task force comprises librarians from four departments: Metadata and Digitization Services (MDS), Web Services, Digital Repository Services, and Special Collections.
The core charge of the task force is to:

• Perform a needs assessment and build criteria and policies based on evaluation of the current system and requirements for the new DAMS
• Research and explore DAMS on the market and identify the top three systems for beta testing in a development environment
• Generate preliminary recommendations from stakeholders’ comments and feedback
• Coordinate installation of the new DAMS and finish data migration
• Communicate the task force work to UH Libraries colleagues

LITERATURE REVIEW

Libraries have maintained DAMS for the publication of digitized surrogates of rare and unique materials for over two decades. During that time, information professionals have developed evaluation strategies for testing, comparing, and evaluating library DAMS software. Reviewing these models and associated case studies provided insight into common practices for selecting systems and informed how the UH Libraries DAMS Implementation Task Force conducted its evaluation process.

One of the first publications of its kind, “A Checklist for Evaluating Open Source Digital Library Software” by Dion Hoe-Lian Goh et al., presents a comprehensive list of criteria for library DAMS evaluation.2 The researchers developed twelve broad categories for testing (e.g., content management, metadata, and preservation) and generated a scoring system based on the assignment of a weight and a numeric value to each criterion.3 While the checklist was created to assist with the evaluation process, the authors note that an institution’s selection decision should be guided primarily by defining the scope of their digital library, the content being curated using the software, and the uses of the material.4 Through their efforts, the authors created a rubric that can be utilized by other organizations when selecting a DAMS. Subsequent research projects have expanded upon the checklist evaluation model.
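The weighted-checklist scoring approach that Goh et al. describe can be sketched in a few lines of code. The criteria, weights, and ratings below are hypothetical placeholders, not values from the checklist itself; the point is only the mechanic of multiplying each criterion's rating by its weight and summing the results so that candidate systems become directly comparable.

```python
# Illustrative sketch of weighted-checklist scoring (hypothetical data).
# Each criterion carries a weight (importance); each candidate system gets a
# rating per criterion; a system's total is the weighted sum of its ratings.

def weighted_total(ratings, weights):
    """Weighted sum of criterion ratings; every rated criterion needs a weight."""
    return sum(weights[criterion] * value for criterion, value in ratings.items())

# Hypothetical example: two systems rated 0-3 on three criteria.
weights = {"content management": 3, "metadata": 2, "preservation": 1}
system_a = {"content management": 3, "metadata": 2, "preservation": 1}
system_b = {"content management": 2, "metadata": 3, "preservation": 3}

ranked = sorted(
    ["A", "B"],
    key=lambda name: weighted_total({"A": system_a, "B": system_b}[name], weights),
    reverse=True,
)
print(weighted_total(system_a, weights))  # 14
print(weighted_total(system_b, weights))  # 15
print(ranked)  # ['B', 'A']
```

Adjusting the weights is where an institution encodes its priorities: a repository focused on preservation would weight that criterion higher and could arrive at a different ranking from the same ratings.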
In “Choosing Software for a Digital Library,” Jody DeRidder outlines major issues that librarians should address when choosing DAMS software, including many of the hardware, technological, and metadata concerns that Goh et al. identified.5 Additionally, she emphasizes the need to account for personnel and service requirements through a variety of activities: usability testing and cost estimation; a formal needs assessment to guide the evaluation process; and a tiered testing approach, which calls upon evaluators to winnow the number of systems under consideration.6 By considering the needs of stakeholders, from users to library administrators, DeRidder’s contributions inform a more comprehensive DAMS evaluation process.

In addition to creating evaluation criteria, the literature on DAMS selection has also produced case studies that reflect real-world scenarios and identify use cases that help determine user needs and desires. In “Evaluation of Digital Repository Software at the National Library of Medicine,” Jennifer L. Marill and Edward C. Luczak discuss the process that the National Library of Medicine (NLM) used to compare ten DAMS, both proprietary and open source.7 Echoing Goh et al. and DeRidder, Marill and Luczak created broad categories for testing and developed a scoring system for comparing DAMS.8 Additionally, Marill and Luczak enriched the evaluation process by implementing two testing phases: “initial testing of ten systems” and “in-depth testing of three systems.”9 This method allowed NLM to conduct extensive research on the most promising systems for their needs before selecting a DAMS to implement. The tiered approach appealed to the task force, and influenced how it conducted the evaluation process, because it balances efficiency and comprehensiveness.

In another case study, Dora Wagner and Kent Gerber describe the collaborative process of selecting a DAMS across a consortium.
In their article “Building a Shared Digital Collection: The Experience of the Cooperating Libraries in Consortium,”10 the authors emphasize additional criteria that are important for collaborating institutions: the ability to brand consortial products for local audiences; the flexibility to incorporate differing workflows for local administrators; and the shared responsibility of system maintenance and costs.11 While the UH Libraries will not be managing a shared repository DAMS, the task force appreciated the article’s emphasis on maximizing customizations to improve the user experience.

In “Evaluation and Usage Scenarios of Open Source Digital Library and Collection Management Tools,” Georgios Gkoumas and Fotis Lazarinis describe how they tested multiple open-source systems against typical library functions—such as acquisitions, cataloging, digital libraries, and digital preservation—to identify typical use cases for libraries.12 Some of the use cases formulated by the researchers address digital platforms, including features related to supporting a diverse array of metadata schema and using a simple web interface for the management of digital assets.13 These use cases mirror local feature and functionality requests incorporated into the UH Libraries’ evaluation criteria.

In “Digital Libraries: Comparison of 10 Software,” Mathieu Andro, Emmanuelle Asselin, and Marc Maisonneuve discuss a rubric they developed to compare six open-source platforms (Invenio, Greenstone, Omeka, EPrints, ORI-OAI, and DSpace) and four proprietary platforms (Mnesys, DigiTool, YooLib, and CONTENTdm) around six core areas: document management, metadata, engine, interoperability, user management, and Web 2.0.
14 The authors note that each solution is “of good quality” and that institutions should consider a variety of factors when selecting a DAMS, including the “type of documents you will want to upload” and the “political criteria (open source or proprietary software)” desired by the institution.15 This article provided the UH Libraries with additional factors to include in their evaluation criteria.

Finally, Heather Gilbert and Tyler Mobley’s article “Breaking Up with CONTENTdm: Why and How One Institution Took the Leap to Open Source” provides a case study for a new trend: selecting a DAMS for migration from an existing system to a new one.16 The researchers cite several reasons for their need to select a new DAMS, primarily their current system’s limitations with searching and displaying content in the digital library.17 They evaluated alternatives and selected a suite of open-source tools, including Fedora, Drupal, and Blacklight, which combine to make up their new DAMS.18 Gilbert and Mobley also reflect on the migration process and identify several hurdles they had to overcome, such as customizing the open-source tools to meet their localized needs and confronting inconsistent metadata quality.19 Gilbert and Mobley’s article most closely matches the scenario faced by the UH Libraries.

Our study adds to the limited literature on evaluating and selecting DAMS for migration in several ways. It demonstrates another model that other institutions can adapt to meet their specific needs. It identifies new factors for other institutions to take into account before or during their own migration process. Finally, it adds to the body of evidence for a growing movement of libraries migrating from proprietary to open-source DAMS.

DAMS EVALUATION AND ANALYSIS METHODOLOGY

Needs Assessment

The DAMS Implementation Task Force fulfilled the first part of its charge by conducting a needs assessment.
The goal of the needs assessment was to collect the key requirements of stakeholders, identify future features of the new DAMS, and gather data in order to craft criteria for evaluation and testing in the next phase of its work. The task force employed several techniques for information gathering during the needs assessment phase:

• Identified stakeholders and held internal focus group interviews to identify system requirement needs and gaps
• Reviewed scholarly literature on DAMS evaluation and migration
• Researched peer/aspirational institutions
• Reviewed national standards around DAMS
• Determined both the current and projected use of the UHDL
• Identified UHDL materials and users

Task force members took detailed notes during each focus group interview session. The literature research on DAMS evaluation helped the task force to find articles with comprehensive DAMS evaluation criteria. The NISO criteria for core types of entities in digital library collections were also listed and applied to the evaluation after reviewing the NISO Framework of Guidance for Building Good Digital Collections.20 More than forty peer and aspirational institutions’ digital repositories were benchmarked to identify website names, platform architecture, documentation, and user and system features. The task force analyzed the rich data gathered from the needs assessment activities and built the DAMS evaluation criteria that prepared the task force for the next phase of evaluation.

Evaluation, Testing, and Recommendation

The task force began its evaluation process by identifying twelve potential DAMS for consideration that were ultimately narrowed down to three systems for in-depth testing. Using data from focus group interviews, literature reviews, and DAMS best practices, the group generated a list of benchmark criteria.
These broad evaluation criteria covered features in categories of system functionality, content management, metadata, user interface, and search support. Members of the task force researched DAMS documentation, product information, and related literature to score each system against the evaluation criteria. Table 1 contains the scores of the initial evaluation. From this process, five systems emerged with the highest scores:

● Fedora (and, closely associated, Fedora/Hydra and Fedora/Islandora)
● Collective Access
● DSpace
● Rosetta
● CONTENTdm

The task force eliminated Collective Access from the final systems for testing because of its limited functionality: it is designed around archival content only and is not widely deployed. The task force decided not to test CONTENTdm because of the system’s known functionality, which we identified through firsthand experience. After the initial elimination process, Fedora (including Fedora/Hydra and Fedora/Islandora), DSpace, and Rosetta remained for in-depth testing.

DAMS                Evaluation Score*
Fedora              27
Fedora/Hydra        26
Fedora/Islandora    26
Collective Access   24
DSpace              24
Rosetta             20
CONTENTdm           20
Trinity (iBase)     19
Preservica          16
Luna Imaging        15
RODA†               6
Invenio‡            5

Table 1. Evaluation scores of twelve DAMS using broad evaluation criteria

The task force then created detailed evaluation and testing criteria by drawing from the same sources used previously: focus groups, literature review, and best practices.
While the broad evaluation focused on high-level functions, the detailed evaluation and testing criteria for the final three systems closely analyzed the specific features of each DAMS in eight categories:

● System Environment and Function
● Administrative Access
● Content Ingest and Management
● Metadata
● Content Access
● Discoverability
● Report and Inquiry Capabilities
● System Support

* Total possible score: 29.
† Removed from evaluation because the system does not support Dublin Core metadata.
‡ Removed from evaluation because the system does not support Dublin Core metadata.

Prior to the in-depth testing of the final three systems, the task force researched timelines for system setup. Rosetta’s timeline for system setup proved to be prohibitive. Consequently, the task force eliminated Rosetta from the testing pool and moved forward with Fedora and DSpace.

To conduct the detailed evaluation, the task force scored the specific features under each category utilizing systems testing and documentation. A score ranging from zero to three (0 = None, 1 = Low, 2 = Moderate, 3 = High) was assigned to each feature evaluated. After evaluating all features, the scores were tallied for each category. Our testing revealed that Fedora outperformed DSpace in over half of the testing sections: Content Ingest and Management, Metadata, Content Access, Discoverability, and Report and Inquiry Capabilities. See table 2 for the tallied scores in each testing section.

Testing Sections                   DSpace Score   Fedora Score   Possible Score
System Environment and Testing     21             21             36
Administrative Access              15             12             18
Content Ingest and Management      59             96             123
Metadata                           32             43             51
Content Access                     14             18             18
Discoverability                    46             84             114
Report and Inquiry Capabilities    6              15             21
System Support                     12             11             12
TOTAL SCORE:                       205            300            393

Table 2.
Scores of top two DAMS from testing using detailed evaluation criteria

After review of the testing results, the task force conducted a facilitated activity to summarize the advantages and disadvantages of each system. Based on this comparison, the DAMS Task Force recommended that the UH Libraries implement a Fedora/Hydra repository architecture with the following course of action:

● Adapt the UHDL user interface to Fedora and re-evaluate it for possible improvements
● Develop an administrative content management interface with the Hydra framework
● Migrate all UHDL content to a Fedora repository

Fedora/Hydra advantages:
● Open source
● Large development community
● Linked data ready
● Modular design through API
● Scalable, sustainable, and extensible
● Batch import/export of metadata
● Handles any file format

Fedora/Hydra disadvantages:
● Steep learning curve
● Long setup time
● Requires additional tools for discovery
● No standard model for multi-file objects

Table 3. Fedora/Hydra advantages and disadvantages

The primary advantages of a DAMS based on Fedora/Hydra are: a large and active development community; a scalable and modular system that can grow quickly to accommodate large-scale digitization; and a repository architecture based on linked data technologies. This last advantage, in particular, is unique among all systems evaluated and will give the UH Libraries the ability to publish our collections as linked open data. Fedora 4 conforms to the World Wide Web Consortium (W3C) recommendation for Linked Data Platforms.21 The main disadvantage of a Fedora/Hydra system is the steep learning curve associated with designing metadata models and developing a customized software suite, which translates to a longer implementation time compared to off-the-shelf products.
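The tallying step behind table 2 can be sketched as follows. The feature names and ratings here are hypothetical stand-ins, not the task force's actual data; the sketch only shows the mechanic of mapping each feature's None/Low/Moderate/High rating to 0-3 and summing by category.

```python
# Illustrative sketch (hypothetical data): tallying 0-3 feature ratings into
# per-category totals, as in the detailed evaluation summarized in table 2.
SCALE = {"none": 0, "low": 1, "moderate": 2, "high": 3}

def category_totals(feature_ratings):
    """Sum 0-3 feature ratings by category; returns {category: total}."""
    totals = {}
    for (category, _feature), rating in feature_ratings.items():
        totals[category] = totals.get(category, 0) + SCALE[rating]
    return totals

# Hypothetical ratings keyed by (category, feature).
fedora = {
    ("Metadata", "batch edit"): "high",
    ("Metadata", "custom schemas"): "high",
    ("Discoverability", "faceted search"): "moderate",
}
print(category_totals(fedora))  # {'Metadata': 6, 'Discoverability': 2}
```

Running the same tally over each candidate system's ratings yields directly comparable per-category scores, which is how a table like table 2 is assembled.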
The UH Libraries must allocate an appropriate amount of time and resources for planning, implementation, and staff training. The long-term return on investment for this path will be a highly skilled technical staff with the ability to maintain and customize an open-source, standards-based repository architecture that can be expanded to support other UH Libraries content such as geospatial data, research data, and institutional repository materials.

DSpace advantages:
● Open source
● Easy installation / ready out of the box
● Existing familiarity through Texas Digital Library
● User group / profile controls
● Metadata quality module
● Batch import of objects

DSpace disadvantages:
● Flat file and metadata structure
● Limited reporting capabilities
● Limited metadata features
● Does not support linked data
● Limited API
● Not scalable / extensible
● Poor user interface

Table 4. DSpace advantages and disadvantages

The main advantages of DSpace are ease of installation, familiarity of workflows, and additional functionality not found in CONTENTdm.22 Installation and migration to a DSpace system would be relatively fast, and staff could quickly transition to new workflows because they are similar to those in CONTENTdm. DSpace also supports authentication and user roles that could be used to limit content to the UH community only. Commercial add-on modules, although expensive, could be purchased to provide more sophisticated content management tools than are currently available with CONTENTdm.

The disadvantages of a DSpace system are the same long-term, systemic problems found in the current CONTENTdm repository. DSpace uses a flat metadata structure, has a limited API, does not scale well, and is not customizable to the UH Libraries’ needs. Consultations with peers indicated that both CONTENTdm and DSpace institutions are exploring the more robust capabilities of Fedora-based systems.
Migration of the digital collections in CONTENTdm to a DSpace repository would provide few, if any, long-term benefits to the UH Libraries. Of all the systems considered, implementation of a Fedora/Hydra repository aligns most clearly with the UH Libraries Strategic Directions of attaining national recognition and improving access to our unique collections. The Fedora and Hydra communities are very active, with project management overseen by DuraSpace and Hydra, respectively.23,24 Over the long term, a repository based on Fedora/Hydra will give the UH Libraries a low-cost, scalable, flexible, and interoperable platform for providing online access to our unique collections.

Cost Considerations

To balance the current digital collections production schedule with the demands of a timely implementation and migration, the task force identified the following investments as cost effective for Fedora/Hydra and DSpace, respectively:

Fedora/Hydra:
● Metadata Librarian: annual salary
  • manages daily Metadata Unit operations during implementation
  • streamlines the migration process

DSpace:
● Metadata Librarian: annual salary
  • manages daily Metadata Unit operations during implementation
  • streamlines the migration process
● @Mire Modules: $41,500
  • Content Delivery (3): $13,500
  • Metadata Quality: $10,000
  • Image Conversion Suite: $9,000
  • Content & Usage Analysis: $9,000
  • These modules require one-time fees to @Mire that recur when upgrading to a new version of DSpace

Table 5. Start-up costs associated with Fedora/Hydra and DSpace

The task force determined that an investment in one librarian’s salary is the most cost-effective course of action.
The new Metadata Librarian will manage daily operations of the Metadata Unit in Metadata & Digitization Services while the Metadata Services Coordinator, in close collaboration with the Web Projects Manager, leads the DAMS implementation process. In contrast to Fedora, migration to DSpace would require a substantial investment in third-party software modules from @Mire to deliver the best possible content management environment and user experience.

IMPLEMENTATION STRATEGIES

The implementation of the new DAMS will occur in a phased rollout comprising the following stages: System Installation, Data Migration, and Interface Development. MDS and Web Services will perform the majority of the work, in consultation with key stakeholders from Special Collections and other units. Throughout this process, the DAMS Implementation Task Force will consult with the Digital Preservation Task Force* to coordinate the preservation and access systems.

Phase One: System Installation
● Set up production and server environment
● Rewrite UHDL front-end application for Fedora/Solr
● Create metadata models
● Coordinate workflows with Digital Preservation Task Force
● Begin development of administrative Hydra head for content management

Phase Two: Data Migration
● Formulate content migration strategy and schedule
● Migrate test collections and document exceptions
● Conduct the data migration
● Create preservation metadata for migrated data
● Continue development of the Hydra administrative interface

Phase Three: Interface Development
● Reevaluate front-end user interface
● Rewrite UHDL front end as a Hydra head OR update current front end
● Establish inter-departmental production workflows
● Refine administrative Hydra head for content management

Table 6.
Overview of DAMS phased implementation

Phase One: System Installation

During the first phase of DAMS implementation, Web Services and MDS will work closely together to install an open-source repository software stack based on Fedora, rewrite the current PHP front-end interface to provide public access to the data in the new system, and create metadata content models for the UHDL based on the Portland Common Data Model,25 in consultation with the Coordinator of Digital Projects from Special Collections and other key stakeholders. The DAMS Task Force will consult with the Digital Preservation Task Force† to determine how closely the preservation and access systems will be integrated and at what points. The two groups will also jointly outline a DAMS migration strategy that aligns with the preservation system. Web Services and MDS will collaborate on research and development of an administrative interface, based on the Hydra framework, for day-to-day management of UHDL content.

* An appointed task force to create a digital preservation policy and identify strategies, actions, and tools needed to sustain long-term access to digital assets maintained by UH Libraries.
† A working team at UH Libraries that enforces the digital preservation policy and maintains the digital preservation system.

Phase Two: Data Migration

In the second phase, MDS will migrate legacy content from CONTENTdm to the new system and work with Web Services, Special Collections, and the Architecture and Art Library to resolve any technical, metadata, or content problems that arise. The second phase will begin with the development of a strategy for completing the work in a timely fashion, followed by migration of representative sample collections to the new system to test and refine its capabilities.
After testing is complete, all legacy content will be migrated from CONTENTdm to Fedora, and preservation metadata for migrated collections will be created and archived. Development work on the Hydra administrative interface will also continue. After the data migration is complete, all new collections will be ingested into Fedora/Hydra, and the current CONTENTdm installation will be retired.

Phase Three: Interface Development

In the final phase, Web Services will reevaluate the current front-end user interface (UI) for the UHDL by conducting user tests to better understand how and why users are visiting the UHDL. Web Services will also analyze web and system analytics and gather feedback from Special Collections and other stakeholders. Depending on the outcome of this research, Web Services may create a new UI based on the Hydra framework or choose to update the current front-end application with modifications or new features. Web Services and MDS will also continue to develop or adopt tools for the management of UHDL content and work with Special Collections and the branch libraries to establish production workflows in the new system. Continued development work on the front-end and administrative interfaces, for the life of the new Digital Asset Management System, is both expected and desirable as we maintain and improve the UHDL infrastructure and contribute to the open source software community in line with the UH Libraries Strategic Directions.

Ongoing: Assessment, Enhancement, Training, and Documenting

Throughout the transition process, MDS and Web Services will undergo extensive training in workshops and conferences to build the skills necessary for developing and maintaining the new system. They will also establish and document workflows to ensure the long-term viability of the system.
Regular consultation with Special Collections, the branch libraries, and other stakeholders will be conducted to ensure that the new system satisfies the requirements of colleagues and patrons. Ongoing activities will include:

● Assessing service impact of new system
● User testing on UI
● Regular system enhancements
● Establishing new workflows
● Creating and maintaining documentation
● Training: conferences, webinars, workshops, etc.

CONCLUSION

Transitioning from CONTENTdm to a Fedora/Hydra repository will place the UH Libraries in a position to sustainably grow the amount of content in the UH Digital Library and customize the UHDL interfaces for a better user experience. Adopting a linked data platform will make it easier for the UH Libraries to publish our data for the semantic web. In addition, the Fedora/Hydra architecture can be adapted to support a wide range of UH Libraries projects, including a geospatial data portal, a research data repository, and a self-deposit institutional repository. Over the long term, the return on investment for implementing an open-source repository architecture based on industry-standard software will be: improved visibility of our unique collections on the web; expanded opportunities for aggregating our collections with high-profile repositories such as the Digital Public Library of America; and increased national recognition for our digital projects and staff expertise.

REFERENCES

1. “The University of Houston Libraries Strategic Directions, 2013–2016,” accessed July 22, 2015, http://info.lib.uh.edu/sites/default/files/docs/strategic-directions/2013-2016-libraries-strategic-directions-final.pdf.
2. Dion Hoe-Lian Goh et al., “A Checklist for Evaluating Open Source Digital Library Software,” Online Information Review 30, no. 4 (July 13, 2006): 360–79, doi:10.1108/14684520610686283.
3. Ibid., 366.
4. Ibid., 364.
5. Jody L.
DeRidder, “Choosing Software for a Digital Library,” Library Hi Tech News 24, no. 9 (2007): 19–21, doi:10.1108/07419050710874223.
6. Ibid., 21.
7. Jennifer L. Marill and Edward C. Luczak, “Evaluation of Digital Repository Software at the National Library of Medicine,” D-Lib Magazine 15, no. 5/6 (May 2009), doi:10.1045/may2009-marill.
8. Ibid.
9. Ibid.
10. Dora Wagner and Kent Gerber, “Building a Shared Digital Collection: The Experience of the Cooperating Libraries in Consortium,” College & Undergraduate Libraries 18, no. 2–3 (2011): 272–90, doi:10.1080/10691316.2011.577680.
11. Ibid., 280–84.
12. Georgios Gkoumas and Fotis Lazarinis, “Evaluation and Usage Scenarios of Open Source Digital Library and Collection Management Tools,” Program: Electronic Library and Information Systems 49, no. 3 (2015): 226–41, doi:10.1108/PROG-09-2014-0070.
13. Ibid., 238–39.
14. Mathieu Andro, Emmanuelle Asselin, and Marc Maisonneuve, “Digital Libraries: Comparison of 10 Software,” Library Collections, Acquisitions, & Technical Services 36, no. 3–4 (2012): 79–83, doi:10.1016/j.lcats.2012.05.002.
15. Ibid., 82.
16. Heather Gilbert and Tyler Mobley, “Breaking Up with CONTENTdm: Why and How One Institution Took the Leap to Open Source,” Code4Lib Journal, no. 20 (2013), http://journal.code4lib.org/articles/8327.
17. Ibid.
18. Ibid.
19. Ibid.
20.
NISO Framework Working Group with support from the Institute of Museum and Library Services, A Framework of Guidance for Building Good Digital Collections (Baltimore, MD: National Information Standards Organization (NISO), 2007).
21. “Linked Data Platform 1.0,” W3C, accessed July 22, 2015, http://www.w3.org/TR/ldp/.
22. “DSpace,” accessed July 22, 2015, http://www.dspace.org/.
23. “Fedora Repository Home,” accessed July 22, 2015, https://wiki.duraspace.org/display/FF/Fedora+Repository+Home.
24. “Hydra Project,” accessed July 22, 2015, http://projecthydra.org/.
Transitioning from XML to RDF: Considerations for an Effective Move Towards Linked Data and the Semantic Web

Juliet L. Hardesty

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016 51

INTRODUCTION

Metadata, particularly within the academic library setting, is often expressed in eXtensible Markup Language (XML) and managed with XML tools, technologies, and workflows. Software tools such as the Oxygen XML Editor and query languages such as XPath and XQuery have, over time, become capable of supporting that management. However, managing a library’s metadata now takes on a greater level of complexity as libraries increasingly adopt the Resource Description Framework (RDF). Semantic Web initiatives are surfacing in the library context with experiments in publishing metadata as Linked Data sets, BIBFRAME development using RDF, and software developments such as the Fedora 4 digital repository using RDF. Examples of transitions from XML into RDF make these challenges evident and show the need for communication and coordination between efforts to incorporate and implement RDF. This article outlines these challenges using different use cases from the literature and first-hand experience. The discussion that follows considers ways to progress from metadata formatted in XML to metadata expressed in RDF. The options explored are targeted not only to metadata practitioners considering this transition but also to programmers, librarians, and managers.

LITERATURE REVIEW AND CONCEPTS

As an initial example of the challenges faced when considering RDF, clarifying terminology is still a helpful activity. RDF focuses on sets of statements describing relationships and meaning. These statements consist of a subject, a predicate, and an object (e.g., an article, has an author, Jane Smith). These statement parts are also referred to as a resource, a property, and a property value.
Since there are three parts to RDF statements, they are referred to as triples. The predicate or property of an RDF statement defines the relationship between the subject and the object. RDF ontologies are sets of properties for a particular domain. For example, Darwin Core has an RDF ontology to express biological properties,1 and EBUCore has an RDF ontology to express properties about audiovisual materials.2

Juliet L. Hardesty (jlhardes@iu.edu) is Metadata Analyst at Indiana University Libraries, Bloomington, Indiana.

TRANSITIONING FROM XML TO RDF | HARDESTY doi: 10.6017/ital.v35i1.9182 52

Pulling apart the many issues involved in moving from XML to RDF is an exploration into the purpose of metadata, the tools available and their capabilities, and the various strategies that can be employed. Poupeau rightly states that XML provides structural logic in its hierarchical identification of elements and attributes, where RDF provides data logic declaring resources that relate to each other using properties.3 These properties are ideally all identified with single reference points (Uniform Resource Identifiers, or URIs) rather than a description encased in an encoding. A source of honest confusion, however, is that RDF can be expressed as XML. Lassila’s note regarding the Resource Description Framework specification from the World Wide Web Consortium (W3C) states, “RDF encourages the view of ‘metadata being data’ by using XML (eXtensible Markup Language) as its encoding syntax.”4 So even though RDF can use XML to express resources that relate to each other via properties, identified with single reference points (URIs), RDF is itself not an XML schema. RDF has an XML language (sometimes called, confusingly, RDF, and from here forward called RDF/XML).
Additionally, RDF Schema (RDFS) declares a schema or vocabulary as an extension of RDF/XML to express application-specific classes and properties.5 Simply speaking, RDF defines entities and their relationships using statements. There are various ways to make these statements, but the original way formulated by the W3C is using an XML language (RDF/XML) that can be extended by an additional XML schema (RDFS) to better define those relationships. Ideally, all parts of that relationship (the subject, predicate, object, or the resource, property, property value) are URIs pointing to an authority for that resource, that property, or that property value. An additional concept worth covering is serialization. This term describes how RDF data is expressed using various formatting languages. RDF/XML, N-Triples, Turtle, and JSON-LD are all examples of RDF serializations.6 Describing something as being in RDF really means the framework of subject, predicate, object is being used. Describing something as being expressed in RDF/XML or JSON-LD means that the RDF statements have been serialized into either of those formatting languages. Using “RDF” to refer not only to the framework used to describe something (RDF) but also to the serialization of that description (RDF/XML) can easily muddle the discussion. Other thoughts about the difference between XML and RDF, or about moving metadata from XML into RDF, point to the difference in perspective and the change in thinking that is required to manage such a move.
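The framework/serialization distinction above can be made concrete with a minimal sketch. The same abstract statement ("an article has an author Jane Smith") is held once as a triple and then emitted in two different serializations. The subject and object URIs below are invented for illustration; only the Dublin Core Terms predicate is a real property.

```python
# One abstract RDF statement: (subject, predicate, object), each part a URI.
# The example.org URIs are hypothetical; dcterms:creator is a real DC Terms property.
triple = (
    "http://example.org/article1",           # subject: the article
    "http://purl.org/dc/terms/creator",      # predicate: "has an author"
    "http://example.org/janeSmith",          # object: Jane Smith
)

def to_ntriples(t):
    """Serialize one triple as a single N-Triples line."""
    return "<{}> <{}> <{}> .".format(*t)

def to_turtle(t, prefixes):
    """Serialize the same triple in Turtle, compacting known namespaces to prefixes."""
    def qname(uri):
        for prefix, ns in prefixes.items():
            if uri.startswith(ns):
                return f"{prefix}:{uri[len(ns):]}"
        return f"<{uri}>"
    s, p, o = (qname(part) for part in t)
    return f"{s} {p} {o} ."

prefixes = {"dcterms": "http://purl.org/dc/terms/",
            "ex": "http://example.org/"}

print(to_ntriples(triple))
# <http://example.org/article1> <http://purl.org/dc/terms/creator> <http://example.org/janeSmith> .
print(to_turtle(triple, prefixes))
# ex:article1 dcterms:creator ex:janeSmith .
```

Both outputs carry the identical triple; only the concrete syntax differs, which is the sense in which RDF/XML, N-Triples, Turtle, and JSON-LD are interchangeable serializations of one framework.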
In an online discussion about RDF in relation to TEI (Text Encoding Initiative), Cummings talks about the need for both XML and RDF, using XML to encode text and RDF to extract that data and make it more useful.7 Yee, in her in-depth look at bibliographic data as part of the Semantic Web, points out that RDF is designed to encode knowledge, not information.8 The RDF Primer 1.0 also states “RDF directly represents only binary relationships.”9 XML describes what something is by encoding it with descriptive elements and attributes. RDF, on the other hand, constructs statements about something using direct references—a reference to the thing itself, a reference to the descriptor, and a reference to the descriptor’s value. As Farnel discussed in her 2015 Open Repositories presentation about the University of Alberta’s move to RDF, they learned they were moving from a records-based framework in XML to a things-based framework in RDF.10 What is pointed out here time and again is something else Farnel discussed—moving from XML to RDF is not simply a conversion between encoding formats; it is a translation between two different ways of organizing knowledge. It involves understanding the meaning of the metadata encoded in XML and representing that meaning with appropriate RDF statements. The tools most commonly employed for reworking XML into RDF are OpenRefine when accompanied by its RDF extension; a triplestore database such as OpenLink Virtuoso,11 Apache Fuseki,12 or Sesame13; Oxygen XML Editor14; and Protégé,15 an ontology editor.
OpenRefine is, according to the website, “a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.”16 The RDF extension, called RDF Refine, allows for importing existing vocabularies and reconciling against SPARQL endpoints (web services that accept SPARQL queries and return results).17,18 SPARQL is similar to SQL as a language for querying a database, but the syntax is specifically designed to allow for querying data formatted in triple statements instead of tables with columns.19 Triplestore databases such as OpenLink Virtuoso can store and index RDF statements for searching as a SPARQL endpoint, offering a way to retrieve information and visualize connections across a collection of triples. Oxygen XML Editor has proven helpful in formulating eXtensible Stylesheet Language (XSL) transformations to move metadata from a particular XML schema or format into RDF/XML or other serializations such as JSON-LD (JavaScript Object Notation for Linking Data).20 Protégé is a tool developed by Stanford University that supports the OWL 2 Web Ontology Language and has helped to convert XML schemas to RDF ontologies and establish ways to express XML metadata in RDF. These tools provide the technical means to take metadata expressed in XML and physically reformat it to metadata expressed in an RDF serialization. What that reformatting also encompasses, however, is a review of the information expressed in XML and a set of decisions as to how to express that information as RDF statements. Strategic approaches and ideas for handling data transformations into RDF have involved the XML schema or document type definition (DTD). 
These include Thuy, Lee, and Lee’s approach to map an XML schema (the XSD) to RDF, associating the schema’s simpleTypes with properties in RDF, defining the schema’s complexTypes as classes in RDF, and handling a hierarchy of XML schema elements with top levels as domains and lower-level elements and attributes as container classes or subproperties in those domains.21 Thuy et al. earlier worked on a method to transform XML to RDF by translating the DTD using RDFS (ELEMENTs in the DTD are RDF classes or subclasses, ATTLISTs are RDF properties, and ENTITIES—preset variables in the DTD—are called up for use in RDF as encountered).22 Similarly, Hacherouf, Bahloul, and Cruz translate an XML schema into an OWL ontology.23 Klein et al. point out that while ontologies serve to describe a domain, XML schemas are meant to provide constraints on documents or structure for data, so it can be advantageous to work out an RDF expression this way.24 Tim Berners-Lee puts it simply: “the same RDF tree results from many XML trees,” meaning the same single statement in RDF (an article has an author Jane Smith) can be expressed in many ways in XML and can vary on the basis of the source of the XML, any schemas involved, and the people creating the metadata.25 Transitioning from XML to RDF using the XML schema might serve to ensure all XML elements are replicated in RDF but does not necessarily establish the relationships meant by that XML encoding without additional evaluation. There is no single strategy that will always work to move XML metadata into RDF, even within the same set of tools (such as Fedora/Hydra) or the same area of concern (libraries, archives, or museums).

USE CASES FOR RDF

The following use cases explain approaches to transitioning to RDF taken from two differing perspectives. The first set describes efforts to express XML schemas or standards as RDF ontologies.
The second set describes efforts by various library or cultural-heritage digital collections to transform metadata records into RDF statements. They also show that strategies to transform XML to RDF cannot occur without a shift in view from structure to relationships and, likewise, from descriptive encoding to direct meaning.

Moving an XML Schema/Standard to an RDF Ontology

As a graduate student at Kent State University, Mixter took on converting the descriptive metadata standard VRA Core 4.0 from an XML schema to an RDF ontology.26 Using the VRA Data Standards Committee Guidelines to ensure all minimum fields were included,27 Mixter mapped VRA XML elements and attributes to the schema.org, FOAF, VoID, and DC Terms ontologies. This process is known as “cherry-picking,” or combining various ontologies that already exist to represent properties or relationships (the predicates in RDF statements) as RDF instead of creating new proprietary RDF properties. Using OWL and RDFS as metavocabularies in Protégé, this created an ontology that could “retain the granularity required to describe library, archive, or museum items” of VRA Core 4.0’s design in XML without being a straight conversion of VRA Core 4.0 from XML to RDF.28 The outcome was an XSLT stylesheet that was tested on VRA Core 4.0 XML records to produce that same information as RDF statements. One point that seemed to help in testing was the fact that all controlled vocabulary terms had reference identifiers in the XML (ready-made URIs). Something not discussed in the outcomes was that dates resulted in complex RDF (RDF statements that encompass additional RDF statements or blank nodes), and there was no discussion about this complexity or its effect on using those particular RDF statements.
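The cherry-picking approach can be sketched in miniature: predicates are drawn from several existing ontologies rather than minted in one new proprietary vocabulary. The element names and predicate choices below are invented examples for illustration, not Mixter's actual mapping; the predicate URIs themselves are real DC Terms, FOAF, and schema.org properties.

```python
# Toy sketch of "cherry-picking": each field is assigned a predicate from an
# already-existing ontology. These pairings are illustrative, not a published mapping.
CHERRY_PICKED = {
    "title":       "http://purl.org/dc/terms/title",    # DC Terms
    "creator":     "http://xmlns.com/foaf/0.1/maker",   # FOAF
    "dateCreated": "http://schema.org/dateCreated",     # schema.org
}

def record_to_triples(subject_uri, record):
    """Turn one flat record into triples using the cherry-picked predicates."""
    return [(subject_uri, CHERRY_PICKED[field], value)
            for field, value in record.items() if field in CHERRY_PICKED]

triples = record_to_triples(
    "http://example.org/works/42",  # hypothetical subject URI
    {"title": "Untitled Landscape", "creator": "Jane Smith"},
)
print(triples)
```

The appeal of the approach is visible even at this scale: no new predicate had to be defined, so any consumer that already understands DC Terms or FOAF can use the output directly.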
VRA Core 4.0 now has an RDF ontology in draft form, with Mixter as one of its authors.29 The OWL ontology still points to schema.org, FOAF, and VoID for equivalent classes and properties, but everything is now named within a VRA RDF ontology and namespace and translates to such when VRA Core 4.0 XML is transformed to RDF. Another case in the category of going from an XML standard to an RDF ontology is the development of the BIBFRAME model for bibliographic description from the Library of Congress. The BIBFRAME model is expressed as RDF. According to the BIBFRAME site, “in addition to being a replacement for MARC, BIBFRAME serves as a general model for expressing and connecting bibliographic data.”30 MARC has its own format of expression with numbered fields and subfields but can be expressed or serialized in XML and is often shared that way. The BIBFRAME model, while revamping the way a bibliographic record is described on the basis of work, instance, authority, and annotation, also provides tools to transform records from MARC/XML to the RDF statements of BIBFRAME.31 A single namespace serves the BIBFRAME model and is explained as a long-term strategy to ensure namespace persistence over the next forty-plus years.32 The transformations produced from Library of Congress MARC records and local MARC records contain complex hierarchical RDF statements, particularly when ascribing authority sources to names, subjects, and types of identifiers. As it is still a work in progress there are no tools making use of BIBFRAME records in RDF. An additional example is the work happening with PBCore, the public broadcasting metadata standard managed by the Corporation for Public Broadcasting.33 Public broadcasting stations and other institutions across the United States provide descriptive, technical, and structural metadata for audiovisual materials using this XML standard.
In Boston, WGBH’s use of PBCore coincides with its digital asset management system, HydraDAM, built on Fedora 3 and the Hydra technology stack (based on Blacklight, Solr, and the Fedora Digital Repository).34 Fedora 3 does not natively support RDF statements as properties on objects like Fedora 4. Building off an interest to move HydraDAM to Fedora 4 and leverage RDF for metadata about audiovisual collections, WGBH began exploring transitioning the PBCore XML metadata standard into an RDF ontology. EBUCore, the European Broadcasting Union’s metadata standard, is already expressed as an RDF ontology.35 A comparison between the XML standard of PBCore and the classes and properties expressed in EBUCore revealed that most PBCore elements were covered by the EBUCore ontology.36 Efforts are ongoing to offer PBCore 3.0 as an RDF ontology that uses EBUCore with the addition of a smaller set of properties along with a way to transform PBCore XML to PBCore 3.0 in RDF.37 The Hydra community, in an effort to help the transition from Fedora 3 with its XML binary files of descriptive metadata to Fedora 4 using RDF statements as properties on objects, is working on a recommendation and transformation to move descriptive metadata in MODS XML into RDF that is usable in Fedora 4.38 The MODS standard has a draft of an RDF ontology and a stylesheet transformation available,39 but the complex hierarchical RDF produced from this transformation is unmanageable with the current Fedora 4 architecture. 
The Hydra MODS and RDF Descriptive Metadata Subgroup is attempting to reflect the MODS elements in simple RDF statements that can be incorporated as properties on a Fedora 4 digital object.40 Led by Steven Anderson at the Boston Public Library, this group is moving through MODS element by element, asking the question, “If you had to express this MODS element from your metadata in RDF today, how would you do that?” Participating institutions are reviewing their MODS records and exploring the possible RDF predicates that could be used to represent the meaning of that information. Some are even considering how to construct those RDF statements so that MODS XML can be re-created as close to the original MODS as possible (this is called “round tripping”). There are still questions as to whether every single MODS element will be reflected in this transformation, how exactly Fedora 4 will make use of these descriptive RDF statements, and if the original MODS XML will need to be preserved as part of the digital object in Fedora, but this group is recognizing that moving from Fedora 3 to Fedora 4 requires a major shift in thinking about descriptive metadata. This transformation tool is an effort to help make that transition possible. The Avalon Media System is an open source system for managing and providing access to large collections of digital audio and video.41 It is built on Fedora 3 and the Hydra technology stack and uses MODS XML to store descriptive metadata. As development progresses and the available descriptive fields expand, maintaining the workflow to update XML records in Fedora and reindexing objects in the Hydra interface becomes increasingly complicated. Each time an update is made to descriptive information about an audiovisual item through the Avalon interface, the entire XML record for that object, stored as a binary text file, is rewritten in Fedora 3 and reindexed in Solr.
In considering advantages to using Fedora 4, it appears that descriptive metadata properties stored in RDF are easier to manage programmatically (updating content, adding new fields, more focused reindexing) because descriptive information would not be stored in a single binary file but as individual properties on the object.

Turning XML Metadata into RDF or Linked Data for Publishing, Search and Discovery, and Management

As Southwick describes the process, the library at the University of Nevada Las Vegas (UNLV) took a collection with descriptive records from CONTENTdm and published them as a single RDF Linked Open Data set.42 After cleaning up controlled vocabulary terms across collections and solidifying locally controlled vocabularies, they exported tab-delimited CSV records from CONTENTdm. These records were brought into OpenRefine with its RDF extension, where they reviewed the data and mapped to various properties within the Europeana Data Model (EDM). Controlled vocabulary terms were in text form and had to be reconciled against a SPARQL endpoint, either locally from downloaded data or from the controlled vocabulary service, to gather the URIs to use as the object or value in the RDF statement. OpenRefine was then used to create RDF files that were uploaded to a triplestore (first Mulgara, then OpenLink Virtuoso). This provided public access to the Linked Open Data set and a SPARQL endpoint for querying the data set. After publishing the data set, they experimented with PivotViewer from OpenLink Virtuoso and RelFinder to see what kinds of connections and relationships could be visualized from the data as Linked Open Data. The outlined steps are clear and the outcomes are described, but interestingly the data set itself no longer appears to be available online.43 Although the UNLV use case relies on CSV instead of XML as the data source, the tools and workflows enlisted to transform the data set into RDF Linked Open Data are still applicable.
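The core of a workflow like UNLV's (tabular export, then reconciliation of text terms to URIs, then triple generation) can be sketched in a few lines. This is a simplified stand-in for what OpenRefine's RDF extension does, not UNLV's actual pipeline; the vocabulary lookup, item URIs, and sample row are all invented, while the two DC Terms predicates are real properties.

```python
import csv
import io

# Hypothetical reconciliation table: a text term resolved to a vocabulary URI.
# In practice this lookup is performed against a SPARQL endpoint or local dump.
VOCAB = {"Las Vegas (Nev.)": "http://example.org/places/las-vegas"}

def rows_to_triples(fh):
    """Turn exported rows into triples, reconciling place terms to URIs where known."""
    triples = []
    for row in csv.DictReader(fh):
        s = "http://example.org/items/" + row["id"]   # invented item URI pattern
        triples.append((s, "http://purl.org/dc/terms/title", row["title"]))
        # URI if the term reconciles, literal string if it does not
        place = VOCAB.get(row["place"], row["place"])
        triples.append((s, "http://purl.org/dc/terms/spatial", place))
    return triples

sample = io.StringIO("id,title,place\n1,Street view,Las Vegas (Nev.)\n")
print(rows_to_triples(sample))
```

The fallback branch is the important design point: terms that fail reconciliation survive as literals rather than being dropped, mirroring the partially reconciled data sets these projects actually produce.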
OpenRefine can import XML just as it imports CSV, so this described case shows the tools that can be used and the decisions to be made in processing that data into RDF statements. In Oregon Digital,44 XML from Qualified Dublin Core, VRA Core, and MODS at two different institutions (University of Oregon and Oregon State University) was mapped as Linked Open Data and stored in a triplestore to be served up in a new web application using the Hydra technology stack.45 An inventory of metadata fields across all collections was first mapped to existing Linked Data terms, or properties (those with available URIs); then properties that were needed in the new web application but did not have available corresponding URIs were mapped to a newly devised local namespace for Oregon Digital. Any properties that were not used were kept in the original static XML file for the record as part of the digital object in Fedora. The focus here appears to be on mapping properties, without as much detail provided on whether the objects were kept as text or mapped to URI values where possible. From the sample record provided, the objects appear to be text and not URIs. The real power of this project is finding common properties to describe objects from diverse collections and institutions. What also comes out in the example mappings is the use of many different namespaces or ontologies (DC Terms, MARC Relators, but also MODS and MADS that produce complex RDF).
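Oregon Digital's two-step mapping (use an existing Linked Data property where one is available, otherwise mint a term in a local namespace) reduces to a simple fallback rule. The sketch below is illustrative: the local namespace and the unmapped field name are invented, while the two DC Terms property URIs are real.

```python
# Fallback mapping pattern: prefer an existing property URI, mint locally otherwise.
KNOWN_PROPERTIES = {
    "title":   "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}
LOCAL_NS = "http://example.org/ns/"  # hypothetical stand-in for a local namespace

def property_uri(field_name):
    """Return an existing property URI if one is mapped, else mint a local one."""
    return KNOWN_PROPERTIES.get(field_name, LOCAL_NS + field_name)

print(property_uri("title"))     # resolves to the existing DC Terms property
print(property_uri("fullText"))  # no known property, falls back to the local namespace
```

The trade-off the rule encodes is the one the article describes: locally minted URIs keep every field expressible, but only the fields mapped to shared ontologies gain interoperability with other Linked Data consumers.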
The University of Alberta also combined a variety of XML metadata from different sources into a new digital asset management system based on Fedora 4 and the Hydra technology stack, called the Education and Research Archive.46 Reporting on the experience at Open Repositories 2015, Farnel described the process as working in phases.47 Beginning with item types, languages, and licenses, then moving to place names and controlled subject terms, and finally person names and free-form subjects, they made multiple passes converting XML metadata into RDF statements and incorporating URIs whenever possible. They are combining all of this into a single data dictionary,48 making use of several RDF ontologies to cover the various metadata properties that are being described about objects and collections. The University of California at San Diego (UCSD) has developed a local data model using a mix of external (MADS, VRA Core, Darwin Core, PREMIS) and local ontologies. They published a data dictionary and are working on a substantially different revision as part of the metadata workflow they use to bring digital objects into their digital asset management system from a variety of source metadata formats, including XML.49 This allows metadata to be created from disparate source formats and makes it possible to bring them together as RDF for delivery, management, and preservation.

DISCUSSION

If metadata is in XML form and the desire is to express it as RDF, this is not merely a transformation from one XML schema to another. It is a change in the expression of that data and a change in its use. Having metadata in XML means information is encoded in a specific way that allows for interchange and sharing. Having metadata in RDF means making statements that have direct meaning and can be used independently. There are different perspectives involved in metadata when approaching RDF: those that manage metadata standards (the XML standard side) and those that have metadata encoded using those XML standards (the data management side). Depending on the desired outcomes, the needs of these two perspectives can conflict. When managing a metadata standard, the RDF transition tends to follow certain patterns:
There are different perspectives involved in metadata when approaching RDF: those that manage metadata standards (the XML standard side) and those that have metadata encoded using those XML standards (the data management side). Depending on the desired outcomes, the needs of these two perspectives can conflict. When managing a metadata standard, the RDF transition tends to follow certain patterns:

• Transform an XML standard into a new RDF ontology
  o Examples: Dublin Core (DC), Darwin Core (DWC), MODS, VRA Core
• Establish a move to RDF that incorporates another existing ontology
  o Examples: PBCore, Hydra community

From the data management side, the RDF transition means different patterns occur. These scenarios often start by reviewing the needed outcome, deciding how much metadata needs to be expressed in RDF, and determining what works best to get the metadata to that point. Cases include the following commonalities:

• Creating new search and discovery end-user applications
  o Examples: Oregon Digital, University of Alberta
• Publishing Linked Data sets
  o Examples: UNLV, University of Alberta
• Managing metadata using software that supports RDF
  o Examples: University of Alberta, UCSD, Hydra community

Conflicts occur when the needed outcome on the data management side is not supported by the RDF ontology transitions that have occurred for the XML standards being used. An example of this is how RDF is handled in Fedora 4. When RDF is complex (the object of one statement is another entire RDF statement), Fedora produces blank nodes as new objects within the repository. While not technically problematic, descriptive metadata with complex RDF can result in a situation where a digital object ends up referencing a blank node that then points to, for example, a subject or a genre.
This subject or genre has been created as its own object within the digital repository even though it is only meant to provide meaning for the digital object. MODS RDF produces this complexity and is thus not workable with Fedora 4. In contrast, other standards such as DC or DWC in RDF produce simple statements that Fedora 4 can apply to a digital object without any additional processing. Complications in transitioning from XML to RDF also occur when the original XML does not include URIs or authority-controlled sources. Converting this metadata to RDF can mean locally minting URIs or bringing data over as literals (strings of text) without using URIs at all. Ideally, the result is somewhere in the middle, with externally controlled vocabularies incorporated as much as possible and literals or locally minted URIs used only where absolutely necessary. Translating strings to authoritative sources is intensive work. If the XML standard cannot be expressed as a single RDF ontology, work is further complicated by the need to map XML elements to different RDF ontologies using logic that is often decided locally. While it is possible to transition XML to RDF, the process is not uniform and the pathway involves a lot of labor. This labor might be alleviated by a more user-centered approach on the part of XML standards bodies, one that considers the ways their standards will be used when translated into RDF (“users” in this context meaning the users of the standards, not the end users searching and discovering digital content). Triplestores can manage queries for complex RDF, but digital repository systems are not there yet. Those that support RDF for description of objects do so on the basis of simple property statements. A complex RDF ontology is going to be a challenge to support over time.
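The middle-ground approach described above (external vocabulary URIs where possible, locally minted URIs or literals otherwise) might look roughly like this. The reconciliation table, the local namespace, and the slug scheme are hypothetical stand-ins for real reconciliation against authority files such as LCSH or VIAF.

```python
# Sketch: choose the object form for an RDF statement when the source
# XML gives only a text string. The lookup table and local namespace
# below are invented for illustration, not a real reconciliation service.
import re

# Pretend reconciliation table: strings we could match to a controlled URI.
EXTERNAL_VOCAB = {
    "World War, 1939-1945": "http://id.loc.gov/authorities/subjects/sh85148273",
}
LOCAL_NS = "http://example.org/vocab/"

def slugify(text):
    """Mint a stable local identifier from a text string."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def object_for(value, mint_local=True):
    """Return (kind, object) where kind is 'uri' or 'literal'."""
    if value in EXTERNAL_VOCAB:           # best case: external controlled URI
        return ("uri", EXTERNAL_VOCAB[value])
    if mint_local:                        # fallback: locally minted URI
        return ("uri", LOCAL_NS + slugify(value))
    return ("literal", value)             # last resort: keep the string

print(object_for("World War, 1939-1945"))
print(object_for("Local history club"))
print(object_for("Local history club", mint_local=False))
```

The hard part in practice is populating the reconciliation table: matching free-text strings against authority files is exactly the "intensive work" the paragraph above refers to.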
Another way forward is for the data management side of the equation to focus efforts on showing, in an end-user search and discovery format, what is currently possible when XML is transitioned into RDF. Published Linked Data sets need to have interfaces for access and use, showing the value of what is currently available and any needs or gaps that remain. Libraries and cultural-heritage organizations engaged in this work should also openly share the processes that work and those that do not, so others contemplating this transformation can consider how to forge ahead themselves. Libraries and cultural-heritage organizations moving metadata from XML to RDF should provide feedback to XML standards bodies regarding the usefulness or complications of any RDF transitional help an XML standard might provide. Technologies for incorporating RDF into web applications and truly connecting triples across the web also require further work. Triplestores have so far been the main way to expose data sets but have not been incorporated into common library or cultural-heritage end-user search and discovery web applications. Additionally, triplestore use does not seem to extend to management or long-term storage of complete data about digital objects. There seems to be a decision either to reduce the data stored in a triplestore down to simple statements or to use the triplestore more like an isolated index or SPARQL endpoint only and manage the complete metadata record separately (in a static file containing text or in a separate database). That aligns triples in RDF more with relational database storage than with catalog records. Triple statements focus on relationships and not the complete unique details of the thing being described.
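The index-like role of a triplestore described above can be illustrated with a toy pattern matcher over simple statements, where `None` plays the role a variable plays in a SPARQL query. The data, identifiers, and API here are invented for illustration; a real system would query a SPARQL endpoint.

```python
# Toy illustration of triplestore-style querying: a list of simple
# (subject, predicate, object) statements queried by pattern, with None
# as a wildcard. This only illustrates the idea that such a store
# captures relationships, not the complete record for each object.
TRIPLES = [
    ("item:1", "dcterms:title", "River at flood stage"),
    ("item:1", "dcterms:subject", "subj:floods"),
    ("item:2", "dcterms:subject", "subj:floods"),
    ("item:2", "dcterms:title", "Flood control on the Willamette"),
]

def match(pattern, triples=TRIPLES):
    """Return all triples matching an (s, p, o) pattern; None matches anything."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# "Which items have the subject subj:floods?" is an index-style lookup:
hits = match((None, "dcterms:subject", "subj:floods"))
print([s for s, _, _ in hits])
```

Notice that answering the question yields only identifiers and relationships; the full descriptive record for `item:1` would still live somewhere else, which is the division of labor the paragraph above describes.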
Triplestores can handle complex hierarchical RDF graphs and provide responses on the basis of queries against those complexities,50 but triplestores do not appear to be taking over as either the main search and discovery mechanism for online digital resources or for digital object management. Software using RDF natively is also not currently widespread. A project such as the BIBFRAME Initiative that plans to incorporate RDF needs to make sure the complexity of its data model in RDF is manageable by any tools it produces and that it is possible for vendors and suppliers to encompass the data model in their software development.

CONCLUSION

The reasons for deciding metadata should transition to RDF are just as important as determining the best process for implementing that transition. Reasons for transitioning to RDF are conceptually based around making data more easily shareable and setting up data to have meaning and relationships as opposed to local static description that requires programmatic interpretation. The use cases outlined in this article show the reality does not quite yet match the concept. Transitioning an XML standard to RDF does not make that data more shareable or more easily understood unless there are end-user applications for using that data in RDF. Publishing Linked Data involves going through transitional steps, but the endpoint seems to be more of a byproduct. The real goal is going through the process of producing Linked Data to learn how that works. Self-contained projects that aim to express collections in RDF for the purpose of a new search and discovery interface are more successful in implementing RDF that has that new level of meaning and relationship. Beyond the borders of these projects, however, the data is not being shared or used. The use cases described above show some examples of what is happening now when transitioning from XML to RDF.
Approaches include XML standards converting to RDF expression as well as digital collections with metadata in XML that have an interest in producing that metadata as RDF. Software that incorporates RDF is still developing and maturing. Helping that process along by providing a pathway from XML to functionally usable RDF improves the chances of the Semantic Web becoming a real and useful thing. It is vital to understand that transitioning from XML to RDF requires a shift in perspective from replicating structures in XML to defining meaningful relationships in RDF. Metadata work is never easy, and for metadata to move from encoded strings of text to statements with semantic relationships requires coordination and communication. How best to achieve this coordination and communication is a topic worth engaging as the move to use RDF, produce Linked Data, and approach the Semantic Web continues.

BIBLIOGRAPHY

Berners-Lee, Tim. “Linked Data.” Linked Data - Design Issues, June 18, 2009. http://www.w3.org/DesignIssues/LinkedData.html.

———. “Why RDF Model Is Different from the XML Model.” Semantic Web, September 1998. http://www.w3.org/DesignIssues/RDF-XML.html.

Estlund, Karen, and Tom Johnson. “Link It or Don’t Use It: Transitioning Metadata to Linked Data in Hydra,” July 2013. http://ir.library.oregonstate.edu/xmlui/handle/1957/44856.

Farnel, Sharon. “Metadata at a Crossroads: Shifting ‘from Strings to Things’ for Hydra North.” Slideshow presented at Open Repositories, Indianapolis, Indiana, 2015. http://slideplayer.com/slide/5384520/.

Hacherouf, Mokhtaria, Safia Nait Bahloul, and Christophe Cruz. “Transforming XML Documents to OWL Ontologies: A Survey.” Journal of Information Science 41, no. 2 (April 1, 2015): 242–59. doi:10.1177/0165551514565972.

Klein, Michel, Dieter Fensel, Frank van Harmelen, and Ian Horrocks. “The Relation between Ontologies and XML Schemas.” In Linköping Electronic Articles in Computer and Information Science, 2001. doi:10.1.1.14.1037.
Lassila, Ora. “Introduction to RDF Metadata.” W3C, November 13, 1997. http://www.w3.org/TR/NOTE-rdf-simple-intro-971113.html.

Manola, Frank, and Eric Miller. “RDF Primer 1.0, Section 2.3 Structured Property Values and Blank Nodes.” W3C Recommendation, February 10, 2004. http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties.

Mixter, Jeff. “Using a Common Model: Mapping VRA Core 4.0 Into an RDF Ontology.” Journal of Library Metadata 14, no. 1 (January 2014): 1–23. doi:10.1080/19386389.2014.891890.

Poupeau, Gautier. “XML vs RDF: logique structurelle contre logique des données (XML vs RDF: Structural Logic versus Data Logic).” Les Petites Cases, August 29, 2010. http://www.lespetitescases.net/xml-vs-rdf.

“RDF and TEI XML,” October 13, 2010. https://listserv.brown.edu/archives/cgi-bin/wa?A2=ind1010&L=TEI-L&D=0&P=28928.

Southwick, Silvia B. “A Guide for Transforming Digital Collections Metadata into Linked Data Using Open Source Technologies.” Journal of Library Metadata 15, no. 1 (March 2015): 1–35. doi:10.1080/19386389.2015.1007009.

Thuy, Pham Thi Thu, Young-Koo Lee, and Sungyoung Lee. “A Semantic Approach for Transforming XML Data into RDF Ontology.” Wireless Personal Communications 73, no. 4 (2013): 1387–1402. doi:10.1007/s11277-013-1256-z.

Thuy, Pham Thi Thu, Young-Koo Lee, Sungyoung Lee, and Byeong-Soo Jeong. “Transforming Valid XML Documents into RDF via RDF Schema.” In Next Generation Web Services Practices, International Conference on, 0:35–40. Los Alamitos, CA: IEEE Computer Society, 2007. doi:10.1109/NWESP.2007.23.

“XML RDF.” W3Schools. Accessed September 30, 2015.
http://www.w3schools.com/xml/xml_rdf.asp.

Yee, Martha M. “Can Bibliographic Data Be Put Directly onto the Semantic Web?” Information Technology and Libraries 28, no. 2 (March 1, 2013): 55–80. doi:10.6017/ital.v28i2.3175.

NOTES

1. “Darwin Core,” Darwin Core Task Group, Biodiversity Information Standards, last modified May 5, 2015, http://rs.tdwg.org/dwc/.

2. “Metadata specifications,” European Broadcasting Union, https://tech.ebu.ch/MetadataEbuCore.

3. Gautier Poupeau, “XML vs RDF: logique structurelle contre logique des données (XML vs RDF: Structural Logic versus Data Logic),” Les Petites Cases (blog), August 29, 2010, http://www.lespetitescases.net/xml-vs-rdf.

4. Ora Lassila, “Introduction to RDF Metadata,” W3C, November 13, 1997, http://www.w3.org/TR/NOTE-rdf-simple-intro-971113.html.

5. “XML RDF,” W3Schools, accessed September 30, 2015, http://www.w3schools.com/xml/xml_rdf.asp.

6. See “Serialization formats” from Resource Description Framework on Wikipedia. “Resource Description Framework,” Wikipedia, March 18, 2016, https://en.wikipedia.org/wiki/Resource_Description_Framework#Serialization_formats.

7.
“RDF and TEI XML,” email thread on TEI-L@listserv.brown.edu, October 13–18, 2010, https://listserv.brown.edu/archives/cgi-bin/wa?A2=ind1010&L=TEI-L&D=0&P=28928.

8. Martha M. Yee, “Can Bibliographic Data Be Put Directly onto the Semantic Web?” Information Technology and Libraries 28, no. 2 (March 1, 2013): 57, doi:10.6017/ital.v28i2.3175.

9. Frank Manola and Eric Miller, “RDF Primer 1.0, Section 2.3 Structured Property Values and Blank Nodes,” W3C Recommendation, February 10, 2004, http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties.

10. Sharon Farnel, “Metadata at a Crossroads: Shifting ‘from Strings to Things’ for Hydra North” (slideshow presentation, Open Repositories, Indianapolis, Indiana, 2015), http://slideplayer.com/slide/5384520/.

11. http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/.

12. https://jena.apache.org/documentation/fuseki2/.

13. http://rdf4j.org.

14. http://www.oxygenxml.com.

15. http://protege.stanford.edu.

16. http://openrefine.org.

17. https://en.wikipedia.org/wiki/SPARQL.

18. http://refine.deri.ie.

19. https://jena.apache.org/tutorials/sparql.html.

20. http://json-ld.org.

21. Pham Thi Thu Thuy, Young-Koo Lee, and Sungyoung Lee, “A Semantic Approach for Transforming XML Data into RDF Ontology,” Wireless Personal Communications 73, no. 4 (2013): 1392–95, doi:10.1007/s11277-013-1256-z.

22. Pham Thi Thu Thuy et al., “Transforming Valid XML Documents into RDF via RDF Schema,” in Next Generation Web Services Practices, International Conference on, vol. 0 (Los Alamitos, CA: IEEE Computer Society, 2007), 37, doi:10.1109/NWESP.2007.23.
http://www.w3schools.com/xml/xml_rdf.asp https://en.wikipedia.org/wiki/Resource_Description_Framework#Serialization_formats https://listserv.brown.edu/archives/cgi-bin/wa?A2=ind1010&L=TEI-L&D=0&P=28928 http://dx.doi.org/10.6017/ital.v28i2.3175 http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#structuredproperties http://slideplayer.com/slide/5384520/ http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/ https://jena.apache.org/documentation/fuseki2/ http://rdf4j.org/ http://www.oxygenxml.com/ http://protege.stanford.edu/ http://openrefine.org/ https://en.wikipedia.org/wiki/SPARQL http://refine.deri.ie/ https://jena.apache.org/tutorials/sparql.html http://json-ld.org/ http://dx.doi.org/10.1007/s11277-013-1256-z http://dx.doi.org/10.1109/NWESP.2007.23 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016 63 23. See Mokhtaria Hacherouf, Safia Nait Bahloul, and Christophe Cruz, “Transforming XML Documents to OWL Ontologies: A Survey,” Journal of Information Science 41, no. 2 (April 1, 2015): 242–59, doi:10.1177/0165551514565972. 24. Michel Klein et al., “The Relation between Ontologies and XML Schemas,” section 5 in Linköping Electronic Articles in Computer and Information Science, 6 (2001), doi:10.1.1.108.7190. 25. Tim Berners-Lee, “Why RDF Model Is Different from the XML Model,” Semantic Web Road map, September 1998, http://www.w3.org/DesignIssues/RDF-XML.html. 26. See Jeff Mixter, “Using a Common Model: Mapping VRA Core 4.0 Into an RDF Ontology,” Journal of Library Metadata 14, no. 1 (January 2014): 1–23, doi:10.1080/19386389.2014.891890. 27. The document currently labeled “How to Convert Version 3.0 to Version 4.0” contains a recommendation for a minimum set of elements for “meaningful retrieval” in VRA Core: http://www.loc.gov/standards/vracore/convert_v3-v4.pdf. 28. Mixter, “Using a Common Model,” 2. 29. 
“VRA Core RDF Ontology Available for Review,” Visual Resources Association, October 7, 2015, http://vraweb.org/vra-core-rdf-ontology-available-for-review/.

30. “Bibliographic Framework Initiative,” Library of Congress, https://www.loc.gov/bibframe/.

31. See “MARC to BIBFRAME transformation tools” at “Tools,” BIBFRAME, http://bibframe.org/tools/.

32. “Why a single namespace for the BIBFRAME vocabulary?” Library of Congress, BIBFRAME Frequently Asked Questions, https://www.loc.gov/bibframe/faqs/#q06.

33. “PBCore 2.1,” Public Broadcasting Metadata Dictionary Project, http://pbcore.org.

34. “WGBH,” Hydra Community Partners, http://projecthydra.org/community-2-2/partners-and-more/wgbh/.

35. “Metadata specifications,” European Broadcasting Union, https://tech.ebu.ch/MetadataEbuCore.

36. See notes from PBCore Hackathon Part 2, which occurred in June 2015, showing an element-by-element analysis of PBCore against EBUCore. “PBCore Hackathon Part 2,” June 15, 2015, https://docs.google.com/document/d/1pWDfYIzHpfjCn5RWJ1fioweXg5RIrXuDxCWkBQ5BMlA/.

37. “Join us for the PBCore Sub-Committee Meeting at AMIA!” Public Broadcasting Metadata Dictionary Project Blog, November 11, 2015, http://pbcore.org/join-us-for-the-pbcore-sub-committee-meeting-at-amia/.
38. “MODS and RDF Descriptive Metadata Subgroup,” last modified March 19, 2016, https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup.

39. “MODS RDF Ontology,” Library of Congress, https://www.loc.gov/standards/mods/modsrdf/.

40. “MODS and RDF Descriptive Metadata Subgroup,” last modified March 19, 2016, https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup.

41. “Avalon Media System,” http://www.avalonmediasystem.org.

42. See Silvia B. Southwick, “A Guide for Transforming Digital Collections Metadata into Linked Data Using Open Source Technologies,” Journal of Library Metadata 15, no. 1 (March 2015): 1–35, doi:10.1080/19386389.2015.1007009.

43. The URL for information is a blog with no links to a data set (https://www.library.unlv.edu/linked-data), and the collection site seems to still be based on CONTENTdm (http://digital.library.unlv.edu/collections).

44. “Oregon Digital,” http://oregondigital.org.

45.
See Karen Estlund and Tom Johnson, “Link It or Don’t Use It: Transitioning Metadata to Linked Data in Hydra,” July 2013, http://ir.library.oregonstate.edu/xmlui/handle/1957/44856, accessed from ScholarsArchive@OSU.

46. “ERA: Education & Research Archive,” https://era.library.ualberta.ca.

47. Farnel, “Metadata at a Crossroads.”

48. https://docs.google.com/spreadsheets/d/1hSd6kf4ABm-m8VtYNyqfJGtiZG7bLJQ3fWRbF_nVoIw/edit#gid=1362636241.

49. The substantially revised data model is not available online yet, but the following shows some of the progress toward an RDF data model: “Overview of DAMs Metadata Workflow,” UC San Diego, May 21, 2014, https://tpot.ucsd.edu/metadata-services/mas/data-workflow.html; “DAMS4 Data Dictionary,” https://htmlpreview.github.io/?https://github.com/ucsdlib/dams/master/ontology/docs/data-dictionary.html, retrieved from GitHub.

50. See the Apache Jena SPARQL Tutorial for an example of complex RDF with sample queries against that complexity. “SPARQL Tutorial - Data Formats,” The Apache Software Foundation, https://jena.apache.org/tutorials/sparql_data.html.
Library Discovery Products: Discovering User Expectations through Failure Analysis

Irina Trapido

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 9

ABSTRACT

As the new generation of discovery systems evolves and gains maturity, it is important to continually focus on how users interact with these tools and what areas they find problematic. This study looks at user interactions within SearchWorks, a discovery system developed by Stanford University Libraries, with an emphasis on identifying and analyzing problematic and failed searches. Our findings indicate that users still experience difficulties conducting author and subject searches, could benefit from enhanced support for browsing, and expect their overall search experience to be more closely aligned with that on popular web destinations. The article also offers practical recommendations pertaining to the metadata, functionality, and scope of the search system that could help address some of the most common problems encountered by users.

INTRODUCTION

In recent years, rapid modernization of online catalogs has brought library discovery to the forefront of research efforts in the library community, giving libraries an opportunity to take a fresh look at such important issues as the scope of the library catalog, metadata creation practices, and the future of library discovery in general. While there is an abundance of studies looking at various aspects of planning, implementation, use, and acceptance of these new discovery environments, surprisingly little research focuses specifically on user failure. The present study aims to address this gap by identifying and analyzing potentially problematic or failed searches. It is hoped that focusing on common error patterns will help us gain a better understanding of users’ mental models, needs, and expectations that should be considered when designing discovery systems, creating metadata, and interacting with library patrons.
TERMINOLOGY

In this paper, we adopt a broad definition of discovery products as “tools and interfaces that a library implements to provide patrons the ability to search its collections and gain access to materials.”1 These products can be further subdivided into the following categories:

Irina Trapido (itrapido@stanford.edu) is Electronic Resources Librarian at Stanford University Libraries, Stanford, California.

• Online catalogs (OPACs)—patron-facing modules of an integrated library system.
• Discovery layers (also referred to as “discovery interfaces” or “next-generation library catalogs”)—new catalog interfaces, decoupled from the integrated library system and offering enhanced functionality, such as faceted navigation, relevance-ranked results, and the ability to incorporate content from institutional repositories and digital libraries.
• Web-scale discovery tools, which, in addition to providing all the interface features and functionality of next-generation catalogs, broaden the scope of discovery by systematically aggregating content from library catalogs, subscription databases, and institutional digital repositories into a central index.

LITERATURE REVIEW

To identify and investigate problems that end users experience in the course of their regular searching activities, we analyzed digital traces of user interactions with the system recorded in the system’s log files. This method, commonly referred to as transaction log analysis, has been a popular way of studying information-seeking in a digital environment since the first online search systems came into existence, allowing researchers to monitor system use and gain insight into the users’ search process.
Server logs have been used extensively to examine user interactions with web search engines, consistently showing that web searchers tend to engage in short search sessions, enter brief search statements, do not browse the results beyond the first page, and rarely resort to advanced searching.2 A similar picture has emerged from transaction log studies of library catalogs. Researchers have found that library users employ the same surface strategies: queries within library discovery tools are equally short and simply constructed;3 the majority of search sessions consist of only one or two actions.4 Patrons commonly accept the system’s default search settings and rarely take advantage of a rich set of search features traditionally offered by online catalogs, such as Boolean searching, index browsing, term truncation, and fielded searching.5 Although advanced searching in library discovery layers is uncommon, faceted navigation, a new feature introduced into library catalogs in the mid-2000s, quickly became an integral part of the users’ search process. Research has shown that facets in library discovery interfaces are used both in conjunction with text searching, as a search refinement tool, and as a way to browse the collection with no search term entered.6 A recent study that analyzed interaction patterns in a faceted library interface at North Carolina State University using log data and user experiments demonstrated that users of faceted interfaces tend to issue shorter queries, go through fewer iterations of query reformulation, and scan deeper along the result list than those who use nonfaceted search systems. The authors also concluded that facets increase search accuracy, especially for complex and open-ended tasks, and improve user satisfaction.7

Another traditional use of transaction logs has been to gauge the performance of library catalogs, mostly through measuring success and failure rates.
While the exact percentage of failed searches varied dramatically depending on the system’s search capabilities, interface design, the size of the underlying database, and, most importantly, on the researchers’ definition of an unsuccessful search, the conclusion was the same: the incidence of failure in library OPACs was extremely high.8 In addition to reporting error rates, these studies also looked at the distribution of errors by search type (title, author, or subject search) and categorized sources of searching failure. Most researchers agreed that typing errors and misspellings accounted for a significant portion of failed searches and were common across all search types.9 Subject searching, which remained the most problematic area, often failed because of a mismatch between the search terms chosen by the user and the controlled vocabulary contained in the library records, suggesting that users experienced considerable difficulties in formulating subject queries with Library of Congress Subject Headings.10 Other errors reported by researchers, such as the selection of the wrong search index or the inclusion of the initial article for title searches, were also caused by users’ lack of conceptual understanding of the search process and the system’s functions.11 These research findings were reinforced by multiple observational studies and user interviews, which showed that patrons found library catalogs “illogical,” “counter-intuitive,” and “intimidating,”12 and were unwilling to learn the intricacies of catalog searching.13 Instead, users expected simple, fast, and easy searching across the entire range of library collections, relevance-ranked results that exactly matched what users expected to find, and convenient and seamless transition from discovery to access.14 Today’s library discovery systems have come a long way: they offer one-stop search for a wide array of library resources, intuitive interfaces that require minimal training to be
searched effectively, facets to help users narrow down the result set, and much more.15 But are today’s patrons always successful in their searches? Usability studies of next-generation catalogs and, more recently, of web-scale discovery systems have pointed to patron difficulties associated with the use of certain facets, mostly because of terminological issues and inconsistencies in the underlying metadata.16 Researchers also reported that users had trouble interpreting and evaluating the results of their search17 and were confused as to what resources were covered by the search tool.18 Our study builds on this line of research by systematically analyzing real-life problematic searches as reported by library users and recorded in transaction logs.

BACKGROUND

Stanford University is a private, four-year or above research university offering undergraduate and graduate degrees in a wide range of disciplines to about sixteen thousand students. The study analyzed the use of SearchWorks, a discovery platform developed by Stanford University Libraries. SearchWorks features a single search box with a link to advanced search on every page, relevance-ranked results, faceted navigation, enhanced textual and visual content (summaries, tables of contents, book cover images, etc.), as well as “browse shelf” functionality. SearchWorks offers searching and browsing of catalog records and digital repository objects in a single interface; however, it does not allow article-level searching.
SearchWorks was developed on the basis of Blacklight (projectblacklight.org), an open-source application for searching and interacting with collections of digital objects.19 Thanks to Blacklight’s flexibility and extensibility, SearchWorks enables discovery across an increasingly diverse range of collections (MARC catalog records, archival materials, sound recordings, images, geospatial data, etc.) and allows new features and improvements to be added continuously (e.g., https://library.stanford.edu/blogs/stanford-libraries-blog/2014/09/searchworks-30-released).

STUDY OBJECTIVES

The goal of the present study was two-fold. First, we sought to determine how patrons interact with the discovery system, which features they use and with what frequency. Second, this study aimed to identify and analyze problems that users encounter in their search process.

METHOD

This study used data comprising four years of SearchWorks use, which was recorded in Apache Solr logs. The analysis was performed at the aggregate level; no attempts were made to identify individual searchers from the logs. At the preprocessing stage, we created and used a series of Perl scripts to clean and parse the data and extract only those transactions where the user entered a search query and/or selected at least one facet value. Page views of individual records were excluded from the analysis. The resulting output file contained the following parameters for each transaction: a time stamp, search mode used (basic or advanced), query terms, search index (“all fields,” “author,” “title,” “subject,” etc.), facets selected, and the number of results returned. The query stream was subsequently partitioned into task-based search sessions using a combination of syntactic features (word co-occurrence across multiple transactions) and temporal features (session time-outs: we used fifteen minutes of inactivity as a boundary between search sessions). The analysis was conducted over the following datasets:

Dataset 1.
Aggregate data of approximately 6 million search transactions conducted between February 13, 2011, and December 31, 2014. We performed quantitative analysis of this set to identify general patterns of system use.

Dataset 2. A sample of 5,101 search sessions containing 11,478 failed or potentially problematic interactions performed in the basic search mode, and 2,719 sessions containing 3,600 advanced searches, annotated with query intent and the potential cause of the problem. The searches were performed during eleven twenty-four-hour periods representing different years, academic quarters, times of the school year (beginning of the quarter, midterms, finals, breaks), and days of the week. This dataset was analyzed to identify common sources of user failure.

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016

Dataset 3. User feedback messages submitted to SearchWorks between January 2011 and December 2014 through the “Feedback” link, which appears on every SearchWorks page. While the majority of feedback messages were error and bug reports, this dataset also contained valuable information about how users employed various features of the discovery layer, what problems they encountered, and what features they felt would improve their search experience.

For the manual analysis of dataset 2, all searches within a search session were reconstructed in SearchWorks and, in some cases, also in external sources such as WorldCat, Google Scholar, and Google.
These reconstructed searches were subsequently assigned to one of the following categories: known-item searches (searches for a specific resource by title, by a combination of title and author, by a standard number such as ISSN or ISBN, or by call number), author searches (queries for a specific person or organization responsible for or contributing to a resource), topical searches, browse searches (searches for a subset of the library collection, e.g., “rock operas,” “graphic novels,” “DVDs”), invalid queries, and queries where the search intent could not be established.

To identify potentially problematic transactions, we employed the following heuristic: we selected all search sessions in which at least one transaction failed to retrieve any records, as well as sessions consisting predominantly of known-item or author searches in which the user repeated or reformulated the query three or more times within a five-minute time frame. We reasoned that while this search pattern could be part of the normal query-formulation process for topical searches, it could serve as an indicator of the user’s dissatisfaction with the results of the initial query for known-item and author searches.

We identified seventeen distinct types of problems, which we aggregated into five groups: input errors, absence of the resource from the collection, queries at the wrong level of granularity, erroneous or too-restrictive use of limiters, and mismatch between the search terms entered and the library metadata. Each search transaction in dataset 2 was manually reviewed and assigned to one or more of these error categories.

FINDINGS

Usage Patterns

Our analysis of the aggregate data suggests that keyword searching remains the primary interaction paradigm with the library discovery system, accounting for 76 percent of all searches.
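The session-flagging heuristic described under Method can be sketched in a few lines of code. The study’s own scripts were written in Perl; the Python below is an illustrative re-implementation, and the record keys (time, query, hits, intent) are assumptions, not the study’s actual data layout.

```python
from datetime import datetime, timedelta

def is_potentially_problematic(session):
    """Flag a search session for manual review, following the paper's
    heuristic: at least one zero-result transaction, OR three or more
    query (re)formulations within five minutes in a session consisting
    predominantly of known-item or author searches.

    `session` is a list of dicts with (assumed) keys:
    'time' (datetime), 'query' (str), 'hits' (int), 'intent' (str).
    """
    # Condition 1: at least one transaction retrieved no records.
    if any(t['hits'] == 0 for t in session):
        return True

    # Condition 2: predominantly known-item/author searches,
    # rapidly repeated or reformulated.
    lookups = [t for t in session if t['intent'] in ('known-item', 'author')]
    if len(lookups) * 2 > len(session):  # "predominantly"
        times = sorted(t['time'] for t in lookups)
        for i in range(len(times) - 2):
            # three or more queries inside a five-minute window
            if times[i + 2] - times[i] <= timedelta(minutes=5):
                return True
    return False
```

In the study itself, every session flagged this way was then reviewed and categorized manually; the heuristic only narrows the candidate pool.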
However, users also increasingly take advantage of facets, both for browsing and for refining their searches: facet use grew from 25 percent of searches in 2011 to 41 percent in 2014.

Although both the basic and the advanced search modes allow for “fielded” searches, in which the user can specify which element of the record to search (author, title, subject, etc.), searchers rarely made use of this feature, relying mostly on the system’s defaults (the “all fields” option in the basic search mode): users selected a specific search index in less than 25 percent of all basic searches. Advanced searching was infrequent and declining (from 11 percent of searches in 2011 to 4 percent in 2014).

Typically, users engaged in short sessions, with a mean session length of 1.5 queries. Search queries were brief: 2.9 terms per query on average. Single terms made up 23 percent of queries; 26 percent had two terms, and 19 percent had three.

Error Patterns

The breakdown of errors by category and search mode is shown in figure 1. In the following sections, we describe and analyze the different types of errors.

Figure 1. Breakdown of errors by category and search mode

Input Errors

Input errors accounted for the largest proportion of problematic searches in the basic search mode (29 percent) and for 5 percent of problems in advanced search. While the majority of such errors occurred at the level of individual words (misspellings or typographical errors), entire search statements could also be imprecise or erroneous (e.g., “Diary of an Economic Hit Man” instead of “Confessions of an Economic Hit Man,” and “Dostoevsky War and Peace” instead of “Tolstoy War and Peace”). It is noteworthy that in 46 percent of all search sessions containing problems of this type, users subsequently entered a corrected query.
However, if such errors occurred in a personal name, they were almost half as likely to be corrected.

Absence of the Item Sought from the Collection

Queries for materials that were not in the library’s collection accounted for about a quarter of all potentially problematic searches. In the advanced search mode, where the query is matched against a specific search field, such queries typically resulted in zero hits and can hardly be considered failures per se. In the default cross-field search, however, users were often faced with false hits and had to issue multiple, progressively more specific queries to ascertain that the desired resource was absent from the collection.

Queries at the Wrong Level of Granularity

A substantial number of user queries failed because they were posed at a level of specificity not supported by the catalog. Such queries accounted for the largest percentage of problematic advanced searches (63 percent), where they consisted almost exclusively of article-level searching: users either tried to locate a specific article (often by copying all or part of the citation from external sources) or conducted highly specific topical searches more suitable for a full-text database. In the basic search mode, the proportion of searches at the wrong granularity level was much lower, but still substantial (20 percent). In addition to searches for articles and narrowly defined subject searches, users also attempted to search for other types of more granular content, such as book chapters, individual papers in conference proceedings, poems, and songs.

Erroneous or Too-Restrictive Use of Limiters

Another common source of failure was the selection of the wrong search index or of a facet too restrictive to yield any results. The majority of these errors were purely mechanical: users failed to clear search refinements from a previous search or entered query terms into the wrong search field.
However, our analysis also revealed several conceptual errors, typically stemming from a misunderstanding of the meaning and purpose of certain limiters. For example, the “Online,” “Database,” and “Journal/Periodical” facets were often perceived as a possible route to article-level content. Even seemingly straightforward limiters such as “Date” caused confusion, especially when applied to serial publications: users attempted to employ this facet to drill down to a desired journal issue or article, most likely on the assumption that the system included article-level metadata.

Lack of Correspondence between the Users’ Search Terms and the Library Metadata

A significant number of problems in this group involved searches for non-English materials. When performed in English transliteration, such queries often failed because of users’ lack of familiarity with the transliteration rules established by the library community, whereas searches in vernacular scripts tended to produce incomplete or no results because not all bibliographic records in the database contained parallel non-Roman script fields.

Author and title searches often failed because of users’ tendency to enter abbreviated queries. For example, personal-name searches in which the user truncated the author’s first or middle name to an initial, while the bibliographic records contained the name only in its full form, were extremely likely to fail. Abbreviations were also used in searches for journals, conference proceedings, and occasionally even book titles (e.g., “AI: a modern approach” instead of “Artificial intelligence: a modern approach”). Such queries were successful only if the abbreviation used by the searcher was included in the bibliographic record as a variant title.
A somewhat related problem occurred when the title of a resource contained a numeral in its spelled-out form but the user entered it as a digit. Because these title variations are not always recorded as additional access points in the bibliographic records, the desired item either did not appear in the result set or was buried too deep to be discovered. Topical searches within the subject index were also prone to failure, mostly because patrons were unaware that such searches require precise terms from controlled vocabularies and resorted to natural-language searching instead.

User Feedback

Our analysis of user feedback revealed substantial differences in how various user groups approach the search system and which areas of it they find problematic. Students were often frustrated by the absence of spelling suggestions, which, as one user put it, “left the users wander [to?] in the dark” as to the cause of a search’s failure. This user group also found certain social features desirable: for example, one user suggested that ratings for books would help in choosing a good programming book. By contrast, faculty and researchers were more concerned about the lack of advanced features such as cross-reference searching and left-anchored browsing of the title, subject, and author indexes. There were, however, several areas that both groups found problematic: students and faculty alike saw the system’s inability to assist in selecting the correct form of an author’s name as a major barrier to effective author searching, and both converged on the need for more granular access to formats of audiovisual materials.

DISCUSSION

Scope of the Discovery System

The results of our analysis point to users’ lack of understanding of what is covered by the discovery layer.
Users are often unaware of the existence of separate specialized search interfaces for different categories of materials and assume that the library discovery layer offers Google-like searching across the entire range of library resource types. Moreover, they are confused by the multiple search modalities offered by the discovery layer: one common misconception in SearchWorks is that the advanced search will give access to additional content rather than offer a different way of searching the same catalog data. In addition to an expanded scope for discovery tools, there is also a growing expectation of greater depth of coverage. According to our data, searching in a discovery layer occurs at several levels: the entire resource (a book, journal title, or music recording), its smaller integral units (book chapters, journal articles, individual musical compositions, etc.), and the full text.

User Search Strategies

The search strategies employed by SearchWorks users are heavily influenced by their experiences with web search engines. Users tend to engage in brief search sessions and use short queries, which is consistent with general patterns of web searching. They rely on relevance ranking and are often reluctant to examine search results in any depth: if the desired item does not appear within the first few hits, users tend to rework their initial search statement (often with only a minimal change to the search terms) rather than scroll to the bottom of the results screen or look beyond the first page of results. Given these search patterns, it is crucial to fine-tune relevance-ranking algorithms so that the most relevant results appear not just on the first page but within the first few hits.
While this is typically the case for unique and specific queries, more general searches could benefit from a relevance-ranking algorithm that leverages the popularity of a resource as measured by its circulation statistics. Adding this dimension to relevance determination would help users make sense of the large result sets generated by broad topical queries (e.g., “quantum mechanics,” “linear algebra,” “microeconomics”) by ranking more popular or introductory materials above more specialized ones. It could also offer some guidance to a user trying to choose among different editions of the same resource, and it could improve the quality of author-search results by ranking works created by the author above critical and biographical materials.

Users’ query-formulation strategies are also modeled on Google, where making search terms as specific as possible is often the only way to increase the precision of a search. Faceted search systems, however, require a different approach: the user is expected to conduct a broad search and subsequently focus it by superimposing facets on the results. Qualifying the search up front through keywords rather than facets is not only ineffective but may actually lead to failure. For example, a common search pattern is to add the format of a resource as a search term (e.g., “Fortune magazine,” “Science journal,” “GRE e-book,” “Nicole Lopez dissertation,” “Woody Allen movies”); because format information is coded rather than spelled out in the bibliographic records, such queries either return zero hits or produce irrelevant results.
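One way a discovery layer might intercept such format-qualified queries before they fail is to recognize format words in the query and offer to apply them as a facet instead. The following Python sketch illustrates the idea; the term-to-facet mapping is hypothetical, not SearchWorks’ actual configuration.

```python
# Hypothetical mapping from query words to "Format" facet values.
# Both the terms and the facet labels are illustrative assumptions.
FORMAT_TERMS = {
    'magazine': 'Journal/Periodical',
    'journal': 'Journal/Periodical',
    'dissertation': 'Thesis',
    'thesis': 'Thesis',
    'movie': 'Video',
    'movies': 'Video',
    'e-book': 'Book',
    'ebook': 'Book',
}

def suggest_facet(query):
    """If the query embeds a format word (e.g. 'Fortune magazine'),
    return a (stripped_query, facet_value) pair so the format can be
    applied as a facet filter instead of a keyword."""
    kept, facet = [], None
    for word in query.split():
        value = FORMAT_TERMS.get(word.lower())
        if value and facet is None:
            facet = value          # move the format term into the facet
        else:
            kept.append(word)      # keep everything else as keywords
    return ' '.join(kept), facet
```

For instance, “Fortune magazine” would become the keyword query “Fortune” plus a “Journal/Periodical” facet filter, matching how the format is actually coded in the records.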
In a similar vein, making the query overly restrictive by including the year of publication, publisher, or edition information often causes empty retrievals, either because the library does not hold the edition specified by the user or because the query does not match the data in the bibliographic record. Our study thus lends further weight to claims that, even in today’s reality of sophisticated discovery environments and unmediated searching, library users can still benefit from learning search techniques specifically tailored to faceted interfaces.20

Error Tolerance

Input errors remain one of the major sources of failure in library discovery layers. Users have become increasingly reliant on the error-recovery features they find elsewhere on the web, such as “Did you mean . . . ” suggestions, automatic spelling corrections, and helpful suggestions on how to proceed when the initial search returns no hits. Perhaps even more crucial are error-prevention mechanisms such as query autocomplete, which helps users avoid spelling and typographical errors and provides interactive search assistance and instant feedback during query formulation. Our visual analysis of the logs from the most recent years revealed an interesting search pattern in which the user enters only the beginning of the search query and then increments it by one or two letters:

pr
pro
proq
proque
proques
proquest

Such search patterns indicate that users expect the system to offer query-expansion options, and they show the extent to which query autocomplete (currently missing from SearchWorks) has become an organic part of users’ search processes.

Topical Searching

While next-generation discovery systems represent a significant step toward more sophisticated topical discovery, a number of challenges remain.
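The incremental pattern above (pr → pro → proq → … → proquest) is precisely the behavior a prefix-based autocomplete is designed to serve. A minimal sketch, assuming an in-memory list of titles; a production discovery layer would instead use a dedicated suggester index.

```python
import bisect

def autocomplete(prefix, titles, limit=5):
    """Return up to `limit` titles that begin with `prefix`,
    case-insensitively. Sorted keys plus bisect keep each lookup
    logarithmic; the in-memory list is an illustrative assumption,
    not how a real suggester would store its vocabulary."""
    p = prefix.lower()
    pairs = sorted((t.lower(), t) for t in titles)
    keys = [k for k, _ in pairs]
    lo = bisect.bisect_left(keys, p)          # first candidate match
    # Matches are contiguous in sorted order, so scan from `lo`.
    return [orig for key, orig in pairs[lo:lo + limit] if key.startswith(p)]
```

Each keystroke simply re-runs the lookup with the longer prefix, which is exactly the feedback loop the log pattern suggests users were expecting.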
Apart from mechanical errors such as misspellings and wrong search-index selections, the majority of zero-hit topical searches were caused by a mismatch between the user’s query and the vocabulary in the system’s index. In many cases such queries were formulated too narrowly, reflecting users’ underlying belief that the discovery layer offers full-text searching across all of the library’s resources.

In addition to keyword searching, libraries have traditionally offered a more sophisticated and precise way of accessing subject information in the form of Library of Congress Subject Headings (LCSH). Our results indicate, however, that these tools remain largely underused: users took advantage of this feature in only 21 percent of all subject searches in our sample. We also found that 95 percent of LCSH usage came from clicks on subject-heading links within individual bibliographic records rather than from the “Subject” facet, corroborating the results of earlier studies.21

A whole range of measures could help patrons leverage the power of controlled-vocabulary searching. They include raising patrons’ familiarity with LCSH, integrating cross-references for authorized subject terms, enabling more sophisticated facet-based access to subject information by allowing users to manipulate facets independently, and exposing hierarchical and associative relationships among subject headings. Ideally, once the user has identified a helpful controlled-vocabulary term, it should be possible to expand, refine, or change the focus of a search through broader, narrower, and related terms in the LCSH hierarchy, as well as to discover various aspects of a topic through browse lists of topical subdivisions or via facets.

Known-Item Searching

Important as it is for the discovery layer to facilitate topical exploration, our data suggest that SearchWorks remains, first and foremost, a known-item lookup tool.
While a typical SearchWorks user rarely has problems with known-work searches, our analysis of clusters of closely related searches revealed several situations in which the known-item search experience could be improved. For example, when the desired resource is not in the library’s collection, the user is rarely left with an empty result set, because of automatic word stemming and cross-field searching. While this is a boon for exploratory searching, it becomes a problem when the user needs to confirm that the item sought is not in the library’s collection. Another common scenario arises when the query is too generic, imprecise, or simply erroneous, or when the search string entered by the user does not match the metadata in the bibliographic record, causing the most relevant resources to be pushed too far down the results list to be discoverable. Providing helpful “Did you mean . . . ” suggestions could help the user distinguish between these two scenarios. Another feature that would substantially benefit users struggling with noisy retrievals is highlighting of the user’s search terms in retrieved records. Displaying search matches could alleviate some of the concerns, repeatedly expressed in user feedback, over the lack of transparency as to why seemingly irrelevant results are retrieved, as well as expedite relevance assessment.

Author Searching

Author searching remains problematic because of a convergence of factors:

a. Misspellings. According to our data, typographical errors and misspellings are by far the most common problem in author searching. When such errors occur in personal names, they are much more difficult to identify than errors in a title and, in the absence of index-based spell-checking mechanisms, often require the use of external sources to be corrected.

b. Mismatch between the form and fullness of the name entered by the user and the form of the name in the bibliographic record. For example, a user’s search for “D. Reynolds” will retrieve records where “D” and “Reynolds” appear anywhere in the record (or anywhere in the author fields, if the user opts for a more focused “author” search), but will not bring up records where the author’s name is recorded as “Reynolds, David.”

c. Lack of cross-reference searching of the LC Name Authority File. If the user searches for a variant name represented by a cross-reference on an authority record, she might not be directed to the authorized form of the name.

d. Lack of name disambiguation, which is especially problematic when the search is for a common name. While the process of name authority control ensures the uniqueness of name headings, it does not necessarily provide information that would help users distinguish between authors. For instance, the user often has to know the author’s middle name or date of birth to choose the correct entry, as exemplified by the following choices in the “Author” facet resulting from the query “David Kelly”:

Kelly, David
Kelly, David (David D.)
Kelly, David (David Francis)
Kelly, David F.
Kelly, David H.
Kelly, David Patrick
Kelly, David St. Leger
Kelly, David T.
Kelly, David, 1929 July 11–
Kelly, David, 1929–
Kelly, David, 1929–2012
Kelly, David, 1938–
Kelly, David, 1948–
Kelly, David, 1950–
Kelly, David, 1959–

e. Errors and inaccuracies in the bibliographic records. Given the past practice of creating undifferentiated personal-name authority records, it is not uncommon for one name heading to cover different authors or contributors.
Conversely, situations in which a single person is identified by multiple headings (largely because some records still contain obsolete or variant forms of a personal name) are also prevalent and may become a significant barrier to effective retrieval, as they create multiple facet values for the same author or contributor.

f. Inability to perform an exhaustive search on the author’s name. A fielded “Author” search will miss records where the name does not appear in the “Author” fields but appears elsewhere in the bibliographic record.

g. Relevance ranking. Because search terms occurring in the title carry more weight than search terms in the “Author” fields, works about an author are ranked higher than works by the author.

Browsing

Like many other next-generation discovery systems, SearchWorks features faceted navigation, which facilitates both general-purpose browsing and more targeted searching. In SearchWorks, facets are displayed from the outset, providing a high-level overview of the collection and jumping-off points for further exploration. Rather than having to guess the entry vocabulary, the searcher can simply choose from the available facets and explore the entire collection along a specific dimension. However, findings from our manual analysis of the query stream suggest that facets as a browsing tool may not be used to their fullest potential: users often resort to keyword searching when faceted browsing would have been the better strategy. At least two factors contribute to this trend. The first is users’ lack of awareness of this interface feature: it is common for SearchWorks users to issue queries such as “dissertations,” “theses,” and “newspapers” instead of selecting the appropriate value of the “Format” facet. The second is that many of the facets that could be useful in the discovery process are not available as top-level browsing categories.
For example, users expect more granular faceting of audiovisual resources, including the ability to browse by content type (“computer games,” “video games”) and genre (“feature films,” “documentaries,” “TV series,” “romantic comedies”). Another category of resources commonly accessed by browsing is theses and dissertations. Users frequently try to browse dissertations by field or discipline (issuing searches such as “linguistics thesis,” “dissertations aeronautics,” “PhD thesis economics,” “biophysics thesis”), by program or department, and by level of study (undergraduate, master’s, doctoral), and could benefit from a set of facets dedicated to these categories. Browsing for books could be enhanced by additional faceting related to intellectual content, such as genre and literary form (e.g., “fantasy,” “graphic novels,” “autobiography,” “poetry”) and audience (e.g., “children’s books”). Users also want to be able to browse specific subsets of materials on the basis of their location (e.g., permanent reserves at the engineering library). Browsing for new acquisitions, with the option of limiting to a specific topic, is also a highly desirable feature.

While some browsing categories are common across all types of resources, others apply only to specific types of materials (e.g., music, cartographic/geospatial materials, audiovisual resources). For example, there is strong demand among music searchers for systematic browsing by specific musical instruments and their combinations. Ideally, the system should offer both an optimal set of initial browse options and intuitive, context-specific ways to progressively limit or expand the search.
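In a Solr-backed discovery layer such as one built on Blacklight, exposing browse dimensions like these amounts to declaring additional facet fields and filter queries in the search request. The Python sketch below builds such a request as a parameter dictionary; every field name is an assumption made for illustration, not SearchWorks’ actual schema.

```python
def browse_request(base_query='*:*', **filters):
    """Build a hypothetical Solr request exposing the browse
    categories discussed above as facets. The field names
    (format, genre, instrument, thesis_department, thesis_level)
    are illustrative placeholders only."""
    return {
        'q': base_query,            # '*:*' browses the whole collection
        'facet': 'true',
        'facet.field': ['format', 'genre', 'instrument',
                        'thesis_department', 'thesis_level'],
        # Each selected facet value becomes a filter query.
        'fq': [f'{field}:"{value}"' for field, value in filters.items()],
    }
```

A user drilling into doctoral dissertations in economics would then issue, e.g., `browse_request(thesis_level='Doctoral', thesis_department='Economics')`, progressively narrowing the set without ever typing entry vocabulary.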
Offering such browsing tools may require improvements in system design as well as significant data remediation and enhancement, because much of the metadata that could be used to create these browsing categories is scattered across multiple fixed and variable fields in the bibliographic records, inconsistently recorded, or not present at all.

One of the hallmarks of modern discovery tools has been their increased focus on facilitating serendipitous browsing. SearchWorks was one of the pioneers in offering a virtual “browse shelf” feature, which aims to emulate browsing the shelves of a physical library. Because this functionality relies on the classification number, however, it does not allow browsing of many other important groups of materials, such as multimedia resources, rare books, or archival resources. Call-number proximity is only one of many dimensions that could be leveraged to create more opportunities for serendipitous discovery. Other methods of associating related content might include recommendations based on subject similarity, authorship, keyword associations, forward and backward citations, and use.

Implications for Practice

Addressing the issues we identified would involve improvements in several areas:

• Scope. Our findings indicate that library users increasingly perceive the discovery interface as a portal to all of the library’s resources. Meeting this need goes far beyond offering the ability to search multiple content sources from a single search box: it is just as important to help users make sense of their search results and to provide easy and convenient ways to access the resources they have discovered. And whatever the scope of the library discovery layer is, it needs to be communicated to the user with maximum clarity.

• Functionality.
Users expect a robust and fault-tolerant search system with a rich suite of search-assistance features, such as index-based alternative spelling suggestions, result screens displaying keywords in context, and query-autocompletion mechanisms. These features, many of which have become deeply embedded in users’ search processes elsewhere on the web, could prevent or alleviate a substantial number of issues related to problematic queries (misspellings, typographical errors, imprecise queries, etc.), enable more efficient recovery from errors by guiding users to improved results, and facilitate discovery of foreign-language materials. Equally important is a continued focus on relevance-ranking algorithms, which ideally should move beyond simple keyword-matching techniques toward incorporating social data, leveraging the semantics of the query itself, and offering more intelligent and possibly more personalized results depending on the context of the search.

• Metadata. The quality of the user experience in discovery environments depends as much on the metadata as on the functionality of the discovery layer. It thus remains extremely important to ensure consistency, granularity, and uniformity of metadata, especially as libraries are increasingly faced with the problem of integrating heterogeneous pools of metadata into a single discovery tool.

CONCLUSIONS AND FUTURE DIRECTIONS

The analysis of transaction-log data and user feedback has helped us identify several common patterns of search failure, which in turn reveal important assumptions and expectations that users bring to library discovery.
These expectations pertain primarily to the system’s functionality: in addition to simple, intuitive, and visually appealing interfaces and relevance-ranked results, users expect a sophisticated search system that consistently produces relevant results even for incomplete, inaccurate, or erroneous queries. Users also expect a more centralized, comprehensive, and inclusive search environment that enables more in-depth discovery by offering article-level, chapter-level, and full-text searching. Finally, the results of this study underscore the continued need for a flexible and adaptive system that is easy to use for novices while offering advanced functionality and more control over the search process for “power” users; a system that provides targeted support for different types of information behavior (known-item lookup, author searching, topical exploration, browsing) and facilitates both general inquiry and very specialized searches (e.g., searches for music, cartographic and geospatial materials, digital collections of images, etc.).

Just like discovery itself, building discovery tools is a dynamic, complex, iterative process that requires intimate knowledge of ever-changing and evolving user needs and expectations. It is hoped that an ongoing focus on user problems and frustrations in the new discovery environments can complement other assessment methods by identifying unmet user needs, thus helping create a more holistic and nuanced picture of users’ search and discovery behaviors.

REFERENCES

1. Marshall Breeding, “Library Resource Discovery Products: Context, Library Perspectives, and Vendor Positions,” Library Technology Reports 50, no. 1 (2014): 5–58.

2. Craig Silverstein et al., “Analysis of a Very Large Web Search Engine Query Log,” SIGIR Forum 33, no. 1 (1999): 6–12; Bernard J.
Jansen, Amanda Spink, and Tefko Saracevic, “Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web,” Information Processing & Management 36, no. 2 (2000): 207–27, http://dx.doi.org/10.1016/S0306-4573(99)00056-4; Amanda Spink, Bernard J. Jansen, and H. Cenk Ozmultu, “Use of Query Reformulation and Relevance Feedback by Excite Users,” Internet Research 10, no. 4 (2000): 317–28; Amanda Spink et al., “Searching the Web: The Public and Their Queries,” Journal of the American Society for Information Science & Technology 52, no. 3 (2001): 226–34; Bernard J. Jansen and Amanda Spink, “An Analysis of Web Searching by European AlltheWeb.com Users,” Information Processing & Management 41, no. 2 (2005): 361–81, http://dx.doi.org/10.1016/S0306-4573(03)00067-0.

3. Cory Lown and Bradley Hemminger, “Extracting User Interaction Information from the Transaction Logs of a Faceted Navigation OPAC,” Code4Lib Journal, no. 7 (June 26, 2009), http://journal.code4lib.org/articles/1633; Eng Pwey Lau and Dion Ho-Lian Goh, “In Search of Query Patterns: A Case Study of a University OPAC,” Information Processing & Management 42, no. 5 (2006): 1316–29, http://dx.doi.org/10.1016/j.ipm.2006.02.003; Heather Moulaison, “OPAC Queries at a Medium-Sized Academic Library: A Transaction Log Analysis,” Library Resources & Technical Services 52, no. 4 (2008): 230–37.

4. William H. Mischo et al., “User Search Activities within an Academic Library Gateway: Implications for Web-Scale Discovery Systems,” in Planning and Implementing Resource Discovery Tools in Academic Libraries, edited by Mary Pagliero Popp and Diane Dallis, 153–73 (Hershey, PA: Information Science Reference, 2012); Xi Niu, Tao Zhang, and Hsin-liang Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library,” International Journal of Human-Computer Interaction 30, no.
5 (2014): 422–33, http://dx.doi.org/10.1080/10447318.2013.873281. 5. Eng Pwey Lau and Dion Ho-Lian Goh, “In Search of Query Patterns”; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.”. 6. Lown and Hemminger, “Extracting User Interaction; Kristin Antelman, Emily Lynema, and Andrew K. Pace, “Toward a Twenty-First Century Library Catalog,” Information Technology & Libraries 25, no. 3 (2006): 128–39; Niu, Zhang, and Chen, “Study of User Search Activities with Two Discovery Tools at an Academic Library.” 7. Xi Niu and Bradley Hemminger, “Analyzing the Interaction Patterns in a Faceted Search Interface,” Journal of the Association for Information Science & Technology 66, no. 5 (2015): 1030–47, http://dx.doi.org/10.1002/asi.23227. 8. Steven D. Zink, “Monitoring User Search Success through Transaction Log Analysis: The WolfPAC Example,” Reference Services Review 19, no. 1 (1991): 49–56; Deborah D. Blecic et al., “Using Transaction Log Analysis to Improve OPAC Retrieval Results,” College & Research Libraries 59, no. 1 (1998): 39–50; Holly Yu and Margo Young, “The Impact of Web Search http://dx.doi.org/10.1016/S0306-4573(99)00056-4 http://dx.doi.org/10.1016/S0306-4573(99)00056-4 http://dx.doi.org/10.1016/S0306-4573(03)00067-0 http://journal.code4lib.org/articles/1633 http://dx.doi.org/10.1016/j.ipm.2006.02.003 http://dx.doi.org/10.1080/10447318.2013.873281 http://dx.doi.org/10.1080/10447318.2013.873281 INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 25 Engines on Subject Searching in OPAC,” Information Technology & Libraries 23, no. 4 (2004): 168–80; Moulaison, “OPAC Queries at a Medium-Sized Academic Library.” 9. Thomas Peters, “When Smart People Fail,” Journal of Academic Librarianship 15, no. 5 (1989): 267–73; Zink, “Monitoring User Search Success through Transaction Log Analysis”; Rhonda H. 
Hunter, “Successes and Failures of Patrons Searching the Online Catalog at a Large Academic Library: A Transaction Log Analysis,” Reference Quarterly (Spring 1991): 395–402. 10. Karen Antell and Jie Huang, “Subject Searching Success: Transaction Logs, Patron Perceptions, and Implications for Library Instruction,” Reference & User Services Quarterly 48, no. 1 (2008): 68–76; Hunter, “Successes and Failures of Patrons Searching the Online Catalog at a Large Academic Library”; Peters, “When Smart People Fail.” 11. Peters, “When Smart People Fail.”; Moulaison, “OPAC Queries at a Medium-Sized Academic Library”; Blecic et al., “Using Transaction Log Analysis to Improve OPAC Retrieval Results.” 12. Lynn Silipigni Connaway, Debra Wilcox Johnson, and Susan E. Searing, “Online Catalogs from the Users’ Perspective: The Use of Focus Group Interviews,” College & Research Libraries 58, no. 5 (1997): 403–20, http://dx.doi.org/10.5860/crl.58.5.403. 13. Karl V. Fast and D. Grant Campbell, “‘I Still Like Google’: University Student Perceptions of Searching OPACs and the Web,” ASIST Proceedings 41 (2004): 138–46; Eric Novotny, “I Don’t Think I Click: A Protocol Analysis Study of Use of a Library Online Catalog in the Internet Age,” College & Research Libraries 65, no. 6 (2004): 525–37, http://dx.doi.org/10.5860/crl.65.6.525. 14. Xi Niu et al., “National Study of Information Seeking Behavior of Academic Researchers in the United States,” Journal of the American Society for Information Science & Technology 61, no. 5 (2010): 869–90, http://dx.doi.org/10.1002/asi.21307; Lynn Sillipigni Connaway, Timothy J. Dikey, and Marie L. Radford, “If It Is Too Inconvenient I’m Not Going after It: Convenience as a Critical Factor in Information-Seeking Behaviors,” Library & Information Science Research 33, no. 
3 (2011): 179–90; Karen Calhoun, Joanne Cantrell, Peggy Gallagher and Janet Hawk, Online Catalogs: What Users and Librarians Want: An OCLC Report (Dublin, OH: OCLC Online Computer Library Center, 2009). 15. F. William Chickering and Sharon Q. Young, “Evaluation and Comparison of Discovery Tools: An Update,” Information Technology & Libraries 33, no.2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471. 16. William Denton and Sarah J. Coysh, “Usability Testing of VuFind at an Academic Library,” Library Hi Tech 29, no. 2 (2011): 301–19, http://dx.doi.org/10.1108/07378831111138189; Jennifer Emanuel, “Usability of the VuFind Next-Generation Online Catalog,” Information Technology & Libraries 30, no. 1 (2011): 44–52; Erin Dorris Cassidy et al., “Student Searching http://dx.doi.org/10.5860/crl.58.5.403 http://dx.doi.org/10.5860/crl.65.6.525 http://dx.doi.org/10.1002/asi.21307 http://dx.doi.org/10.6017/ital.v33i2.3471 http://dx.doi.org/10.1108/07378831111138189 LIBRARY DISCOVERY PRODUCTS: DISCOVERING USER EXPECTATIONS THROUGH FAILURE ANALYSIS |IRINA TRAPIDO |doi:10.6017/ital.v35i2.9190 26 with EBSCO Discovery: A Usability Study,” Journal of Electronic Resources Librarianship 26, no. 1 (2014): 17–35, http://dx.doi.org/10.1080/1941126X.2014.877331. 17. Sarah C. Williams and Anita K. Foster, “Promise Fulfilled? An EBSCO Discovery Service Usability Study,” Journal of Web Librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; Rice Majors, “Comparative User Experiences of Next-Generation Catalogue Interfaces,” Library Trends 61, no. 1 (2012): 186– 207; Andrew D. Asher, Lynda M. Duke, and Suzanne Wilson, “Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources,” College & Research Libraries 74, no. 5 (2013): 464–88. 18. Jody Condit Fagan et al., “Usability Test Results for a Discovery Tool in an Academic Library,” Information Technology & Libraries 31, no. 
1 (2012): 83–112; Megan Johnson, “Usability Test Results for Encore in an Academic Library,” Information Technology & Libraries 32, no. 3 (2013): 59–85. 19. Elizabeth (Bess) Sadler, “Project Blacklight: A Next Generation Library Catalog at a First Generation University,” Library Hi Tech 27, no. 1 (2009): 57–67, http://dx.doi.org/10.1108/07378830910942919; Bess Sadler, “Stanford's SearchWorks: Unified Discovery for Collections?” in More Library Mashups: Exploring New Ways to Deliver Library Data, edited by Nicole C. Engard, 247–260 (London: Facet, 2015). 20. Andrew D. Asher, Lynda M. Duke and Suzanne Wilson, “Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources,” College & Research Libraries 74, no. 5 (2013): 464–88; Kelly Meadow and James Meadow, “Search Query Quality and Web-Scale Discovery: A Qualitative and Quantitative Analysis,” College & Undergraduate Libraries 19, no. 2–4 (2012): 163–75, http://dx.doi.org/10.1080/10691316.2012.693434. 21. Sarah C. Williams and Anita K. Foster, “Promise Fulfilled? An EBSCO Discovery Service Usability Study,” Journal of Web Librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; Kathleen Bauer and Alice Peterson-Hart, “Does Faceted Display in a Library Catalog Increase Use of Subject Headings?” Library Hi Tech 30, no. 2 (2012), 347–58, http://dx.doi.org/10.1108/07378831211240003. http://dx.doi.org/10.1080/1941126X.2014.877331 http://dx.doi.org/10.1080/19322909.2011.597590 http://dx.doi.org/10.1108/07378830910942919 http://dx.doi.org/10.1080/10691316.2012.693434 http://dx.doi.org/10.1080/19322909.2011.597590 http://dx.doi.org/10.1108/07378831211240003 ABSTRACT INTRODUCTION REFERENCES
Critical Success Factors for Integrated Library System Implementation in Academic Libraries: A Qualitative Study
Shea-Tinn Yeh and Zhiping Walter
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016

ABSTRACT
Integrated library systems (ILSs) support the entire business operations of an academic library, from acquiring and processing library resources to making them available to user communities and preserving them for future use. As libraries’ needs evolve, there is a pressing demand for libraries to migrate from one generation of ILS to the next. This complex migration process often requires significant financial and personnel investment, but its success is by no means guaranteed. We draw on the enterprise resource planning and critical success factors (CSFs) literature to identify the most salient CSFs for ILS migration success through a qualitative study with four cases. We found that a careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions are the most salient CSFs that determine the success of a migration project.

INTRODUCTION
The first generation of integrated library systems (ILSs) was developed specifically for library operations focused on the selection, acquisition, cataloging, and circulation of print collections. As libraries’ nonprint materials steadily grew, print-centric ILSs became less and less efficient in supporting libraries’ daily operations. Recent years have seen the emergence of a new generation of ILSs, commonly called Library Services Platforms (LSPs), that take into account the management of both print and electronic collections. LSPs take advantage of cloud computing and network advancements to provide economies of scale and to allow a library to better share data with other libraries.
Furthermore, LSPs unify the entire suite of library operations to provide efficient workflow at the back end and advanced online discovery tools at the front end for the library.1 Given the claimed benefits of the emerging LSP and the fact that vendors are phasing out support for their legacy ILSs, we project that more libraries will be migrating to LSPs as the systems mature and libraries’ needs evolve.

Shea-Tinn Yeh (sheila.yeh@du.edu) is Assistant Professor and Library Digital Infrastructure and Technology Coordinator, University of Denver Libraries. Zhiping Walter (zhiping.walter@ucdenver.edu) is Associate Professor, Business School, University of Colorado Denver.

Migrating from one generation of ILS to another is a significant initiative that affects the entire library operation.2 Because of its scale and complexity, the migration project is not always smooth and is often fraught with problems, with some projects falling behind their migration completion schedules.3, 4, 5 In addition, committing to a new system often results in significant financial and personnel costs for an academic library.6 Understandably, there is considerable trepidation before, during, and after the migration process.6, 7 What contributes to a smooth migration process and a successful migration project? This is an urgent question at present and an enduring question for the future. This is because, as libraries continue to evolve, their operations and management needs are destined to outgrow the functionalities of the current generation of ILS. Therefore, migration to a new generation of ILS is destined to occur periodically for a library.
In this research, we study critical success factors (CSFs) that contribute to a smooth migration process and a successful migration project, defined as on-time and on-budget project completion and a smooth implementation process. To achieve our research goal, we anchor our theoretical foundation in the enterprise resource planning (ERP) system-implementation literature. ERP is “business process management software that allows an organization to use a system of integrated applications to manage the business and automate many back office functions related to technology, services and human resources.”9 Since a complete ILS is formed from a suite of integrated functions to manage a broad range of library processes, it is in fact an ERP for libraries.10 A literature review of CSFs for ERP system implementation success revealed more than ninety CSFs.11, 12 The contribution of our research is in identifying, through a qualitative research method, the most salient CSFs that contribute to the success of a library system migration project from one generation of ILS to another. Results of this study can help library administrators improve the chances of success and decrease the level of anxiety during a migration project now and in the future.

The remainder of the article is organized as follows: Section 2 reviews ERP, ILS, LSP, CSFs, and information system success measurement as described in the literature. Section 3 describes the guided interviews conducted to identify the CSFs, the results, and the analysis of the results. Finally, we offer conclusions and limitations and recommend future work.

LITERATURE REVIEW
ERP is business-management software comprising a suite of integrated applications that an organization can use to collect, store, manage, and interpret data from many business activities, including product planning, manufacturing, service delivery, marketing and sales, and human resources.
The core idea of an ERP system is to integrate both the data and the process dimensions of a business so that transactions can be monitored and analyzed for planning and strategic purposes.13 Modules of the system cover different functions within a company and are linked so users can see what is happening in all areas of the company. An ERP system can improve a business’s back offices as well as its front-end functions, with both operational and strategic benefits.14 Some of the benefits include reliability in information access, reduced data and operations redundancy, data retrieval and reporting efficiency, easy module extension, and Internet commerce capability.

Just like an ERP system for a business, a complete library management solution comprises a suite of integrated applications that manage a broad range of library processes, including circulation, acquisition, cataloging, electronic resources management, and system administration. LSPs, the current generation of library management systems, are designed to manage both physical and digital collections. LSPs follow a service-oriented architecture (SOA) and can be deployed through a multitenant Software as a Service (SaaS) distribution model.15 In addition to supporting all library functions, LSPs integrate with other university systems, such as student registry and finance, and provide a front end for library patrons in a cloud environment that leverages a global network of systems for discovery of a wide array of resources.16 Since an LSP is essentially an enterprise system for library functions, the CSFs of ERP implementation success could guide LSP implementation.
CSFs are conditions that must be met for an implementation to be successful.17 More than ninety CSFs have been identified for ERP implementation success.18, 19 These CSFs have been classified according to various schemes, but we found the strategic-versus-tactical classification most relevant to the library context.20 Strategic factors address the big picture, involving the breakdown of goals into doable items. Tactical factors, on the other hand, are the methods used to accomplish the doable items that lead to achieving the goals.21 By examining the entire list of CSFs from both the strategic and the tactical perspectives, we identify the top CSFs for library-management-solution implementation and migration success, defined as on-time and on-budget delivery as well as a smooth implementation process,22, 23 through a qualitative study.

METHOD
We conducted semi-structured interviews with open-ended questions to identify the most salient CSFs for implementation success. Since we needed to reduce the more than ninety CSFs in the literature to a list of the most salient CSFs in the library context and to potentially identify new CSFs, a qualitative interview approach was more suitable than a quantitative survey approach. A two-step process was used to arrive at the final list. First, we evaluated all CSFs in the literature and identified a subset that might be most relevant for library-systems implementation.24 Second, this subset was used to develop an interview guide for the semi-structured interviews conducted later to further reduce it. Open-ended questions were also used during the interviews to elicit additional CSFs. An institutional review board (IRB) application was submitted and approved. The result of this two-step process is a list of ten CSFs discussed in the results section, with nine CSFs coming from our initial list and one CSF emerging from the interviews.
The criterion for recruiting study libraries was that the library had implemented a new LSP within the previous three years. This is because the LSP is the current generation of ILS, and it is only within the last few years that various LSP vendors began to promote and implement LSPs. A recruitment email was sent to libraries listed as adopters on various vendors’ press-release sites. Participating recipients referred the interview request to appropriate migration team members, whom we later contacted to schedule interviews. This resulted in up to five people from each participating library being interviewed in person or via Skype; their positions are listed in table 1. Interviews were recorded, transcribed, and cleaned. Emails to the same interviewees were used for follow-up questions as needed. After the interviews with each library, qualitative data analysis was performed to identify CSFs that emerged from the interviews. Interviews continued until no new CSFs emerged in the last interview. In total, staff from four libraries were interviewed between October 2014 and March 2015 about their implementation process and experience from the staff users’ perspective. The design and implementation of the public discovery interface was not part of this inquiry. Table 1 summarizes the characteristics of the four libraries. Case numbers instead of university names are used to protect the identities of participating libraries and interviewees.
Case 1: private university; 11,000+ students; 11 million operating budget; 150 library employees; 6-month project; migrated from Millennium to Sierra. Reasons for migration: discontinued vendor system support, servers out of warranty, vendor gave incentives. Interviewees: head of systems; module experts.

Case 2: public university; 32,000+ students; 13 million operating budget; 400 library employees; 9-month project; migrated from Aleph to Alma. Reasons for migration: outdated servers, servers out of warranty. Interviewees: heads of systems.

Case 3: public university; 2,400+ students; 1.5 million operating budget; 17 library employees; 6-month project; migrated from Evergreen to Sierra. Reasons for migration: needed a robust system that provides a discovery layer. Interviewees: director of library; head of systems.

Case 4: private university; 2,700+ students; 1.3 million operating budget; 13.5 library employees; 9-month project; migrated from Voyager to Sierra. Reasons for migration: needed a modern system demonstrating that the library is moving with the times. Interviewees: director of library.

Table 1. Summary of case study site characteristics.

RESULTS
The following CSFs emerged from the interviews: careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions. We discuss each CSF next.

Careful Selection Process
Most ILSs are commercial, off-the-shelf software systems that can vary dramatically in functionality from system to system.25 For example, some packages are more suitable for large institutions while others are more suitable for smaller ones. To mitigate the risk of productivity or transaction loss and to minimize system and implementation costs, a library needs to determine the best “fitness-of-use” system. Such a determination is the outcome of a careful selection process.
Although there is no commonly accepted technique, method, or tool for this process, all selection processes share common key steps suggested in the literature.26 As applied to library-system selection, they are the following:
• Define stakeholder requirements.
• Search for products.
• Create a short list of the most promising candidates based on a set of “must-have” requirements.
• Evaluate the candidates on the short list.
• Analyze the evaluation data to make a selection.
In addition, if the server option is chosen instead of the cloud option, the selected hardware needs to satisfy the system requirements for the final configuration.

A careful selection process emerged as a CSF that affected the implementation outcome for all four libraries. All cases were migrating to an LSP. Some systems can be offered as locally installed systems, which require appropriate in-house IT and hardware capabilities. Case 1 did not consider its IT capability when deciding on a turnkey system; as a result, the library experienced difficulties in setting up the infrastructure in-house during the implementation. Each of the other three cases considered the candidate system’s compatibility with the legacy system, the match between library needs and system functionalities, system maturity, migration costs, data storage needs, and vendor support before and during the implementation as well as continued vendor support throughout the life of the new system. Even though each of the three libraries arrived at its system choice differently, on reflection, interviewees expressed relief and satisfaction with their decisions to choose their respective systems.

“We were in the position where our servers were out of date and warranty, needed to be replaced. The servers were too small. We had sizing issues and we couldn’t update to the most recent version of Aleph . . . Alma being a cloud based solution will eliminate our need to be ‘in the server business.’” (Case 2)
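The evaluation and analysis steps of the selection process can be sketched as a simple weighted-scoring exercise over the shortlisted candidates. This is a hypothetical illustration, not a tool used by the study libraries; the criteria, weights, candidate names, and ratings below are invented for demonstration.

```python
# Hypothetical weighted-scoring sketch for evaluating shortlisted systems.
# All criteria, weights, and ratings are illustrative, not from the study.

def score_candidates(weights, scores):
    """Return candidates ranked by weighted total score, best first."""
    totals = {}
    for candidate, ratings in scores.items():
        totals[candidate] = sum(weights[c] * ratings[c] for c in weights)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

weights = {  # relative importance of each "must-have" requirement
    "legacy_compatibility": 0.30,
    "functional_match": 0.30,
    "system_maturity": 0.15,
    "migration_cost": 0.15,
    "vendor_support": 0.10,
}

scores = {  # 1-5 ratings gathered during candidate evaluation
    "LSP A": {"legacy_compatibility": 4, "functional_match": 5,
              "system_maturity": 3, "migration_cost": 2, "vendor_support": 4},
    "LSP B": {"legacy_compatibility": 3, "functional_match": 4,
              "system_maturity": 4, "migration_cost": 4, "vendor_support": 3},
}

ranked = score_candidates(weights, scores)
print(ranked)
```

A real selection would weight the same considerations the case libraries reported, such as compatibility with the legacy system and continued vendor support, and would document the rationale behind each weight.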
“We went through a very extensive formal process to select this system.” (Case 3)

Top Management Involvement
Successful implementation requires strong leadership by executives who understand, support, and champion the project.27 When this involvement trickles down through the organizational hierarchy, it leads to an organizational commitment, which is required for the implementation success of complex projects.28, 29 Since library-system implementation is a complex project that (if done correctly) will transform the entire library and reposition it for better efficiency, strong leadership is critical as well.

In all four cases, top management was involved in the final decision on the system choice. In cases 1 and 2, top management also took charge of securing funding for the migration projects. Interviewees stressed that top management support was very important in their respective project implementations.

“The top level management took the recommendations from the systems librarians at the time, with the blessing of the council determined whether they want to proceed with the product Alma, and had funding conversations with the financial people.” (Case 2)

“We have faculty library committee, faculty governance oversight. We showed them webinars of the products we considered before we signed them, so we have faculty representation on board.
We held open forum and were inclusive in our invitations.” (Case 4)

Vendor Support
With a new technology, it is critical to acquire external technical expertise, often from the vendor, to facilitate successful implementation.30 Effective vendor support includes adequate and high-quality technical support during and after implementation, sufficient training for both the project team and staff users, and positive relationships between all parties in the project.31 Additionally, there should be adequate knowledge transfer between the vendor consultants and the clients, which can be achieved by defining roles, achieving shared understanding, and enhancing relationships through competent communication.32, 33 In the case of library-system implementations, vendor support is particularly important because of the complexity of each new generation of the system and the library personnel’s knowledge gap in understanding the nuts and bolts of the new system.

Effective vendor support was identified in each case as a critical success factor determining the implementation outcome, even though the form of vendor support varied from case to case. In case 1, the vendor sent different consultants with various expertise as project managers on the basis of the project phase. In case 2, the vendor sent one consultant who served as the main project manager. In case 3, the vendor provided a project manager and a team of technicians. In case 4, consultants were shared across multiple consortium libraries that were implementing the system at the same time. No matter how vendor support was provided, it was essential for implementation success, as indicated by interviewees.
“The vendor has been very supportive and provides a group of experts throughout the process, some are knowledgeable in server business while others are skilled project managers.” (Case 1)

Project Team Competence
Since library-system migration affects all functional areas of a library, members of the implementation team need to be cross-functional. Furthermore, members with both business knowledge and technology know-how are especially crucial for implementation success.34 The competence of vendor consultants assigned to the project also influences implementation success, as discussed earlier. Additionally, it is important to have an in-house project leader who champions the project and who has the essential skills and authority to set goals that legitimize change.35

Having a competent project team was essential for implementation success in each of our cases. In each case, the vendor provided the project manager and the library provided a co-manager who was a champion figure. Other team members came from various functional areas such as acquisition, circulation, cataloging, electronic resources management, and system administration. For example, in case 1, the technology librarian participated as a co-project manager. The project-management team comprised module experts within the library and from functional areas. In addition, the university’s technology services department lent technical support during the early stages of implementation, when servers needed to be set up. The interviewees all stressed the importance of project-team competence.
“Without the infrastructure knowledge from the university’s technology team and their time and full support to negotiate with the vendor, the migration project would not have been possible.” (Case 1)

“The university’s IT made sure that we are in compliance with campus policies and expectations for securities.” (Case 2)

Staff User Involvement
It is important that the project team involve staff users early on; otherwise, the implementation process may be bumpy. When end users are involved in decisions relating to system selection and implementation, they are more invested in and concerned with the success of the system, which in turn leads to greater system use and user satisfaction.36, 37 As such, staff user involvement is one of the most cited critical success factors in ERP implementation.38 Because personal relevance to the system is just as important for library-system implementation, effective staff user involvement is positively related to implementation success.

Staff user involvement emerged as a main success factor in all our cases and contributed to the implementation project outcome. In case 1, staff users were not consulted as to whether an LSP was necessary for the library, although they were informed of the reasons for implementation. Additionally, staff users were not involved when the project timetable was negotiated. This lack of early staff user involvement led to considerable stress down the road, which made the implementation process bumpy. The other three cases involved staff users early on; as a result, staff users experienced much less stress and frustration down the road. Specifically, in case 2, the staff users were educated about the need for migration through staff meetings, town hall meetings, supervisory meetings, council meetings, and forums.
Many product-demo sessions were conducted for the staff so they would have the knowledge to participate before the final decision was made. There were daily internal newsletters conveying implementation news throughout the implementation months. In case 3, the entire library was involved with the selection of a new system. While the key staff (such as the circulation manager, acquisition manager, and reference manager) had more input than others, everyone offered input about the project. As such, buy-in for the new system was strong from all stakeholders. In case 4, staff users were involved early on through open forums and webinars. The following quotes are examples of interviewee sentiment concerning staff user involvement:

“Everybody is involved in choosing the system; partially because Evergreen had been so problematic. We wanted to make sure that everyone is on board.” (Case 3)

“Migration is the most time consuming aspect of the library staff work during the time of the project, without their buy-ins, it is difficult to have a successful project.” (Case 4)

Interdepartmental Communication
The importance of effective communication across functional and departmental boundaries is well known in the information-systems-implementation literature.39 With consultants coming from the vendor, project team members coming from different functional areas, and staff users with different perceptions and understandings of the implementation project, the importance of effective communication between all involved cannot be overstated.
Communications should start early, be consistent and continuous throughout the various stages of the implementation process, and include a system overview, the rationale for implementation, briefings on process changes, and the establishment of contact points.40 Expectations and goals should be communicated to all stakeholders and at all levels of the organization.41

The effectiveness of interdepartmental communication affected the implementation outcome in all our cases. In case 1, the library’s project manager was designated to communicate with the vendor when issues arose, such as hardware and software configurations, system backup and use, and task assignments. The formal project plan was established using the web-based Basecamp so that team members in different roles with different responsibilities could communicate and work together online. Regular meetings were held and emails were exchanged between project team members. However, there was a lack of effective interdepartmental communication with staff who were not on the project team. This resulted in the absence of necessary system testing that would have detected some data-integrity issues. Such issues later caused the system to be offline for days, which brought much frustration and stress to everyone. In the other three cases, all actors were well informed through news releases, meetings, presentations, and webinars. Concerns were communicated to the project team and addressed in a timely manner. As a result, the level of frustration was very low in those three cases.

Data Analysis and Conversion
A fundamental requirement for the effectiveness of an ERP system is the accuracy of its data,42 and the same is true for a library system. Data types in a legacy ILS are often of an outdated format and can differ from the formats supported by a new library system.
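As a hypothetical illustration of the kind of pre-migration analysis this involves, the sketch below audits an exported batch of legacy records for problems that commonly break conversion, such as missing required fields and duplicate identifiers. The field names, rules, and sample records are invented for demonstration; a real audit would follow the target system’s import specification.

```python
# Hypothetical pre-migration audit of exported legacy records.
# Field names and rules are illustrative, not from the study libraries.

REQUIRED_FIELDS = ("record_id", "title", "barcode")

def audit_records(records):
    """Flag records likely to break conversion: missing required
    fields or duplicate record identifiers."""
    problems = []
    seen_ids = set()
    for rec in records:
        rid = rec.get("record_id")
        if rid in seen_ids:
            problems.append((rid, "duplicate record_id"))
        seen_ids.add(rid)
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                problems.append((rid, f"missing {field}"))
    return problems

legacy_export = [
    {"record_id": "b100", "title": "Sample Title", "barcode": "39001"},
    {"record_id": "b101", "title": "", "barcode": "39002"},     # blank title
    {"record_id": "b100", "title": "Dup", "barcode": "39003"},  # duplicate id
]

for rid, issue in audit_records(legacy_export):
    print(rid, issue)
```

Running such an audit before the data freeze is one way to surface exactly the data-integrity issues that, in case 1, only appeared after the system went live.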
Conversion from one format to another can be an overwhelming process, especially when there is no existing expertise in the library. Since migrating legacy data to the new system is essential, effective data analysis for conversion is a critical success factor for implementation success. The smoothness of each of the four implementation cases was related to the project team’s data analysis and conversion efforts. In case 1, the library did not spend any effort to analyze, convert, or clean the data. As a result, the system experienced data-integrity issues after it went live. The other three libraries either devoted time to clean and convert the data or had a third party do the data cleaning. As a result, no system issues arose from data-integrity problems. Interviewees from case 2 told us, “We elected to freeze the data 30 days sooner in terms of bibliographic data, so that we can do an authority control project with a third party vendor.”

Project Management and Project Tracking

According to ERP implementation literature, effective project-management practices are critical for implementation success. Such practices include defining clear objectives, establishing a formal implementation plan, designing a realistic work plan, and establishing resource requirements.43 The formal implementation plan needs to identify the modules to be implemented, the tasks to be undertaken, and all technical and nontechnical issues to be considered.44 Project progress must be carefully monitored through meetings and reports.45, 46 Effective project management and tracking affected the implementation outcome in all our cases. A popular project-management and tracking software is Basecamp, a web-based project management and collaboration tool initially released in 2004.47 It offers discussion boards, to-do lists, file sharing, milestone management, event tracking, and a messaging system that help project teams stay organized and connected despite their different locations.
All cases used Basecamp for project management and tracking, which contributed to on-time and on-budget project completion for all cases.

Staff User Education and Training

A new system often frustrates users who do not receive adequate training in its functionalities and use.48 When feeling frustrated and stressed, users may avoid using the system. Proper and adequate training will soothe users and eliminate their reluctance to use the new system, which in turn helps realize productivity gains.49, 50 Training processes should consider factors such as training curriculum, user commitment, trainers’ personnel skills and competence, as well as training schedule, budget, evaluation, and methods.51 Effective staff user training emerged as a critical success factor from all our cases. In case 1, staff users had access to a vendor-supplied preview portal, which simulated system functionalities. Staff users were so familiar with the new system by the time the system went live that they were eager to engage with it. In cases 2, 3, and 4, staff users were trained through demo products, online video trainings, Q&A, and on-site training sessions conducted by the vendor. These training materials and sessions served to ease staff users’ feelings of uncertainty and anxiety, as the following quotes show:

“The online training videos were provided to all staff in the library and followed up with Q&A sessions which members of the committee will host in their respective areas. . . . Then Ex Libris did a week long onsite training workshop serve for the final deep configuration issues. . . .
We know that there are staff users who want to be ahead of the game, yet there are always people who don’t want to learn until the day before they go live.” (Case 2)

“We have a training package with several onsite visits, each one is for a few days. The trainer focused on one aspect of the system. It was more than watching the videos online. Because of the small staff here, almost everyone attended at least one training.” (Case 3)

“The trainers varied with their expertise, we developed fondness for some more than others. The training is functional in nature. The vendor’s priority was about trainer availability and to keep the project on time. We became familiar with trainers’ expertise; we were able to request the right trainer with the job.” (Case 4)

Managing Staff User Emotions

Although education and training ease user anxiety, they do not completely eliminate it. Emotions felt by users early in the implementation of a new system have important effects on the use of the system later on.52 How to manage staff user anxiety and negative emotions when they appear emerged as a critical success factor in all our cases, as shown in the following quotes:

“There were so many things going on in the library during the migration go-live week. The unknown of the migration success made staff users uncomfortable. Should the migration date be decided in consideration of other initiatives, the frustration experienced would have been a lot less and might not have been ignored during the going-live week.” (Case 1)

“The frustration was just change; it was the fact that we have to learn something new. . . . Primarily the frustration was handled by the lead.” (Case 2)

“There was a challenge, especially early on, in getting people to engage with the manuals and the literature in documentation. It is as if everyone is being asked to learn a new language. . . . The key relationship between the onsite coordinator and the project manager on the vendor side is important.
When those two exchange information and handle frustration diplomatically, this bridge between the two organizations can smooth over a lot of rough feathers on either or both sides.” (Case 4)

This final CSF did not come directly from the ninety-plus CSFs that we started with, although it aligned closely with the “Change Management” category.53 This CSF emerged mostly from the interview process.

Summary of Results

The results of the case studies for each critical success factor are summarized in table 2. Implementation project outcome is summarized in table 3. An implementation is considered successful if it was completed on time and on budget and if the implementation process was smooth, as reflected in the number and degree of unexpected problems along the way.

Critical Success Factor              Case 1   Case 2   Case 3   Case 4
Careful selection process            No       Yes      Yes      Yes
Top management involvement           Yes      Yes      Yes      Yes
Vendor support                       Yes      Yes      Yes      Yes
Project team competence              Yes      Yes      Yes      Yes
Staff user involvement               No       Yes      Yes      Yes
Interdepartmental communication      No       Yes      Yes      Yes
Data analysis & conversion           No       Yes      Yes      Yes
Project management and tracking      Yes      Yes      Yes      Yes
Staff user education and training    Yes      Yes      Yes      Yes
Managing staff user emotions         No       Yes      Yes      Yes

Table 2. Summary of case study critical success factors findings

                                     Case 1   Case 2   Case 3   Case 4
On time implementation               Yes      Yes      Yes      Yes
On budget implementation             Yes      Yes      Yes      Yes
Smoothness of implementation         No*      Yes      Yes      Yes

*Staff users experienced data-integrity issues, system downtime, as well as anxiety and stress with the system implementation process.

Table 3. Summary of case study implementation success measures

DISCUSSION AND CONCLUSIONS

The implementation of a new ILS is a large-scale undertaking that affects every aspect of a library’s operations as well as every staff user’s workflow process.
As such, it is imperative for library administrators to understand what factors contribute to a successful implementation. Our qualitative study shows that there are two categories of CSFs: strategic and tactical. From the strategic perspective, top management involvement, vendor support, staff user involvement, interdepartmental communication, and staff user emotion management are critical. From the tactical perspective, project team competence, project management and project tracking, data analysis and conversion, and staff user education and training to break down the technical barrier greatly affect the implementation outcome. In addition, selection of the final system from a variety of choices and options requires careful consideration of both strategic and tactical issues. Each factor identified is important in its own right during the implementation process. Combined, they complement each other to guide an implementation to success. Among the list of CSFs identified, the role of staff user emotion management was not identified during the theoretical phase of the study; it only emerged as an important CSF during interviews. Top management involvement, vendor support, project team competence, project management and tracking, and staff user education and training are CSFs that were somewhat intuitive, and they were implemented in all cases. However, a library may select an end system without careful consideration. It may also be unaware of the importance of involving users early on, of opening clear lines of interdepartmental communication, or of performing data analysis and conversion before the implementation. Staff user emotion management, especially, is at risk of being an afterthought of an implementation.
By identifying the most salient CSFs, this study offers practical contributions to academic library leaders and administrators in understanding how critical success factors play a role in ensuring a smooth and successful ILS implementation. Although CSFs have been extensively studied in the discipline of information-systems management, this is the first study to apply CSFs in the library context. Since library management has unique challenges compared to businesses, identifying CSFs for library-system-implementation success is important not only for the current migration to LSPs but also for future migrations to future generations of ILSs as the needs of libraries continue to evolve. As with any empirical research, there are limitations to this study. The number of academic libraries interviewed is small, although no new information was discovered after the fourth interview. The vendors represented in this study are only two of the many in the market providing LSPs to libraries. Given these limitations, the results of this study may not be generalizable to libraries implementing an LSP with vendors other than Innovative Interfaces and Ex Libris. Additionally, the results may not be generalizable to nonacademic libraries. This research can be extended to validate the proposed CSFs quantitatively through survey research in academic libraries. Studying interactions between the identified factors would offer an even greater contribution. The research could also be replicated in other types of libraries to broaden the generalizability of its inferences. In addition, case libraries 3 and 4 both noted that an LSP changes the public interface used by external users, and they wished they had had more opportunities for outreach prior to the implementation.
Although the design and implementation of the public interface was not considered within the scope of this research, this comment is insightful because it may imply that future studies should consider a project champion to be a critical success factor. The project champion must have the people-related skills and the position to introduce changes in order to achieve buy-in from staff users.54, 55

REFERENCES

1. Richard M. Jost, Selecting and Implementing an Integrated Library System: The Most Important Decision You Will Ever Make (Boston: Chandos, 2015).
2. Ibid., 3.
3. Suzanne Julich, Donna Hirst, and Brian Thompson, “A Case Study of ILS Migration: Aleph500 at the University of Iowa,” Library Hi Tech 21, no. 1 (2003): 44–55, http://dx.doi.org/10.1108/07378830310467391.
4. Zahiruddin Khurshid, “Migration from DOBIS LIBIS to Horizon at KFUPM,” Library Hi Tech 24, no. 3 (2006): 440–51, http://dx.doi.org/10.1108/07378830610692190.
5. Vandana Singh, “Experiences of Migrating to an Open-Source Integrated Library System,” Information Technology & Libraries 32, no. 1 (2013): 36–53.
6. Jost, Selecting and Implementing an Integrated Library System.
7. Yongming Wang and Trevor A. Dawes, “The Next Generation Integrated Library System: A Promise Fulfilled,” Information Technology & Libraries 31, no. 3 (2012): 76–84.
8. Keith Kelley, Carrie C. Leatherman, and Geraldine Rinna, “Is It Really Time to Replace Your ILS with a Next-Generation Option?” Computers in Libraries 33, no. 8 (2013): 11–15.
9. Vangie Beal, “ERP—Enterprise Resource Planning,” Webopedia, http://www.webopedia.com/TERM/E/ERP.html.
10. “Library Management System,” Tangient LLC, https://libtechrfp.wikispaces.com/Library+Management+System.
11. Christopher P. Holland and Ben Light, “A Critical Success Factors Model for ERP Implementation,” IEEE Software 16, no. 3 (1999): 30–36, http://dx.doi.org/10.1109/52.765784.
12.
Levi Shaul and Doron Tauber, “Critical Success Factors in Enterprise Resource Planning Systems: Review of the Last Decade,” ACM Computing Surveys 45, no. 4 (2013): 1–39, http://dx.doi.org/10.1145/2501654.2501669.
13. Yahia Zare Mehrjerdi, “Enterprise Resource Planning: Risk and Benefit Analysis,” Business Strategy Series 11, no. 5 (2010): 308–24, http://dx.doi.org/10.1108/17515631011080722.
14. Mohammad A. Rashid, Liaquat Hossain, and Jon David Patrick, “The Evolution of ERP Systems: A Historical Perspective,” in Enterprise Resource Planning: Global Opportunities and Challenges (Hershey, PA: Idea Group, 2002).
15. Marshall Breeding, “Library Systems Report 2014: Competition and Strategic Cooperation,” American Libraries 45, no. 5 (2014): 21–33.
16. Sharon Yang, “From Integrated Library Systems to Library Management Services: Time for Change?” Library Hi Tech News 30, no. 2 (2013): 1–8, http://dx.doi.org/10.1108/LHTN-02-2013-0006.
17. Shahin Dezdar, “Strategic and Tactical Factors for Successful ERP Projects: Insights from an Asian Country,” Management Research Review 35, no. 11 (2012): 1070–87, http://dx.doi.org/10.1108/14637151111182693.
18. Ibid.
19. Shahin Dezdar and Ainin Sulaiman, “Successful Enterprise Resource Planning Implementation: Taxonomy of Critical Factors,” Industrial Management & Data Systems 109, no. 8 (2009): 1037–52, http://dx.doi.org/10.1108/02635570910991283.
20.
Sherry Finney and Martin Corbett, “ERP Implementation: A Compilation and Analysis of Critical Success Factors,” Business Process Management Journal 13, no. 3 (2007): 329–47, http://dx.doi.org/10.1108/14637150710752272.
21. F. Pearce, Business Building and Promotion: Strategic and Tactical Planning (Houston: Pearman Cooperation Alliance, 2004).
22. Jennifer Bresnahan, “Mixed Messages,” CIO (May 16, 1996), 72, http://dx.doi.org/10.1016/j.jchf.2013.07.005.
23. Majed Al-Mashari, Abdullah Al-Mudimigh, and Mohamed Zairi, “Enterprise Resource Planning: A Taxonomy of Critical Factors,” European Journal of Operational Research 146, no. 2 (2003): 352–64, http://dx.doi.org/10.1016/S0377-2217(02)00554-4.
24. Shaul and Tauber, “Critical Success Factors in Enterprise Resource Planning Systems.”
25. H. Akkermans and K. van Helden, “Vicious and Virtuous Cycles in ERP Implementation: A Case Study of Interrelations between Critical Success Factors,” European Journal of Information Systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418.
26. Abdallah Mohamed, Guenther Ruhe, and Armin Eberlein, “COTS Selection: Past, Present, and Future” (paper presented at the 14th Annual IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, 2007), http://dx.doi.org/10.1109/ECBS.2007.28.
27. M. Michael Umble, Elisabeth J. Umble, and Ronald R. Haft, “Enterprise Resource Planning: Implementation Procedures and Critical Success Factors,” European Journal of Operational Research 146, no. 2 (2003): 241–57, http://dx.doi.org/10.1016/S0377-2217(02)00547-7.
28. Jim Johnson, “Chaos: The Dollar Drain of IT Project Failures,” Application Development Trends 2, no. 1 (1995): 41–47.
29. Prasad Bingi, Maneesh K. Sharma, and Jayanth K. Godla, “Critical Issues Affecting an ERP Implementation,” Information Systems Management 16, no. 3 (1999): 7–14, http://dx.doi.org/10.1201/1078/43197.16.3.19990601/313.
30. Mary Sumner, “Critical Success Factors in Enterprise Wide Information Management Systems Projects,” in Proceedings of the 1999 ACM SIGCPR Conference on Computer Personnel Research (New York: ACM, 1999), http://dx.doi.org/10.1145/299513.299722.
31. Eric T. G. Wang et al., “The Consistency among Facilitating Factors and ERP Implementation Success: A Holistic View of Fit,” Journal of Systems & Software 81, no. 9 (2008): 1609–21, http://dx.doi.org/10.1016/j.jss.2007.11.722.
32. Dong-Gil Ko, Laurie J. Kirsch, and William R. King, “Antecedents of Knowledge Transfer from Consultants to Clients in Enterprise System Implementations,” MIS Quarterly 29, no. 1 (2005): 59–85.
33. Al-Mashari, “Enterprise Resource Planning.”
34. Fiona Fui-Hoon Nah and Santiago Delgado, “Critical Success Factors for Enterprise Resource Planning Implementation and Upgrade,” Journal of Computer Information Systems 46, no. 5 (2006): 99–113.
35. Liang Zhang et al., “A Framework of ERP Systems Implementation Success in China: An Empirical Study,” International Journal of Production Economics 98, no. 1 (2005): 56–80, http://dx.doi.org/10.1016/j.ijpe.2004.09.004.
36. Ann-Marie K.
Baronas and Meryl Reis Louis, “Restoring a Sense of Control During Implementation: How User Involvement Leads to System Acceptance,” MIS Quarterly 12, no. 1 (1988): 111–24.
37. Joseph Esteves, Joan Pastor, and Joseph Casanovas, “A Goals/Questions/Metrics Plan for Monitoring User Involvement and Participation in ERP Implementation Projects,” IE working paper, March 11, 2004, http://dx.doi.org/10.2139/ssrn.1019991.
38. Khaled Al-Fawaz, Zahran Al-Salti, and Tillal Eldabi, “Critical Success Factors in ERP Implementation: A Review” (paper presented at the European and Mediterranean Conference on Information Systems, Dubai, May 25–26, 2008).
39. H. Akkermans and K. van Helden, “Vicious and Virtuous Cycles in ERP Implementation: A Case Study of Interrelations between Critical Success Factors,” European Journal of Information Systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418.
40. Nancy Bancroft, Henning Seip, and Andrea Sprengel, Implementing SAP R/3: How to Introduce a Large System Into a Large Organisation (Greenwich, UK: Manning, 1998).
41. Nah, “Critical Success Factors.”
42. Toni M. Somers and Klara Nelson, “The Impact of Critical Success Factors Across the Stages of Enterprise Resource Planning Implementations,” in Proceedings of the 34th Hawaii International Conference on System Sciences (2001), http://dx.doi.org/10.1109/HICSS.2001.927129.
43. Shi-Ming Huang et al., “Assessing Risk in ERP Projects: Identify and Prioritize the Factors,” Industrial Management & Data Systems 104, no. 8 (2004): 681–88, http://dx.doi.org/10.1108/02635570410561672.
44.
Nah, “ERP Implementation.”
45. Umble, “Enterprise Resource Planning.”
46. Nah, “ERP Implementation.”
47. “Basecamp, in a Nutshell,” Basecamp, https://basecamp.com/about/press.
48. Nah, “ERP Implementation.”
49. Umble, “Enterprise Resource Planning.”
50. Mo Adam Mahmood et al., “Variables Affecting Information Technology End-User Satisfaction: A Meta-analysis of the Empirical Literature,” International Journal of Human-Computer Studies 52, no. 4 (2000): 751–71, http://dx.doi.org/10.1006/ijhc.1999.0353.
51. Iuliana Dorobat and Floarea Nastase, “Training Issues in ERP Implementations,” Accounting & Management Information Systems 11, no. 4 (2012): 621–36.
52. Anne Beaudry and Alain Pinsonneault, “The Other Side of Acceptance: Studying the Direct and Indirect Effects of Emotions on Information Technology Use,” MIS Quarterly 34, no. 4 (2010): 689–710.
53. Shaul and Tauber, “Critical Success Factors in Enterprise Resource Planning Systems.”
54. Andrew Lawrence Norton et al., “Ensuring Benefits Realisation from ERP II: The CSF Phasing Model,” Journal of Enterprise Information Management 26, no. 3 (2013): 218–34, http://dx.doi.org/10.1108/17410391311325207.
55. Chong Hwa Chee, “Human Factor for Successful ERP2 Implementation,” New Straits Times, July 28, 2003, https://www.highbeam.com/doc/1P1-76161040.html.
Editorial Board Thoughts: The Importance of Staff Change Management in the Face of the Growing “Cloud”

Mark Dehmlow

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2016

The library vendor market likes to throw around the word “cloud” to make their offerings seem innovative and significant. In many ways, much of what the library IT market refers to as “cloud,” especially SAAS (software as a service) offerings, is really just a fancier term for hosted services. The real gravitas behind the “cloud” label emanated from grid computing, or large, interconnected, and quickly deployable infrastructure like Amazon’s AWS or Microsoft’s Azure platforms. Infrastructure at that scale and that level of geographic distribution was revolutionary when it emerged. Still, these offerings at their core are basically IAAS (infrastructure as a service) bundled as a menu of services, so I think the most broadly applicable synonym for the “cloud” could be “IT as a service” in various forms.

Outsourcing in this way isn’t entirely new to libraries. The function and structure of OCLC has arguably been one of the earlier instantiations of “IT as a service” for libraries vis-à-vis its MARC record aggregation and distribution, which OCLC has been doing for decades. The more recent trend toward hosted IT services has been relatively easy for non-IT related units in our library: to most library staff, a service is no different based on where it is hosted. And with many services implementing APIs for libraries, that distinction is becoming less significant for our application developers too. For many of our technology staff, who have built careers around systems administration, application development, systems integration, and application management, hosted services represent a threat to not only their livelihoods but in some ways also their philosophical perspectives that are grounded in open source and do-it-yourself oriented beliefs.
In many ways the “cloud” for the IT segment of our profession is perhaps more synonymous with change, and change requires effective management, especially for the human element of our organizations.

Recently, our Office of Information Technologies started an initiative to move 80% of their technology infrastructure into the cloud. They have proposed an inverted pyramid structure for determining where IT solutions should reside — focusing first on hosted software as a service solutions for the largest segment of applications, followed by hosting those applications we would have typically installed locally onto a platform or infrastructure as a service provider, and then limiting only those applications that have specialized technical or legal needs to reside on premise. This is a big shift for our IT staff, especially, but not limited to, our systems administrators. The IAAS platform our university is migrating to is Amazon Web Services, and its infrastructure is largely accessible via a web dashboard, so the myriad tasks our systems administrators took days and weeks to do can now, in some adjusted way, be accomplished with a few clicks. This example is on one extreme end of the spectrum as far as IT change goes, but simultaneously, we have looked at the vendor market to lease pre-packaged tools that support standard functions in academic libraries and can be locally branded and configured with our data — things like course guides, A-Z journal lists, scheduling events, etc.

Mark Dehmlow (mdehmlow@nd.edu), a member of LITA and the ITAL editorial board, is the Director, Information Technology Program, Hesburgh Libraries, University of Notre Dame, South Bend, Indiana. | doi: 10.6017/ital.v35i1.8965
The overarching goals of these efforts are cost savings and increased velocity and resiliency of infrastructure, but also, and perhaps more important, flexibility in how we invest our staff time. If we are able to move high-level tasks from staff to a platform, then we will be able to reallocate our staff’s time and considerable talent to take on the constant stream of new, high-level technology needs. Partnering with the university, we are aiming towards their defined goal of moving 80% of our technical infrastructure into the “cloud.” We have adopted their overall strategy of approach to systems infrastructure, at least in principle, and are integrating into our own strategy significant consideration for the impact of these changes on our staff. Our organization has recognized that people form not only habits around process, but also personal and emotional attachments to why we do things the way we do them, both from a philosophical as well as a pragmatic perspective. Our approach to staff change is layered as well as long term. We know that getting from shock to acceptance is not an overnight process and that staff who adopt our overarching goals and strategy as their own will be more successful in the long term. To make this transition, we have developed several strategic approaches:

1. Explaining the Case: My experience is that staff can live through most changes as long as they understand why. Helping them gain that understanding can take some time, but ultimately having that comprehension will help them fully understand our strategic goals as well as help them make decisions that are in alignment with the overall approach. I often find it is important to remember that, as managers, we have been a part of all of the change conversations and we have had time to assimilate ideas, discuss points of view, and process the implications of change.
Each of our staff needs to go through the same process, and it is up to leadership to guide them through that process and ensure they get to participate in similar conversations. It is tempting to want to hit an initiative running, but there is significant value in seeding those discussions over a gradual time period to more holistically integrate staff into the broader vision. It is important to explain the case for change multiple times, to actively listen to staff thoughts and concerns, and to remember to lay out the context for change, why it is important, and how we intend to accomplish things. Then reassure, reassure, and reassure. The threats to staff may seem innocuous or unfounded to managers, but staff need to feel secure during a process to ultimately buy in.

2. Consistency and Persistence: Staff acceptance doesn’t always come easy — nor should it necessarily. Listening to staff and integrating their perspectives into the planning and implementation process can help demonstrate that they matter, but equally important is that they feel our approach is built on something solid. Stability is reinforced through consistency in messaging: not only individual consistency, but also team consistency and upper-management consistency — everyone should be able to support and explain the messaging around a particular change. Any time staff approach me and say, “it was much easier to do it this other way,” I talk about the efficiency we will garner through this change and how we will be able to train and repurpose staff in the future. The more they hear the message, the more ingrained it becomes, and the more normative it begins to feel.

3. Training and Investment: IT futures require investment, not just in infrastructure, but also in skill development. We continue to invest significantly in providing some level of training on new technologies that we implement.
That training will not only prove to staff that you are invested in their development as well as their job security, but it will also give them the tools they need to be successful in implementing new technologies. Change is anxiety inducing because it exposes so many unknowns. Providing training helps build confidence and competence for staff, reducing anxieties and providing some added engagement in the process. It also gives them exposure to the real-world implementation of technologies, where they can begin to see for themselves the benefits that you have been communicating.

4. Envisioning the Future — Improvements and Roles: One of the initial benefits we will be getting from recouping staff time is around shoring up our processes. We have generally had a more ad hoc approach to managing the day-to-day. It has been difficult to institute a strong technical change management process, in part, because of time. We will be able to remove that consideration from our excuses as we take advantage of the “cloud.” The net effect will be that we will do our work more thoughtfully and less ad hoc, using better-defined processes that will meet group-developed expectations. In addition to doing things better, we expect to do things differently. With fewer tasks at the operational level, we believe we will be able to transition staff into newly defined roles.
Some of these roles include DevOps Engineers, a hybrid of application engineering (the “dev”) and systems administration (the “ops”), who will help design automation and continuous-integration processes that allow developers to focus on their programming and less on the environment they are deploying their applications in; Financial Engineers, who will take system requirements and calculate costs in somewhat complex technical cloud environments; Systems Architects, who will be focused on understanding the smorgasbord of options that can be tied together to provide a service that meets expected response performance, disaster recovery, uptime, and other requirements; and Business Analysts, who will focus on taking technical requirements and looking at all of the potential approaches to solve that need, whether it be a hosted service, a locally developed solution, an implementation of an open source system, or some integration of all or some of the above. This list is by no means exhaustive, but I think it forms a good foundation on which to help staff develop their skill set along with our changing environment.

I believe it is important to remind those of us who are managing IT departments in libraries that in many ways the easiest parts of change are the logistics. The technology we work with is bounded by sets of guidelines that define how it is used and ensure that if it is implemented properly, it will work effectively. People, on the other hand, are not bounded as neatly by stringent rules. They are guided by diverse backgrounds, personalities, experiences, and feelings. They can be unpredictable, difficult to fully figure out, and behaviorally inconsistent. And yet, they are the great constant in our organizations and therefore require significant attention.
Our field needs “servant leaders” dedicated to supporting and developing staff, not merely to implementing technologies competently. Managers who invest in staff, in their well-being, development, and sense of engagement in their jobs, will find their organizations able to tackle almost anything. Those who ignore their staff’s needs in favor of pragmatic goals will likely find their organizations struggling to move quickly, spending too much energy overcoming resistance rather than energizing change.
Let’s Get Virtual: Examination of Best Practices to Provide Public Access to Digital Versions of Three-Dimensional Objects

Tanya M. Johnson

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016

ABSTRACT

Three-dimensional objects are important sources of information that should not be ignored in the increasing trend towards digitization. Previous research has not addressed the evaluation of digitized versions of three-dimensional objects. This paper first reviews research concerning such digitization, in both two and three dimensions, as well as public access in this context. Next, evaluation criteria for websites incorporating digital versions of three-dimensional objects are extrapolated from previous research. Finally, five websites are evaluated, and suggestions for best practices to provide public access to digital versions of three-dimensional objects are proposed.

INTRODUCTION

Much of the literature surrounding the increased efforts of libraries and museums to digitize content has focused on two-dimensional forms, such as books, photographs, or paintings. However, information does not come only in two dimensions; there are sculptures, artifacts, and other three-dimensional objects that have unfortunately been neglected by this digital revolution. As one author stated, “While researchers do not refer to three-dimensional objects as commonly as books, manuscripts, and journal articles, they are still important sources of information and should not be taken for granted” (Jarrell 1998, 32). The importance of three-dimensional objects as information that can and should be shared is not a new phenomenon; indeed, as early as 1887, museologists and educators forwarded the view that “museums were in effect libraries of objects” that provided information not supplied by books alone (Given and McTavish 2010, 11). However, it is only recently, with the advent of newer technological mechanisms, that such objects could be shared with the public on a larger scale.
No longer do people need to physically visit museums to experience and learn from three-dimensional objects. Rather, various techniques have been utilized to place digital versions of such objects on the websites of museums and archives, and projects have been created by various universities to enhance that digital experience. Nevertheless, as Newell (2012) states:

Collections-holding institutions increasingly regard digital resources as additional objects of significance, not as complete replacements for the original. Digital technologies work best when they enable people who feel connected to museum objects to have the freedom to deepen these relationships and, where appropriate, to extend outsiders’ understandings of the objects’ cultural contexts. The raison d’être of museums and other cultural institutions remains centred on the primacy of the object and in this sense continues to privilege material authenticity. (303)

In this regard, three-dimensional visualization of physical objects can be seen as the next step for museums and cultural heritage institutions that seek to further patrons’ connection to such objects via the internet. Indeed, in this digital age, the goals of museums and archives are changing, converging with those of libraries to focus more effort on providing information to the public, and, along with the growing trend to digitize information contained within libraries, there has been a concomitant trend to digitize the contents of museums in order to provide greater public access to collections (Given and McTavish 2010).

Tanya M. Johnson (tmjohnso@gmail.com), a recent MLIS graduate of the School of Communication & Information, Rutgers, The State University of New Jersey, is winner of the 2016 LITA/Ex Libris Student Writing Award.

LET’S GET VIRTUAL: EXAMINATION OF BEST PRACTICES TO PROVIDE PUBLIC ACCESS TO DIGITAL VERSIONS OF THREE-DIMENSIONAL OBJECTS | JOHNSON | doi:10.6017/ital.v35i2.9343
In light of this progress, this paper will review various methods of presenting three-dimensional objects to the public on the internet and, based on an evaluation of five digital collections, attempt to provide some advice as to best practices for museums or institutions seeking to digitize such objects and present them to the public via a digital collection.

LITERATURE REVIEW

Two-Dimensional Digitization

There are many ways to present digital versions of three-dimensional objects on a webpage, ranging from simple two-dimensional photography to complicated three-dimensional scanning and rendering. Beginning on the simpler end of the scale, Bincsik, Maezaki, and Hattori (2012) describe the process of photographing Japanese decorative art objects in order to create an image database of objects from multiple museums. Specifically, the researchers explain that they needed high-quality photographs showing each object from all directions, as well as close-up images of fine details, in order to recreate the physical research experience as closely as possible. They also note that, for the same reason, the context of each object had to be recorded, including photographs of any wrapping or storage materials and accompanying documentation. For this project, the researchers used Nikon professional or semi-professional cameras with zoom and macro lenses, and often used small apertures to increase depth of field. At times, they also took measurements of the objects to assist museums in maintaining accurate records. The raw image files were then processed with programs such as Adobe Photoshop, saved as original TIFF files, and converted into JPEG format for upload.
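A workflow like the one just described pairs master TIFFs with access JPEGs and records context and measurements alongside the images. As a minimal sketch, the following shows the kind of catalog record such a project might emit; every field name here is hypothetical, chosen for illustration rather than taken from the Bincsik, Maezaki, and Hattori database.

```python
import json

# Illustrative only: bundle catalog data, physical measurements, image
# references, and context notes into one serializable record. Field names
# are hypothetical, not drawn from any real museum schema.
def make_object_record(object_id, title, measurements_cm, image_files, context_notes):
    """Build a dict describing one digitized three-dimensional object."""
    return {
        "object_id": object_id,
        "title": title,
        "measurements_cm": measurements_cm,  # e.g. {"height": 4.2, ...}
        "images": [
            # master TIFF for preservation, derived JPEG for web access
            {"file": f, "master_format": "TIFF", "access_format": "JPEG"}
            for f in image_files
        ],
        "context_notes": context_notes,      # wrappings, documentation, etc.
    }

record = make_object_record(
    "JP-0001",
    "Lacquered incense box",
    {"height": 4.2, "width": 7.8, "depth": 7.8},
    ["JP-0001_front", "JP-0001_back", "JP-0001_detail"],
    ["original silk wrapping photographed", "box inscription transcribed"],
)
serialized = json.dumps(record, indent=2)  # ready for upload alongside images
```

Keeping measurements and context in the same record as the image references is one way to preserve the “physical research experience” the researchers describe, since weight, wrappings, and documentation otherwise vanish in a bare image database.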
Despite the success of the project, the researchers also noted the limitations of digitizing three-dimensional objects:

With decorative art objects some information is inevitably lost, such as the weight of the object, the feeling of its surface texture or the sense of its functionality in terms of proportions and balance. Digital images clearly can fulfill many research objectives, but in some cases they can only be used as references. One objective of the decorative arts database is to advise the researcher in selecting which objects should be examined in person. (Bincsik, Maezaki, and Hattori 2012, 46)

One difficulty with photography, particularly when digitizing artwork, is that color is a function of light. Thus, a single object will often appear to be different colors when photographed in different lighting conditions using conventional digital cameras, which process images using RGB filters. More accurate representations of objects can be acquired using multispectral imaging, which uses a higher number of parameters (the international standard is 31, compared to RGB’s 3) to obtain more information about the reflectance of an object at any particular point in space (Novati, Pellegri, and Schettini 2005). Multispectral imaging, however, is very expensive, and despite some researchers’ attempts to create affordable systems (e.g., Novati, Pellegri, and Schettini 2005), the acquisition of multispectral images is generally limited to large institutions with considerable funding (Chane et al. 2013). The use of two-dimensional photography to digitize objects is not limited to the arts; in the natural sciences, different types of photographic equipment have been developed to document existing collections and enhance scientific observation.
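The multispectral idea above, capturing 31 reflectance samples per point instead of three filtered values, can be sketched in a few lines. This is a toy illustration under loud assumptions: the Gaussian response curves below are invented stand-ins, not the CIE colour-matching functions or calibrated sensor responses a real imaging system would use.

```python
import math

# 31 reflectance bands from 400 to 700 nm in 10 nm steps, per the
# international standard mentioned in the text.
WAVELENGTHS = [400 + 10 * i for i in range(31)]

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical channel sensitivities (NOT real colorimetry): three broad
# curves peaking in the long, middle, and short wavelengths.
SENSITIVITY = {
    "r": [gaussian(w, 600, 40) for w in WAVELENGTHS],
    "g": [gaussian(w, 550, 40) for w in WAVELENGTHS],
    "b": [gaussian(w, 450, 40) for w in WAVELENGTHS],
}

def spectrum_to_rgb(reflectance):
    """Collapse 31 reflectance samples into one value per display channel."""
    assert len(reflectance) == 31
    rgb = {}
    for channel, curve in SENSITIVITY.items():
        weighted = sum(r * s for r, s in zip(reflectance, curve))
        rgb[channel] = weighted / sum(curve)  # normalize into 0..1
    return rgb

# A spectrally flat (neutral gray) surface comes out equal in all channels,
# regardless of which sensitivity curves are assumed.
neutral = spectrum_to_rgb([0.5] * 31)
```

The point of the exercise is the direction of information loss: the 31-band spectrum determines the RGB rendering, but many different spectra collapse to the same three numbers, which is why objects photographed under different lights appear to change color while multispectral data does not.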
Gigapixel imaging, for example, has been utilized to allow museum visitors to virtually explore large petroglyphs located in remote locations, as well as for documentation and viewing of dinosaur bone specimens that are not on public display (Louw and Crowley 2013). This technology consists of taking many very-high-resolution photographs that are then, via computer software, “aligned, blended, and stitched” together to create one extremely detailed composite image (Louw and Crowley 2013, 89–90). Robotic systems, such as GigaPan, have been developed to speed up the process and permit rapid recording and processing of the necessary area. Once the gigapixel image is created, it can be uploaded and displayed on the web in dynamic form, including spatial navigation of the image with embedded text, audio, or video at specific locations and zoom levels to provide further information (Louw and Crowley 2013). Various types of gigapixel imaging, including the GigaPan system, have also been used to digitize important collections of biological specimens, particularly insects, which are often stored in large drawers. One study examined the documentation of entomological specimens by “whole-drawer imaging” using various gigapixel imaging technologies (Holovachov, Zatushevsky, and Shydlovsky 2014). The researchers explained that different gigapixel imaging systems (many of which are commercial and proprietary) utilize different types of cameras and lenses, as well as different software for processing. However, despite the high cost of some commercially available systems, it is possible for museums and other institutions to create their own, economically viable versions. The system created by Holovachov, Zatushevsky, and Shydlovsky used a standard SLR camera, fitted with a macro lens and attached to an immovable stand.
The researchers manually set up lighting, focus, aperture, and other settings, and moved the insect drawer along a predetermined grid pattern to obtain the multiple overlapping photographs necessary to create a large gigapixel image. They used a freely available stitching program and manually corrected the stitching artifacts and color-balance issues that resulted from the use of a non-telecentric lens.1 Despite the lower cost of their individualized system, however, the researchers noted that the process was much more time-consuming and required more labor from the workers digitizing the collection. Moreover, technologically speaking, the researchers emphasized the limits of two-dimensional imaging, given that the “diagnostic characteristics of three-dimensional insects,” as well as the accompanying labels, are often invisible when a drawer is photographed only from the top.

1 The difference between telecentric and non-telecentric lenses is explained by the researchers: “Contrary to ordinary photographic lenses, object-space telecentric lenses provide the same object magnification at all possible focusing distances. An object that is too close or too far from the focus plane and not in focus, will be the same size as if it were in focus. There is no perspective error and the image projection is parallel. Therefore, when such a lens is used to take images of pinned insects in a box, all vertical pins will appear strictly vertical, independent of their position within the camera’s field of view” (Holovachov, Zatushevsky, and Shydlovsky 2014, 7).
Thus, the researchers concluded that, ultimately, “the whole-drawer digitizing of insect collections needs to be transformed from two-dimensions to three-dimensions by employing complex imaging techniques (simultaneous use of multiple cameras positioned at different angles) and a digital workflow” (Holovachov, Zatushevsky, and Shydlovsky 2014, 7).

Three-Dimensional Digitization

Given the goal of obtaining as accurate a representation as possible when digitizing objects, many researchers have turned to various techniques for acquiring three-dimensional data. Acquiring a three-dimensional image of an object takes place in three steps:

1. Preparation, during which certain preliminary activities take place that involve the decision about the technique and methodology to be adopted as well as the place of digitization, security planning issues, etc.
2. Digital recording, which is the main digitization process according to the plan from phase 1.
3. Data processing, which involves the modeling of the digitized object through the unification of partial scans, geometric data processing, texture data processing, texture mapping, etc. (Pavlidis et al. 2007, 94)

Steps 2 and 3 have been described more technically as (2) obtaining data from an object to create point clouds (from thousands to billions of X,Y,Z coordinates representing loci on the object), and (3) processing point clouds into polygon models (creating a surface on top of the points), which can then be mapped with textures and colors (Metallo and Rossi 2011). There are several techniques that can be utilized to acquire three-dimensional data from a physical object. Table 1 summarizes the four general methods most commonly used by museums.

Table 1. Description of four general methods of acquiring three-dimensional data about physical objects (table information compiled by reference to Pavlidis et al. 2007; Metallo and Rossi 2011; Abel et al. 2011; and Berquist et al. 2012).

Laser Scanning
Description: A laser source emits light onto the object’s surface, which is detected by a digital camera; the geometry of the object is extracted by triangulation or time-of-flight calculations.
Positives: High accuracy in capturing geometry; can capture small objects and entire buildings (using different hardware).
Negatives: Limited texture and color captured; shiny surfaces refract the laser.
Approx. price range: $3,000–$200,000

White Light (Structured Light) Scanning
Description: A pattern of light is projected onto the object’s surface, and deformations in that pattern are detected by a digital camera; geometry is extracted by triangulation from the deformations.
Positives: Captures texture details, making it very accurate; can capture color.
Negatives: Dark, shiny, or translucent objects are problematic.
Approx. price range: $15,000–$250,000

Photogrammetry
Description: Three-dimensional data is extracted from multiple two-dimensional pictures.
Positives: Can capture anything from small objects to mountain ranges; good color information.
Negatives: Needs either precise placement of cameras or more precise software to obtain accurate data.
Approx. price range: Cameras: $500–$50,000; Software: free–$40,000

Volumetric Scanning
Description: Magnetic resonance imaging (MRI) uses a strong magnetic field and radio waves to detect geometric, density, volume, and location information; computed tomography (CT) uses rotating x-rays to create two-dimensional slices, which can then be reconstructed into three-dimensional images.
Positives: Both types can view the interior and exterior of an object; CT can be used for reflective or translucent objects; MRI can image soft tissues.
Negatives: No color information; MRI requires the object to have high water content.
Approx. price range: $200,000–$2,000,000

The type of three-dimensional digitization used can ultimately depend upon the types of objects to be imaged or the type of data needed.
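Phase 3 of the pipeline described above turns raw point clouds, potentially billions of X,Y,Z coordinates, into manageable models. As a concrete if deliberately toy illustration of the kind of geometric processing involved, the sketch below thins a point cloud with a voxel-grid filter, replacing every occupied cubic cell with the centroid of its points. Real pipelines perform this with dedicated software at vastly larger scale; this only shows the idea.

```python
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Thin a point cloud: one centroid per occupied voxel of the given size.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the edge
    length of the cubic cells used to bucket nearby points together.
    """
    cells = defaultdict(list)
    for x, y, z in points:
        # Integer cell coordinates identify which voxel each point falls in.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        cells[key].append((x, y, z))
    centroids = []
    for bucket in cells.values():
        n = len(bucket)
        centroids.append(tuple(sum(p[i] for p in bucket) / n for i in range(3)))
    return centroids

# Two points near the origin collapse into one centroid; the distant point
# survives on its own, so a 3-point cloud thins to 2 representative points.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (5.0, 5.0, 5.0)]
thinned = voxel_downsample(cloud, voxel_size=1.0)
```

Downsampling of this kind is one reason the Eton Myers datasets discussed later could be “reduced in size” before public release: a coarser grid trades geometric fidelity (legible hieroglyphics, tooling marks) for a file a browser or desktop viewer can handle.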
For example, in digitizing human skeletal collections, one study explained that three-dimensional laser scanning was an advantageous technique to create models of bones for preservation and analysis, but cautioned that CT scans would be needed to examine the internal structures of such specimens (Kuzminsky and Gardiner 2012). Another study utilized several techniques in an attempt to decipher graffiti inscriptions on ancient Roman pottery shards, ultimately concluding that high-resolution photography (similar to gigapixel imaging) and three-dimensional laser scanning both provided detailed and helpful data (Montani et al. 2012). Additionally, sometimes multiple types of digitization can be used for the same objects with similar results. One study, for example, obtained virtually equivalent three-dimensional models of the same object using laser scanning and two types of photogrammetry (Lerma and Muir 2014). Most recently, researchers have been utilizing combinations of digitization techniques to obtain the most accurate representations possible. Chane et al. (2013), for example, examined methods of combining three-dimensional digitization with multispectral photography in order to obtain enhanced information about the physical object in question. The researchers explained that combining the two processes is difficult because, in order to obtain multispectral textural data mapped to geometric positions, the object must be imaged from identical locations by multiple scanners and cameras, or else the data processing that combines the two types of data becomes extremely complex.
As a compromise, the researchers created a system of optical tracking based on photogrammetry techniques that permits the collection and integration of geometric positioning data and multispectral textures using precise targeting procedures. However, the researchers noted that most systems integrating multispectral photography with three-dimensional digitization tended to be quite bulky, did not adapt easily to different types of objects, and needed better processing algorithms for more complex three-dimensional objects (Chane et al. 2013).

Public Access to Three-Dimensionally Digitized Objects

Despite museums’ growing focus on increasing public access to collections via digitization (Given and McTavish 2010), there is very little literature addressing public access to three-dimensionally digitized objects. Indeed, studies in this realm tend to focus on the technological aspects of either the modeling of specific objects or collections or the website viewing of three-dimensional models. For example, Abate et al. (2011) described the three-dimensional digitization of a particular statue, from the scanning process to its ultimate depiction on a website. The researchers explained in detail the particular software architecture utilized to permit the remote rendering of the three-dimensional model on users’ computers via a Java applet without compromising quality or necessitating download of potentially copyrighted works. By contrast, literature concerning the Digital Michelangelo project, during which researchers three-dimensionally digitized various Michelangelo works, focused on the method used to create an accurate three-dimensional model, complete with color and texture mapping, and a visualization tool (Dellepiane et al. 2008). One study did describe a project designed to place three-dimensional data about various cultural artifacts in an online repository for curators and other professionals (Hess et al. 2011).
This repository was contained within database-management software, a web-based interface was designed for searching, and user access to three-dimensional images and models was provided via an ActiveX plugin. Despite the potential of the prototype, however, it appears that the project has ceased,2 and the institution’s current three-dimensional imaging project is focused on the design of a traveling exhibition incorporating, among other things, three-dimensional models of artifacts and physical replicas created from such models.3

2 See http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/e-curator.

Studies that do address public access directly tend to focus on the improvement of museum websites generally. For example, in terms of user expectations of museum websites, one study found that approximately 63 percent of visitors to a museum’s website came to search the digital collection (Kravchyna and Hastings 2002). Another study found four types of museum website users, each with different needs and expectations of such sites. Relevantly, educators sought collections that were “the more realistic the better,” including suggestions like incorporating three-dimensional simulations of physical objects so that students could “explore the form, construction, texture and use of objects” (Cameron 2003, 335). Further, non-specialist users “value free choice learning” and “access online collections to explore and discover new things and build on their knowledge base as a form of entertainment” (Cameron 2003, 335). Similarly, some studies have addressed the incorporation of Web 2.0 technologies into museum websites. Srinivasan et al.
(2009), for example, argue that Web 2.0 technologies must be integrated into museum catalogs rather than simply layered over existing records, because users’ interest in objects is increased by participation in the descriptive practice. An implementation of this concept is found in Hunter and Gerber’s (2010) system of social tagging attached to three-dimensional models. This paper is an effort to address the gap between the technical process of digitizing and presenting three-dimensional objects on the web and the user experience of those objects. Through the evaluation of five websites, this paper will provide some guidance for the digitization of three-dimensional objects and their presentation in digital collections for public access.

METHODOLOGY AND EVALUATIVE CRITERIA

Evaluations of digital museums are not as prevalent as evaluations of digital libraries. However, given the similar purposes of digital museums and digital libraries, it is appropriate to utilize similar criteria. For digital libraries, Saracevic (2000) synthesized evaluation criteria into performance questions in two broad areas: (a) user-centered questions, including how well the digital library supports the society or community served, how well it supports institutional or organizational goals, how well it supports individual users’ information needs, and how well the digital library’s interface provides access and interaction; and (b) system-centered questions, including hardware and network performance, processing and algorithm performance, and how well the content of the collection is selected, represented, organized, and managed. Xie (2008) focused on user-centered evaluation and found five general criteria that exemplified users’ own evaluations of digital libraries: interface usability, collection quality, service quality, system performance, and user satisfaction.
Parandjuk (2010) used information architecture to construct criteria for the evaluation of a digital library, including the following:

• uniformity of standards, including consistency among webpages and individual records;
• findability, including ease of use and multiple ways to access the same information;
• sub-navigation, including indexes, sitemaps, and guides;
• contextual navigation, including simplified searching and co-location of different types of resources;
• language, including consistency in labeling across pages and records and appropriateness for the audience; and
• integration of searching and browsing.

3 See http://www.3dencounters.com.

This system is particularly appropriate in the context of digital museums, as it emphasizes the curatorial or organizational aspect of the collection in order to support learning objectives. In one comprehensive evaluation of the websites of art museums, Pallas and Economides (2008) created a framework for such evaluation incorporating six dimensions: content, presentation, usability, interactivity and feedback, e-services, and technical. Each dimension then contained several specific criteria. Many of the criteria overlapped, however; three-dimensional imaging, for example, was placed within the e-services dimension, under virtual tours, although it could have been placed within presentation, with other multimedia criteria, or even within interactivity, with interactive multimedia applications. The problem in trying to evaluate a particular part of a museum’s website, namely the way it presents three-dimensional objects in digital form, is that the level of specificity almost renders many of the evaluation criteria from previous studies irrelevant.
As Hariri and Norouzi (2011) suggest, evaluation criteria should be based on the objective of the evaluation. Hence, based on portions of the above-referenced studies, this author has created a more focused evaluation framework, concentrating on criteria that are particularly relevant to museums’ digital presentations of three-dimensional objects. This framework is detailed in table 2, below.

Table 2. Summary of evaluative criteria

Functionality: What technology is used to display the object? How well does it work? Must programs or files be downloaded? Are the loading times of displays acceptable?
Usability: How easy is the site to use? What is the navigation system? Are there searching and browsing functions, and how well does each work? How findable are individual objects?
Presentation: How does the display of the object look? What is the context in which the object is presented? Are there multiple viewing options? Is there any interactivity permitted?
Content: Does the site provide an adequate collection of objects? For individual objects, is there sufficient information provided? Is there additional educational content?

Five digital collections, specified below, will be evaluated based on these criteria. This will be done in a case-study manner, describing each website based on the above criteria and then using those evaluations to make suggestions for best practices.

RESULTS

It is difficult to compare different types of digital collections, particularly when the focus is on the different types of technology utilized to display similar objects. However, because the goal here is to determine best practices for the digital presentation of three-dimensional objects, it is important to evaluate a variety of techniques in a variety of fields. Thus, the following digital collections have been chosen to illustrate different ways in which such objects can be displayed on a website.
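Readers who want to apply the four-dimension framework to other collections could represent it programmatically. The sketch below is one such representation; the numeric low/medium/high scale and the simple averaging are illustrative assumptions added here, not part of the framework itself, which reports qualitative ratings only.

```python
# Illustrative encoding of the evaluation framework: four dimensions, each
# rated low/medium/high. The 1-3 numeric scale is an assumption for the
# sake of aggregation, not something the framework prescribes.
DIMENSIONS = ("functionality", "usability", "presentation", "content")
SCALE = {"low": 1, "medium": 2, "high": 3}

def overall_score(ratings):
    """Average the four dimension ratings onto the assumed 1-3 scale."""
    missing = set(DIMENSIONS) - set(ratings)
    if missing:
        raise ValueError(f"unrated dimensions: {sorted(missing)}")
    return sum(SCALE[ratings[d]] for d in DIMENSIONS) / len(DIMENSIONS)

# Example: a site strong on usability and content but weaker on display,
# the pattern several of the evaluations below exhibit.
example = {
    "functionality": "medium",
    "usability": "high",
    "presentation": "low",
    "content": "high",
}
score = overall_score(example)  # (2 + 3 + 1 + 3) / 4
```

A single averaged number hides which dimension failed, so for comparing sites the per-dimension ratings remain the primary evidence; the aggregate is only a convenience for ranking.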
Museum of Fine Arts, Boston (MFA) (http://www.mfa.org/collections)

The MFA, both in person and online, boasts a comprehensive and extensive collection of art and historical artifacts of varying forms. The website is very easy to navigate, with well-defined browsing options and easy search capabilities, allowing for refinement of results by collection or type of item. There are many collections, which are well organized and curated into separate exhibits and galleries. In addition, when viewing each gallery, suggestions are linked for related online exhibitions as well as tours and exhibits at the physical museum. Each item record contains a detailed description of the item as well as its provenance. Thus, the MFA website attains a very high rating for usability and content. However, individual items are represented by only single pictures of varying quality. Some pictures are in color, some are black and white, and no two pictures appear to have the same lighting. Additionally, despite being slow to load, even the pictures that appear to be of the best quality cannot be of high resolution, as zooming in makes them slightly blurry. Accordingly, the MFA website receives a medium rating for functionality and a low rating for presentation.

Digital Fish Library (DFL) (http://www.digitalfishlibrary.org/index.php)

The DFL project is a comprehensive program that uses MRI scanning to digitize preserved biological fish samples from a collection housed at the Scripps Institution of Oceanography. After MRI scans of a specimen are taken, the data is processed and translated into various views that are placed on the website, accompanied by information about each species (Berquist et al. 2012). Navigating the DFL website is very intuitive, as the individual specimen records are organized by taxonomy. It is easy to search for particular species or browse through the clickable, pictorial interface.
Records for each species include detailed information about the individual specimen, the specifics of the scans used to image it, and broader information about the species. Individual records also provide links to other species within the taxonomic family. Thus, the DFL website attains high ratings in both usability and content. For functionality and presentation, however, the ratings are medium. Although each item has videos and still images obtained from three-dimensional volume renderings and MRI scans, they are small in size and low in resolution. There is no interactive component, with the possible exception of the “digital fish viewer,” which supposedly requires Java but which this author could not get to work despite best efforts. One nice feature, shown in figure 1 below, is that some of the specimen records have three-dimensional renderings showing and explaining the internal structures of the species.

Figure 1. Annotated three-dimensional rendering of internal structures of a hammerhead shark, from the Digital Fish Library (http://www.digitalfishlibrary.org/library/ViewImage.php?id=2851)

The Eton Myers Collection (http://etonmyers.bham.ac.uk/3D-models.html)

The Eton Myers Collection of ancient Egyptian art is housed at Eton College, and a project to three-dimensionally digitize the items for public access was undertaken as a collaboration between that institution and the University of Birmingham. Digitization was accomplished with three-dimensional laser scanners, the data was then processed with Geomagic software to produce point-cloud and mesh forms, and individual datasets were reduced in size and converted into an appropriate file type to allow for public access (Chapman, Gaffney, and Moulden 2010).
Usability of the Eton Myers Collection website is extremely low. The initial interface is simply a list of three-dimensional models by item number, with a description of how to download the appropriate program and files. Another website from the University of Birmingham (http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=The+Eton+Myers+Collection) contains a more museum-like interface, but it contains many more records for objects than are contained in the initial list of three-dimensional models. Moreover, most of the records do not even include pictures of the items, let alone links to the three-dimensional models, and the records that do include pictures do not necessarily include such links. Even when a record has a link to the three-dimensional model, it actually redirects to the full list of models rather than to the individual item. There is no search functionality from the initial list of three-dimensional models, and no way to browse other than to, colloquially speaking, poke and hope. Individual items are identified only by item number, and, aside from the few records that have accompanying pictures on the University of Birmingham site, there is no way to know to what item any given number refers. The website attains only a low rating for content; although it seems that there may be a decent number of items in the collection, it is impossible to know for certain given the problems with the interface and the fact that individual items are virtually unidentified. The Eton Myers Collection website also receives a low rating for functionality.
To access three-dimensional models of items, users must download and install a program called MeshLab, then download individual folders of compressed files, then unzip those files, and finally open the appropriate file in MeshLab. Despite compression, some of the file folders are still quite large and take some time to download. Presentation of the items is also rated low. Even for the high-resolution versions of the three-dimensional renderings, viewed in MeshLab, the geometry of the objects seems underdeveloped (e.g., hieroglyphics are illegible) and surface textures are not well mapped (e.g., colors are completely off). This is evident from a comparison of the three-dimensional rendering with a two-dimensional photograph of the same item, as in figure 2, below.

Figure 2. Comparison of original photograph (left) and three-dimensional rendering (right) of Item Number ECM 361, from the Eton Myers Collection (http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+361&op-earliest_year=%3D&op-latest_year=%3D).

Notably, Chapman, Gaffney, and Moulden (2010) indicate that the detailed three-dimensional imaging enabled them to identify tooling marks and read previously unclear hieroglyphics on certain items. Thus, it is possible that the problems with the renderings may be a result of a loss in quality between the original models and the downloaded versions, particularly given that the files were reduced in size and converted prior to being made available for download.
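The multistep access workflow described above (download MeshLab, download compressed folders, unzip them, open the file in MeshLab) can be partially scripted. The following Python sketch is purely illustrative: the Eton Myers project documents no such automation, the archive layout and mesh file extensions are assumptions, and launching MeshLab presumes a local installation on the system PATH.

```python
import subprocess
import zipfile
from pathlib import Path

def extract_models(archive_path, dest_dir):
    """Unzip a downloaded model archive and return any mesh files found inside."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
    # MeshLab opens common mesh formats; .ply and .obj are typical (assumed here).
    return sorted(p for p in dest.rglob("*") if p.suffix in {".ply", ".obj"})

def open_in_meshlab(mesh_file):
    """Hand an extracted mesh to a locally installed MeshLab (assumed on PATH)."""
    subprocess.run(["meshlab", str(mesh_file)], check=True)
```

Even with such a script, the user still bears the burden of installing and learning MeshLab, which is precisely the access barrier criticized above.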
Epigraphia 3D Project (http://www.epigraphia3d.es)

The Epigraphia 3D project was created to present an online collection of various historical Roman epigraphs (also known as inscriptions) that were discovered and excavated in Spain and Italy; the physical collection is housed at the Museo Arqueológico Nacional (Madrid). Digital imaging was accomplished using photogrammetry, free software was utilized to create three-dimensional object models and renderings, and Photoshop was used to obtain appropriate textures. Finally, the three-dimensional models were published on the web using Sketchfab, a web service similar to Flickr that allows in-browser viewing of three-dimensional renderings in many different formats (Ramírez-Sánchez et al. 2014).

The Epigraphia 3D website is intuitive and informative. Browsing is simple because there are not many records, but, although it is possible to search the website, there is no search function specifically directed to the collection. Thus, usability is rated as medium. Despite the fact that the website provides descriptions of the project and the collection, as well as information about epigraphs generally, the website attains a medium rating for content in light of the small size of the collection and the limited information given for each individual item. However, the Epigraphia 3D website receives very high ratings for functionality and presentation. The individual three-dimensional models are detailed, legible, and interactive. Individual inscriptions are transcribed for each item.
The use of Sketchfab to display the models is effective; no downloading is necessary, and models load in an acceptable amount of time. When viewing an item, users can rotate the object in either “orbit” or “first person” mode, as well as view it full-screen or within the browser window. Users can also display the wireframe model as well as the textured or surfaced rendering, as shown in figure 3 below.

Figure 3. Three-dimensional textured (left) and wireframe (middle) renderings from the Epigraphia 3D project (http://www.epigraphia3d.es/3d-01.html), as compared to an original two-dimensional photograph of the same object (right) (http://eda-bea.es/pub/record_card_1.php?refpage=%2Fpub%2Fsearch_select.php&quicksearch=dapynus&rec=19984).

Smithsonian X 3D (http://3d.si.edu)

The Smithsonian X 3D project, although affiliated with all of the Smithsonian’s varying divisions, was created to test the application of three-dimensional digitization techniques to “iconic collection objects” (http://3d.si.edu/about). The website provides significant detail concerning the project itself, mostly in the form of videos, and individual items, many of which are linked to “tours” that incorporate a story about the object. Content is rated as medium because, despite the depth of information provided about individual items, there are still very few items within the collection.
The website also receives a medium rating for usability, given the simple browsing structure, easy navigation, and lack of a search feature (all likely due at least in part to the limited content). Functionality and presentation, however, are rated high. The X3D Explorer in-browser software (powered by Autodesk) does more than simply display a three-dimensional rendering of an object; it also permits users to edit the model by changing color, lighting, texture, and other variables, and it incorporates detailed information about each item, both as an overall description and as a slide show in which snippets of information are connected to specific views of the item. The individual three-dimensional models are high resolution, detailed, and well-rendered, with very good surface texture mapping. However, it must be noted that the X3D Explorer tool is in beta and, as such, still has some bugs; for example, this author has observed a model disappear while zooming in on the rendering. Table 3, below, summarizes the results of the evaluation.

                   Functionality   Usability   Presentation   Content
MFA                Medium          Very High   Low            Very High
DFL                Medium          High        Medium         High
Eton Myers         Low             Low         Low            Low
Epigraphia 3D      Very High       Medium      Very High      Medium
Smithsonian X 3D   High            Medium      High           Medium

Table 3. Summary of evaluation results for each website by individual criteria

DISCUSSION

Based on the evaluation of the five websites described above, some suggested best practices for the digitization and presentation of three-dimensional objects become apparent. When digitizing, the museum should utilize the method that best suits the object or collection. For example, while MRI scanning is likely the best method for three-dimensionally digitizing biological fish specimens, it is not going to be effective or feasible for digitizing artwork or artifacts (Abel et al. 2011; Berquist et al. 2012).
Regardless of the method of digitization used, however, the people conducting the imaging and processing should fully comprehend the hardware and software necessary to complete the task. Additionally, although financial constraints must be considered, museums should note that some three-dimensional scanning equipment is just as economically feasible as standard digital cameras (Metallo and Rossi 2011). However, if a museum chooses to utilize only two-dimensional imaging, each item should be photographed from multiple angles in high resolution, to avoid creating a website, like the MFA’s, on which everything other than the object itself is presented outstandingly. Further, museums deciding on two-dimensional imaging should explore the possibility of utilizing photogrammetry to create three-dimensional models from their two-dimensional photographs, as the Epigraphia 3D project did. There is free or inexpensive software that permits the creation of three-dimensional object maps from very few photographs (Ramírez-Sánchez et al. 2014). Finally, compatibility is a key issue when conducting three-dimensional scans; the museum should ensure that the software used for rendering models is compatible with the way in which users will be viewing the models.

In the context of public access to the museum’s digital collections, the website should be easy and intuitive to navigate. The MFA website is an excellent example; browsing and search functions should both be present, and reorganization of large numbers of objects into separate collections may be necessary. Where searching is going to be the primary point of entry into the collection, it is important to have sufficient metadata and functional search algorithms to ensure that item records are findable.
Furthermore, remember that the website is simply a way to access the museum itself. Hence, the collections on the website, like the collections in the physical museum, should be curated; there should be a logical flow to accessing object records. The museum may also want to have sections that are similar to virtual exhibitions, like the “tours” provided by the Smithsonian X 3D project. Finally, museums should ensure that no additional technological know-how (beyond being able to access the internet) is required to access the three-dimensional content in object records. Users should not be required to download software or files to view records; Epigraphia 3D’s use of Sketchfab and the Smithsonian’s X 3D Explorer tool are both excellent examples of ways in which three-dimensional content can be viewed on the web without the need for extraneous software.

Museums and cultural heritage institutions are increasingly focused on providing public access to collections via digitization and display on websites (Given and McTavish 2010). To help them do so effectively, this paper has attempted to provide some guidance as to best practices for presenting digital versions of three-dimensional objects. In closing, however, it must be noted that this author is not a technician. Although this paper has tried to contend with the issues from the perspective of a librarian, there are complicated technical concerns behind any digitization project that have not been adequately addressed. In addition, this paper has not examined the role of budgetary constraints on digitization or the concomitant issues of creating and maintaining websites. Moreover, because this paper has been treated as a broad overview of the digitization and presentation for public access of three-dimensional objects, the five websites evaluated were from varying fields of study. Museums should look to more specific comparisons in order to appropriately digitize and present their collections on the web.
CONCLUSION

There may not be a direct substitute for encountering an object in person, but for people who cannot obtain physical access to three-dimensional objects, the digital realm can serve as an adequate proxy. This paper has demonstrated, through an evaluation of five distinct digital collections, that utilizing three-dimensional imaging and presenting three-dimensional models of physical objects on the web can serve the important purpose of increasing public access to otherwise unavailable collections.

REFERENCES

Abate, D., R. Ciavarella, G. Furini, G. Guarnieri, S. Migliori, and S. Pierattini. “3D Modeling and Remote Rendering Technique of a High Definition Cultural Heritage Artefact.” Procedia Computer Science 3 (2011): 848–52. http://dx.doi.org/10.1016/j.procs.2010.12.139.

Abel, R. L., S. Parfitt, N. Ashton, Simon G. Lewis, Beccy Scott, and C. Stringer. “Digital Preservation and Dissemination of Ancient Lithic Technology with Modern Micro-CT.” Computers and Graphics 35, no. 4 (August 2011): 878–84. http://dx.doi.org/10.1016/j.cag.2011.03.001.

Berquist, Rachel M., Kristen M. Gledhill, Matthew W. Peterson, Allyson H. Doan, Gregory T. Baxter, Kara E. Yopak, Ning Kang, H. J. Walker, Philip A. Hastings, and Lawrence R. Frank. “The Digital Fish Library: Using MRI to Digitize, Database, and Document the Morphological Diversity of Fish.” PLoS ONE 7, no. 4 (April 2012). http://dx.doi.org/10.1371/journal.pone.0034499.

Bincsik, Monika, Shinya Maezaki, and Kenji Hattori. “Digital Archive Project to Catalogue Exported Japanese Decorative Arts.” International Journal of Humanities and Arts Computing 6, no. 1–2 (March 2012): 42–56. http://dx.doi.org/10.3366/ijhac.2012.0037.

Cameron, Fiona. “Digital Futures I: Museum Collections, Digital Technologies, and the Cultural Construction of Knowledge.” Curator: The Museum Journal 46, no. 3 (July 2003): 325–40. http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x.
Chane, Camille Simon, Alamin Mansouri, Franck S. Marzani, and Frank Boochs. “Integration of 3D and Multispectral Data for Cultural Heritage Applications: Survey and Perspectives.” Image and Vision Computing 31, no. 1 (January 2013): 91–102. http://dx.doi.org/10.1016/j.imavis.2012.10.006.

Chapman, Henry P., Vincent L. Gaffney, and Helen L. Moulden. “The Eton Myers Collection Virtual Museum.” International Journal of Humanities and Arts Computing 4, no. 1–2 (October 2010): 81–93. http://dx.doi.org/10.3366/ijhac.2011.0009.

Dellepiane, M., M. Callieri, F. Ponchio, and R. Scopigno. “Mapping Highly Detailed Colour Information on Extremely Dense 3D Models: The Case of David’s Restoration.” Computer Graphics Forum 27, no. 8 (December 2008): 2178–87. http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x.

Given, Lisa M., and Lianne McTavish. “What’s Old Is New Again: The Reconvergence of Libraries, Archives, and Museums in the Digital Age.” Library Quarterly 80, no. 1 (January 2010): 7–32. http://dx.doi.org/10.1086/648461.

Hariri, Nadjla, and Yaghoub Norouzi. “Determining Evaluation Criteria for Digital Libraries’ User Interface: A Review.” The Electronic Library 29, no. 5 (2011): 698–722. http://dx.doi.org/10.1108/02640471111177116.

Hess, Mona, Francesca Simon Millar, Stuart Robson, Sally MacDonald, Graeme Were, and Ian Brown. “Well Connected to Your Digital Object? E-curator: A Web-Based E-Science Platform for Museum Artefacts.” Literary and Linguistic Computing 26, no. 2 (2011): 193–215. http://dx.doi.org/10.1093/llc/fqr006.
Holovachov, Oleksandr, Andriy Zatushevsky, and Ihor Shydlovsky. “Whole-Drawer Imaging of Entomological Collections: Benefits, Limitations and Alternative Applications.” Journal of Conservation and Museum Studies 12, no. 1 (2014): 1–13. http://dx.doi.org/10.5334/jcms.1021218.

Hunter, Jane, and Anna Gerber. “Harvesting Community Annotations on 3D Models of Museum Artefacts to Enhance Knowledge, Discovery and Re-Use.” Journal of Cultural Heritage 11, no. 1 (2010): 81–90. http://dx.doi.org/10.1016/j.culher.2009.04.004.

Jarrell, Michael C. “Providing Access to Three-Dimensional Collections.” Reference & User Services Quarterly 38, no. 1 (1998): 29–32.

Kravchyna, Victoria, and Sam K. Hastings. “Informational Value of Museum Web Sites.” First Monday 7, no. 4 (February 2002). http://dx.doi.org/10.5210/fm.v7i2.929.

Kuzminsky, Susan C., and Megan S. Gardiner. “Three-Dimensional Laser Scanning: Potential Uses for Museum Conservation and Scientific Research.” Journal of Archaeological Science 39, no. 8 (August 2012): 2744–51. http://dx.doi.org/10.1016/j.jas.2012.04.020.

Lerma, José Luis, and Colin Muir. “Evaluating the 3D Documentation of an Early Christian Upright Stone with Carvings from Scotland with Multiples Images.” Journal of Archaeological Science 46 (June 2014): 311–18. http://dx.doi.org/10.1016/j.jas.2014.02.026.
Louw, Marti, and Kevin Crowley. “New Ways of Looking and Learning in Natural History Museums: The Use of Gigapixel Imaging to Bring Science and Publics Together.” Curator: The Museum Journal 56, no. 1 (January 2013): 87–104. http://dx.doi.org/10.1111/cura.12009.

Metallo, Adam, and Vince Rossi. “The Future of Three-Dimensional Imaging and Museum Applications.” Curator: The Museum Journal 54, no. 1 (January 2011): 63–69. http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x.

Montani, Isabelle, Eric Sapin, Richard Sylvestre, and Raymond Marquis. “Analysis of Roman Pottery Graffiti by High Resolution Capture and 3D Laser Profilometry.” Journal of Archaeological Science 39, no. 11 (2012): 3349–53. http://dx.doi.org/10.1016/j.jas.2012.06.011.

Newell, Jenny. “Old Objects, New Media: Historical Collections, Digitization and Affect.” Journal of Material Culture 17, no. 3 (September 2012): 287–306. http://dx.doi.org/10.1177/1359183512453534.

Novati, Gianluca, Paolo Pellegri, and Raimondo Schettini. “An Affordable Multispectral Imaging System for the Digital Museum.” International Journal on Digital Libraries 5, no. 3 (May 2005): 167–78. http://dx.doi.org/10.1007/s00799-004-0103-y.

Pallas, John, and Anastasios A. Economides. “Evaluation of Art Museums’ Web Sites Worldwide.” Information Services and Use 28, no. 1 (2008): 45–57. http://dx.doi.org/10.3233/ISU-2008-0554.

Parandjuk, Joanne C. “Using Information Architecture to Evaluate Digital Libraries.” The Reference Librarian 51, no. 2 (2010): 124–34. http://dx.doi.org/10.1080/02763870903579737.
Pavlidis, George, Anestis Koutsoudis, Fotis Arnaoutoglou, Vassilios Tsioukas, and Christodoulos Chamzas. “Methods for 3D Digitization of Cultural Heritage.” Journal of Cultural Heritage 8, no. 1 (2007): 93–98. http://dx.doi.org/10.1016/j.culher.2006.10.007.

Ramírez-Sánchez, Manuel, José-Pablo Suárez-Rivero, and María-Ángeles Castellano-Hernández. “Epigrafía digital: tecnología 3D de bajo coste para la digitalización de inscripciones y su acceso desde ordenadores y dispositivos móviles.” El Profesional de la Información 23, no. 5 (2014): 467–74. http://dx.doi.org/10.3145/epi.2014.sep.03.

Saracevic, Tefko. “Digital Library Evaluation: Toward an Evolution of Concepts.” Library Trends 49, no. 3 (2000): 350–69.

Srinivasan, Ramesh, Robin Boast, Jonathan Furner, and Katherine M. Becvar. “Digital Museums and Diverse Cultural Knowledges: Moving past the Traditional Catalog.” The Information Society 25, no. 4 (2009): 265–78. http://dx.doi.org/10.1080/01972240903028714.

Xie, Hong Iris. “Users’ Evaluation of Digital Libraries (DLs): Their Uses, Their Criteria, and Their Assessment.” Information Processing and Management 44, no. 3 (May 2008): 1346–73. http://dx.doi.org/10.1016/j.ipm.2007.10.003.
Analyzing Digital Collections Entrances: What Gets Used and Why It Matters

Paromita Biswas and Joel Marchesoni

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

ABSTRACT

This paper analyzes usage data from Hunter Library’s digital collections using Google Analytics for a period of twenty-seven months, from October 2013 through December 2015. The authors consider this data analysis important for identifying the collections that receive the largest number of visits, and they argue that such evaluation better informs decisions about building digital collections that will serve user needs. The authors also study the benefits of harvesting to sites such as the Digital Public Library of America, and they believe this paper will contribute to the literature on Google Analytics and its use by libraries.

INTRODUCTION

Hunter Library at Western Carolina University (WCU) has fourteen digital collections hosted in CONTENTdm—a digital collection management system from OCLC. Users can enter the collections in various ways—through the Library’s CONTENTdm landing pages,1 search engines, or sites such as the Digital Public Library of America (DPLA), where all the collections are harvested.2 Since October 2013, the Library has collected usage data from its collections’ websites and from DPLA referrals via Google Analytics. This paper analyzes this usage data covering a period of approximately twenty-seven months, from October 2013 through December 2015. The authors consider this data analysis important for identifying collections receiving the largest number of visits, including visits through harvesting sites such as the DPLA. The authors argue that such data evaluation is important because it can better inform decisions taken to build collections that will attract users and serve their needs.
Additionally, this analysis of usage data generated from harvesting sites such as the DPLA demonstrates the usefulness of harvesting in increasing digital collections’ usage. Lastly, this paper contributes to the broader literature on Google Analytics and its use by libraries in data analysis.

LITERATURE REVIEW

Using Google Analytics to study usage of electronic resources is common; a considerable amount of material exists describing the use of Google Analytics in the marketing and business fields.3

Paromita Biswas (pbiswas@email.wcu.edu) is Metadata Librarian and Joel Marchesoni (jmarch@email.wcu.edu) is Technology Support Analyst, Hunter Library, Western Carolina University, Cullowhee, North Carolina.

ANALYZING DIGITAL COLLECTIONS ENTRANCES: WHAT GETS USED AND WHY IT MATTERS | BISWAS AND MARCHESONI | https://doi.org/10.6017/ital.v35i4.9446

However, the published literature offers little about the use of this software for studying usage of collections consisting of unique materials digitized and placed online by libraries and cultural heritage organizations. For example, Betty has written about using Google Analytics to track statistics for user interaction with librarian-created digital media such as quizzes and video tutorials.4 Fang discusses using Google Analytics to track the behavior of users who visited the Rutgers-Newark Law Library website.5 Fang looked at the number of visitors, what and how many pages they visited, how long they stayed on each page, where they were coming from, and which search engine or website had referred them to the library’s website. Findings were evaluated and used to make improvements to the library’s website. For example, Fang mentions using Google Analytics data to track the percentage of new and returning visitors before and after the website redesign.
Among articles that discuss using web analytics to learn how users access digital collections, most have focused on comparing third-party platforms, online search engines, and the traditional library catalog to find preferred modes of access and to determine whether the results call for a shift in how libraries share their digital collections. For example, in their article on the impact of social media platforms such as HistoryPin and Pinterest on the discovery and access of digital collections, Baggett and Gibbs use Google Analytics for tracking usage of digital objects on the library’s website as well as statistics collected from HistoryPin’s and Pinterest’s first-party analytics tools.6 The authors conclude that while neither HistoryPin nor Pinterest drive users back to the library’s website, they help in the discovery of digital collections and can enhance user access to library collections. Schlosser and Stamper compare the effects on usage of a collection housed in an institutional repository and reposted on Flickr.7 Whether housing a collection on a third-party site had an adverse effect on attracting traffic to the library’s website was not as important as ensuring users accessed the collection somewhere. Likewise, O’English demonstrates how data from web analytics were used to compare access to archival materials via online search engines as opposed to library catalogs using MARC records for descriptions.8 O’English argues library practices should change accordingly to promote patron access and use. Ladd’s article on the access and use of a digital postcard collection from Miami University uses statistics from Google Analytics, CONTENTdm, and Flickr over a period of one year.9 Ladd’s findings reveal that few users came to the main digital collections website to search and browse; instead, most arrived via external sources such as search engines and social media sites.
Given the resulting increase in views, Ladd asserts that regular updates in both CONTENTdm and Flickr are important for promoting access and use of the postcards.

Articles on using Google Analytics to track digital collection usage have also explored the geographic base of users. For example, Herold uses Google Analytics to demonstrate usage of a digital archival collection by users at institutional, national, and international levels.10 Herold looks at server transaction logs maintained in Google Analytics, on- and off-campus searching counts, user locations, and repeat visitors to the archival images representing cultural heritage materials related to Orang Asli peoples and cultures of Malaysia. She uses these data to ascertain the number of users by geographic region and determines that, while most visitors came from the United States, Malaysia ranked second. The data supported, according to Herold, that this particular digital collection was able to reach another target audience: users from Malaysia. Herold’s findings indicate that digitization of unique materials makes them available to a worldwide audience.

Whether harvesting has increased usage of digital collections available via DPLA or its hubs has received limited exploration in the literature. Most writings on harvesting digital collections have focused on the technical aspects of the process, like the DPLA’s ingestion method, the quality and scalability of metadata remediation and enhancement,11 and large metadata encoding.12 For example, Gregory and Williams write about the North Carolina Digital Heritage Center as one of the service hubs of the DPLA. The service hubs are centers that aggregate digital collection metadata provided by institutions for harvesting by the DPLA.
The authors discuss metadata requirements, software review, and establishment of workflow for sending large metadata feeds to the DPLA.13 Boyd, Gilbert, and Vinson, in their article on the South Carolina Digital Library (SCDL), another service hub for the DPLA, describe the planning behind setting up the SCDL, its management, and the technology involved in metadata harvesting.14 Freeland and Moulaison discuss the Missouri hub as a model for “institutions with similar collective goals for exposing and enriching their data through the DPLA.”15 According to them, by harvesting their metadata to the DPLA, institutions are able to share their digital collections with the broader public. Additionally, institutions that harvest metadata to the DPLA get value-added services like geocoding of location-based metadata and expression of contributed metadata as linked data.

Data Collection Parameters

Hunter Library digital collections usage data included information on item views16 and referrals17 for each of the collections, including DPLA referrals. The authors also considered keyword search terms18 across all referrals, and within CONTENTdm specifically, that brought users to the Library’s collections. The authors considered the most frequently occurring keywords to represent the subjects of the collections that were most used. Repeat visitors to the Library’s digital collections’ website were also tracked. Finally, sessions19 were traced by the geographic area20 of the users.

Hunter Library’s collections vary in size. The Library’s largest and one of its oldest collections, Craft Revival, showcases documents, photographs, and craft objects housed in Hunter Library and smaller regional institutions.
The collection’s items represent the late nineteenth and early twentieth century (1890s–1940s) Craft Revival movement in Western North Carolina, which was characterized by a renewed interest in handmade objects, including Cherokee arts and crafts. The Craft Revival collection began in 2005 and includes 1,982 items. The second largest collection, Great Smoky Mountains, which highlights efforts that went into the establishment of the park and includes photographs of the landscape and the flora and fauna in the park, began in 2012 and consists of 1,829 items.

Not all digital collections were harvested to the DPLA at the same time. While some older collections were harvested to the DPLA in 2013, smaller, institution-specific collections that started later were also harvested later. Three such collections were harvested to the DPLA in 2015: WCU—Oral Histories, a collection of interviews collected by students in one of WCU’s history classes documenting the history and culture of Western North Carolina and the lives of WCU athletes and artists like Josephina Niggli, who taught drama at WCU; Highlights from WCU, a collection of unique items from WCU’s Mountain Heritage Center and other departments on campus, including letters from the Library’s Special Collections transcribed by students in WCU’s English department; and WCU—Fine Art Museum, showcasing artwork from the university’s Fine Art Museum. Because these smaller collections started later, their total item views and referral counts would likely be lower than those of some of the Library’s older collections; however, these newer collections were included because they might provide valuable data regarding harvesting referrals and returning visitors. Table 1 shows the years the collections were started, the number of items included in each collection, and the year they were harvested to the DPLA.
Collection Name                        Start Year   Collection Size (Number of Items)   Harvested Since
Cherokee Traditions                    2011         332                                 2013
Civil War                              2011         68                                  2013
Craft Revival                          2005         1,982                               2013
Great Smoky Mountains                  2013         1,829                               2013
Highlights from WCU                    2015         39                                  2015
Horace Kephart                         2005         552                                 2013
Picturing Appalachia                   2012         972                                 2013
Stories of Mountain Folk               2012         374                                 2013
Travel Western North Carolina          2011         160                                 2013
WCU—Fine Art Museum                    2015         87                                  2015
WCU—Herbarium                          2013         91                                  2013
WCU—Making Memories                    2012         408                                 2013
WCU—Oral Histories                     2015         67                                  2015
Western North Carolina Regional Maps   2015         37                                  2015

Table 1. Collections by year

Collecting Data Using Google Analytics

The Library has had Google Analytics set up on online exhibits—websites outside of CONTENTdm that provide additional insight into the collections—since 2008, and it began using Google Analytics to track its CONTENTdm materials with the 6.1.2 release in October 2013. CONTENTdm version 6.4 introduced a configuration field that allowed the authors to enter a Google Analytics ID and automatically generate the tracking code in pages to simplify the setup. Following that software update, OCLC made Google Analytics the default data logging mechanism. The Library set up Google Analytics such that online exhibits are tracked together with their CONTENTdm collections. This is accomplished by using custom tracking on all webpages and a custom script in CONTENTdm, which allows the Library to link its CONTENTdm and wcu.edu domains within Google Analytics so that sessions can be viewed across all online digital collections.

Data were collected from Google Analytics using several tools. Google provides an online tool called Query Explorer (https://ga-dev-tools.appspot.com/query-explorer/) that can create and execute custom searches against Google Analytics. This application was used to craft the queries.
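The queries built in Query Explorer correspond to parameters of the Google Analytics Core Reporting API (v3, since superseded by newer Analytics APIs). As a rough sketch of that shape, the Python below assembles a v3-style parameter set and tallies a response; the view ID, date range, path filter, and the mocked CONTENTdm-like item paths are placeholder assumptions, not values from this article.

```python
def build_query(view_id, start, end, path_filter):
    """Assemble Core Reporting API (v3) parameters, as Query Explorer does."""
    return {
        "ids": "ga:" + view_id,            # placeholder Analytics view ID
        "start-date": start,
        "end-date": end,
        "metrics": "ga:pageviews",
        "dimensions": "ga:pagePath",
        "filters": "ga:pagePath=@" + path_filter,  # substring match on the path
    }

def pageviews_by_path(response):
    """Collapse v3-style rows ([pagePath, pageviews] string pairs) into a dict."""
    return {path: int(views) for path, views in response.get("rows", [])}

# Mocked response in the v3 row format (invented item paths):
mock = {"rows": [["/cdm/ref/collection/craftrevival/id/10", "45"],
                 ["/cdm/ref/collection/craftrevival/id/11", "30"]]}
```

A live call would send these parameters, with an OAuth token, to the reporting endpoint; here the response is mocked so the parsing step can be shown without network access.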
Microsoft Excel was primarily used to download data, using the custom plugin Rest to Excel Library (http://ramblings.mcpher.com/Home/excelquirks/json/rest) to parse information from Google Analytics into worksheets. The Excel add-on works well but requires knowledge of Microsoft Visual Basic for Applications (VBA) programming to use effectively. This limitation prompted the authors to look for a simpler way of retrieving data. The authors adopted OpenRefine (https://github.com/OpenRefine/OpenRefine) to collect, sort, and filter data, with Excel used for results analysis. Once in Excel, formulas were used to mine the data for specific targets.

RESULTS ANALYSIS

The data collected using Google Analytics spanned a period of approximately twenty-seven months, from October 2013 through December 2015. Table 2 and graph 1 show each collection’s item views, item referrals, and size (number of items in the collection). These numbers were calculated for each collection as a percentage of total item views, total item referrals, and total number of items for all collections together. In table 2, the top five collections in terms of item views and referrals are highlighted. Graph 1, a graphical representation of table 2, displays more starkly the differences between collections in terms of views and referrals.
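The percentage figures described above are simple shares of column totals. The sketch below (with hypothetical counts, not the Library's actual Google Analytics figures) shows how such per-collection percentages could be computed once raw view, referral, and item counts are in hand.

```python
# Sketch: compute each collection's share of total item views, item
# referrals, and items, as in Table 2. All counts below are hypothetical
# illustrations, not Hunter Library's actual figures.

collections = {
    # name: (item_views, item_referrals, item_count)
    "Craft Revival": (41350, 5239, 1982),
    "Great Smoky Mountains": (7500, 634, 1829),
    "Horace Kephart": (11670, 762, 552),
}

# Column totals: total views, total referrals, total items.
totals = [sum(vals[i] for vals in collections.values()) for i in range(3)]

def shares(name):
    """Return (views %, referrals %, items %) for one collection."""
    return tuple(round(100 * v / t, 2)
                 for v, t in zip(collections[name], totals))

for name in collections:
    views_pct, refs_pct, items_pct = shares(name)
    print(f"{name}: views {views_pct}%, referrals {refs_pct}%, items {items_pct}%")
```

Each column of percentages sums to 100 (up to rounding), matching the "Total" row of the published table.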
Collection Name | Item Views as Percentage of Total Views | Item Referrals as Percentage of Total Referrals | Number of Items as Percentage of Total Items
Cherokee Traditions | 6.38 | 6.12 | 4.74
Civil War | 1.89 | 0.88 | 0.97
Craft Revival | 41.35 | 52.39 | 28.32
Great Smoky Mountains | 7.50 | 6.34 | 26.14
Highlights from WCU | 0.23 | 0.08 | 0.56
Horace Kephart | 11.67 | 7.62 | 7.89
Picturing Appalachia | 10.03 | 9.99 | 13.89
Stories of Mountain Folk | 3.51 | 2.45 | 5.34
Travel Western North Carolina | 7.87 | 9.57 | 2.29
WCU—Fine Art Museum | 0.19 | 0.08 | 1.24
WCU—Herbarium | 0.71 | 0.45 | 1.30
WCU—Making Memories | 7.13 | 2.64 | 5.83
WCU—Oral Histories | 0.80 | 1.08 | 0.96
Western North Carolina Regional Maps | 0.26 | 0.11 | 0.53
Total | 100.00 | 100.00 | 100.00

Table 2. Collections by percentage

Graph 1. Collections by percentage

As demonstrated in the preceding table and graph, Craft Revival, one of the Library’s oldest and largest collections, contributes more than 28 percent of all digital collections’ items and garners close to 42 percent of all item views and 53 percent of all item referrals. Great Smoky Mountains, the second largest collection, contributes a little more than 26 percent of items but receives only about 8 percent of all item views and 7 percent of all referrals. The Horace Kephart collection, focusing on the life and works of Horace Kephart—author, librarian, and outdoorsman who made the mountains of Western North Carolina his home later in life—is the Library’s fourth largest collection. It receives almost 12 percent of all item views and about 8 percent of all item referrals.
Picturing Appalachia, the third largest collection—consisting of photographs showcasing the history, culture, and natural landscape of Southern Appalachia in the Western North Carolina region—makes up 14 percent of items and receives approximately 10 percent of all referrals and views. Travel Western North Carolina—visual journeys through Western North Carolina communities across three generations—contributes fewer than 3 percent of items but scores high on both item views and referrals. WCU—Making Memories, which highlights the people, buildings, and events of WCU’s history, and Stories of Mountain Folk (SOMF), a collection of radio programs from the Western North Carolina nonprofit Catch the Spirit of Appalachia archived at Hunter Library, are similar in size; each receives fewer than 3 percent of all item referrals. However, WCU—Making Memories receives more than 7 percent of all item views compared to SOMF’s almost 4 percent. These findings are not surprising, as the Making Memories collection documents Western Carolina University’s history and may receive many views from within the institution. Overall, however, the Craft Revival collection can be considered the Library’s most popular collection. The Horace Kephart collection appears to be the second most popular. And, not surprisingly, Cherokee Traditions, a collection of art objects, photographs, and recordings similar in content to Craft Revival in its focus on Cherokee culture and history, is quite popular and receives more item referrals than both WCU—Making Memories and SOMF and more item views than SOMF (table 2). An analysis of keyword searches within CONTENTdm and keyword searches across all referral sources reiterates these findings.
As part of the analysis, data collected for this twenty-seven-month period for the top keyword searches within CONTENTdm and the top keyword searches across all referrals were recorded in an Excel spreadsheet and then uploaded to OpenRefine. OpenRefine allows text and numeric data to be sorted by name (alphabetical) and count (highest to lowest occurring). Once the Excel spreadsheet was uploaded to OpenRefine, keywords were sorted numerically and clustered. OpenRefine has a “cluster” function to bring together text that has the same meaning but differs by spelling or capitalization (for example, “CHEROKEE,” “cherokee,” “cheroke”) or by order (for example, “Jane Smith,” “Smith, Jane”). The clustering function provides a count of the number of times a keyword was used regardless of exact spelling. After identifying keywords belonging to a cluster (for example, a cluster of the word “Cherokee” spelled differently), the differently spelled or organized keywords in each cluster were merged in OpenRefine with their most accurate counterparts. Finally, it should be noted that keywords including “!” and “+” symbols were most likely generated either from using multiple search terms within CONTENTdm’s advanced search or from curated search links maintained on some of our online exhibit websites. These links take users to commonly used result sets within the collection. Tables 3 and 4 provide a listing of the ten most frequently searched keywords within CONTENTdm and across all referrals, along with the names of the collections most relevant to these searches.
Keywords | Occurrence Count | Relevant Collection(s)
Cherokee | 187 | Craft Revival; Cherokee Traditions
Cherokee Language | 107 | Craft Revival; Cherokee Traditions
Southern Highland Craft Guild | 98 | Craft Revival
basket!object | 96 | Craft Revival; Cherokee Traditions
Indian masks—Appalachian Region, Southern | 83 | Craft Revival; Cherokee Traditions
Basket!photograph postcard | 82 | Craft Revival; Cherokee Traditions
W.M. Cline Company | 78 | Picturing Appalachia; Craft Revival
Cherokee +Indian! photograph | 72 | Craft Revival; Cherokee Traditions
Wood-carving—Appalachian Region, Southern | 70 | Craft Revival
Indian wood-carving—Appalachian Region, Southern | 69 | Craft Revival

Table 3. Top keyword searches within CONTENTdm

Keywords | Number of Sessions | Relevant Collection(s)
cherokee traditions | 442 | Craft Revival; Cherokee Traditions
horace kephart | 185 | Horace Kephart; Great Smoky Mountains; Picturing Appalachia
cherokee pottery | 55 | Craft Revival; Cherokee Traditions
kephart knife | 50 | Horace Kephart
amanda swimmer | 37 | Craft Revival; Cherokee Traditions
appalachian people | 36 | Craft Revival; Cherokee Traditions; Great Smoky Mountains; WCU—Oral Histories
cherokee indian pottery | 36 | Craft Revival; Cherokee Traditions
cherokee baskets | 34 | Craft Revival; Cherokee Traditions
weaving patterns | 33 | Craft Revival; Cherokee Traditions
basket weaving | 26 | Craft Revival; Cherokee Traditions

Table 4. Top keyword searches across all referrals

Tables 3 and 4 show that top searches relate to arts and crafts from the Western North Carolina region (“baskets,” “Indian masks,” “Indian wood carving,” “Cherokee pottery”), artists (“amanda swimmer”), or topics relating to Cherokee culture (“cherokee,” “cherokee language”).
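OpenRefine's key-collision clustering, used to merge the keyword variants above, can be approximated in a few lines. The sketch below is a simplified fingerprint method with illustrative counts, not OpenRefine's exact algorithm; note that pure key collision catches case, punctuation, and word-order variants but not misspellings like "cheroke," for which OpenRefine offers nearest-neighbor methods.

```python
from collections import defaultdict

def fingerprint(keyword):
    """Normalize a keyword: lowercase, strip punctuation, de-duplicate
    and sort its words, so variants collide on the same key."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " "
                      for c in keyword.lower())
    return " ".join(sorted(set(cleaned.split())))

def cluster(keyword_counts):
    """Merge counts for keywords sharing a fingerprint; label each
    cluster with its most frequent original spelling."""
    totals = defaultdict(int)
    labels = {}
    for kw, n in keyword_counts.items():
        key = fingerprint(kw)
        totals[key] += n
        if key not in labels or n > keyword_counts[labels[key]]:
            labels[key] = kw
    return {labels[k]: n for k, n in totals.items()}

# Illustrative counts: capitalization variants and a reordered name.
counts = {"Cherokee": 120, "CHEROKEE": 40, "cherokee": 27,
          "Smith, Jane": 10, "Jane Smith": 5}
print(cluster(counts))
```

Here the three "Cherokee" spellings merge into a single cluster whose count is their sum, labeled with the commonest spelling, and "Jane Smith" / "Smith, Jane" likewise collapse into one entry.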
Searches relating to the Horace Kephart collection (“horace kephart,” “kephart knife”) are also popular, which explains why the Kephart collection, which accounts for fewer than 8 percent of the Library’s digital collections’ items, scores highly in terms of item views (second) and referrals (fourth). The popularity of topics related to Western North Carolina is reiterated in the geographic base of the users. Graph 2 shows that North Carolina accounts for most of the searches, with cities in Western North Carolina (Asheville, Franklin, Cherokee, Waynesville) accounting for more than 40 percent of sessions.

Graph 2. Cities by session count

The majority of item referrals come from search engines such as Google, Bing, and Yahoo! Graph 3 shows the percentage of item referrals from these external searches.21 However, the DPLA also generates a fair amount of incoming traffic to the collections. For example, while all collections get referrals from the DPLA, harvesting to the DPLA is particularly useful for smaller collections such as Highlights from WCU, WCU—Fine Art Museum, and the Civil War collection. Each of these collections gets 17 percent of its referrals from the DPLA, making the DPLA the largest referral source after the search engines for the Highlights and Fine Art Museum collections. Graph 4 shows the referrals each collection receives via the DPLA as a percentage of total referrals. This indicates the usefulness of harvesting to the DPLA. The data also suggest that total referrals from the DPLA per month increase the longer items are in the DPLA (graph 5).

Graph 3. Percentage of search engine item referrals (Google, Bing, and Yahoo!)

Graph 4. Percentage of DPLA item referrals

Graph 5. Increase in DPLA referrals over time

Lastly, new and returning visitors to the collections were tracked as a marker of user interest in particular collections. Graph 6 shows data collected for new and returning visitors calculated as a proportion of the total number of visits for each collection. Some smaller collections like Highlights from WCU, WNC Regional Maps, WCU—Fine Art Museum, and WCU—Oral Histories score highly in terms of attracting return visitors (graph 6).

Graph 6. New and returning visitors

DISCUSSION

The aim behind gathering data was to study usage of Hunter Library’s digital collections and examine the usefulness of harvesting in promoting use. Although usage data logs were unable to shed much light on the actual usefulness of the collections to users, the logs provided information on the volume of use, what materials were accessed, and where users were located. Analysis of the transaction logs indicates that while all collections likely benefitted from harvesting, Craft Revival, Cherokee Traditions, and Horace Kephart (collections focusing on the culture and history of Western North Carolina) were the most heavily used, and most visitors came from the state of North Carolina and from the region in particular. Search terms in the transaction logs also indicated a strong interest in items related to Cherokee culture and Horace Kephart.
As Herold, who traced the second largest group of users of the Orang Asli digital image archive to Malaysia, notes, the geographic base of a collection’s users can be indicative of the popularity of a subject area.22 Likewise, Matusiak asserts that users’ comments can be indicative of the relevance of collections to users’ needs and provide direction for the future development of digital collections.23 As neither the Craft Revival, Cherokee Traditions, nor Horace Kephart collection includes items that relate specifically to the university’s history—unlike other institution-specific collections mentioned earlier—it is possible that these collections’ users are more representative of the larger public than of the university. These findings call into question the identification of an academic library’s user base as mainly students and faculty of the institution, and raise the question of whether librarians should give greater consideration to the needs of a wider audience.24 Data supporting the existence of this user base, whose true import or preferences might not be captured in surveys and questionnaires, can serve as a valuable source of information for individuals responsible for building digital collections. In an informal survey of Hunter Library faculty carried out by Hunter Library’s Digital Initiatives Unit in September 2014, respondents considered collections such as Craft Revival to be more useful to users external to the university. While the survey could allude to the nature of the user base of a collection like Craft Revival, it understandably could not capture the scale of the item views and referrals garnered by this collection as well as a usage data analysis could. On the other hand, analysis of usage data, as demonstrated in this paper, indicated that certain collections—Highlights from WCU, WCU—Fine Art Museum, and WCU—Oral Histories—possibly served a niche audience.
These smaller and more recently established collections consisting of university-created materials attracted more returning visitors (see graph 6). These returning visitors were likely internal users whose visits indicated, as Fang points out, a loyalty to these collections.25 In the paper “A Framework of Guidance for Building Good Digital Collections,” authored by the National Information Standards Organization Framework Advisory Group, the authors point out that while there are no absolute rules for creating quality digital collections, a good collection should include data pertaining to usage.26 The authors point to multiple assessment methods, including combinations of observations, surveys, experiments, and transaction log analyses. As the WCU digital collections findings demonstrate, a careful analysis of the popularity of collections can indicate the need to balance quantitative data with more qualitative survey and interview data. These findings also indicate that usage data analysis can be very valuable in identifying the extent of collection usage by visitors who may not have significant survey representation. Results from the small (fewer than ten respondents) WCU survey indicate that some respondents question the institutional usefulness of collections such as Craft Revival. These results show the importance of taking multiple factors into account when assessing user needs and interests in digital collections.

CONCLUSION

The authors feel future projects might stem from this data analysis. For example, local subject fields based on the highest-recurring keywords mined from the transaction logs could be added to all of Hunter Library’s digital collections. Usage statistics at a later period could then be evaluated to study whether the addition of user-generated keywords increased use of any collection.
As Matusiak points out in her article on the usefulness of user-centered indexing in digital image collections, social tagging—despite its lack of synonym control or misuse of the singular and plural—is a powerful form of indexing because of its “close connection with users and their language,” as opposed to traditional indexing.27 The terms users assign to describe images are also the ones they are most likely to type while searching for digital images. Likewise, according to Walsh, a study conducted by the University of Alberta found that more than forty percent of collections reviewed used a locally developed classification for indexing and searching their collections, and many of these schemes could work well for searches within the collection by users who are familiar with the culture of the collection.28 Usage-data analysis can provide useful information that guides decisions for building digital collections that better serve user needs. It can identify a library’s digital collections’ users and what they want. These are important considerations to keep in mind if library services are to be all about engaging and building relationships with users.29 Harvesting to a national portal such as the DPLA is beneficial for Hunter Library’s collections. At the same time, the Library’s institution-specific collections receive more return visits, likely because of sustained interest from the large user base of the university’s students and employees, an assessment supported by survey findings. Conversely, collections not so directly tied to the institution receive the most one-time item views and referrals. Items that get used are a good indication of what users want and, as this paper demonstrates, academic digital library collections should consider the needs of both the university audience and the general public.
REFERENCES

1. A landing page refers to the homepage of a collection.
2. The DPLA provides a single portal for accessing digital collections held by cultural heritage institutions across the United States. “History,” Digital Public Library of America, accessed May 19, 2016, http://dp.la/info/about/history/.
3. Paul Betty, “Assessing Homegrown Library Collections: Using Google Analytics to Track Use of Screencasts and Flash-Based Learning Objects,” Journal of Electronic Resources Librarianship 21, no. 1 (2009): 75–92, https://doi.org/10.1080/19411260902858631.
4. Ibid.
5. Wei Fang, “Using Google Analytics for Improving Library Website Content and Design: A Case Study,” Library Philosophy and Practice (e-journal), June 2007, 1–17, http://digitalcommons.unl.edu/libphilprac/121.
6. Mark Baggett and Rabia Gibbs, “Historypin and Pinterest for Digital Collections: Measuring the Impact of Image-Based Social Tools on Discovery and Access,” Journal of Library Administration 54, no. 1 (2014): 11–22, https://doi.org/10.1080/01930826.2014.893111.
7. Melanie Schlosser and Brian Stamper, “Learning to Share: Measuring Use of a Digitized Collection on Flickr and in the IR,” Information Technology and Libraries 31, no. 3 (September 2012): 85–93, https://doi.org/10.6017/ital.v31i3.1926.
8. Mark R. O’English, “Applying Web Analytics to Online Finding Aids: Page Views, Pathways, and Learning about Users,” Journal of Western Archives 2, no. 1 (2011): 1–12, http://digitalcommons.usu.edu/westernarchives/vol2/iss1/1.
9. Marcus Ladd, “Access and Use in the Digital Age: A Case Study of a Digital Postcard Collection,” New Review of Academic Librarianship 21, no. 2 (2015): 225–31, https://doi.org/10.1080/13614533.2015.1031258.
10. Irene M. H. Herold, “Digital Archival Image Collections: Who Are the Users?” Behavioral & Social Sciences Librarian 29, no. 4 (2010): 267–82, https://doi.org/10.1080/01639269.2010.521024.
11. Mark A. Matienzo and Amy Rudersdorf, “The Digital Public Library of America Ingestion Ecosystem: Lessons Learned After One Year of Large-Scale Collaborative Metadata Aggregation,” in 2014 Proceedings of the International Conference on Dublin Core and Metadata Applications (DCMI, 2014), 1–11, http://arxiv.org/abs/1408.1713.
12. Oksana L. Zavalina et al., “Extended Date/Time Format (EDTF) in the Digital Public Library of America’s Metadata: Exploratory Analysis,” Proceedings of the Association for Information Science and Technology 52, no. 1 (2015): 1–5, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.145052010066/abstract.
13. Lisa Gregory and Stephanie Williams, “On Being a Hub: Some Details behind Providing Metadata for the Digital Public Library of America,” D-Lib Magazine 20, no. 7/8 (July/August 2014): 1–10, https://doi.org/10.1045/july2014-gregory.
14. Kate Boyd, Heather Gilbert, and Chris Vinson, “The South Carolina Digital Library (SCDL): What Is It and Where Is It Going?” South Carolina Libraries 2, no. 1 (2016), http://scholarcommons.sc.edu/scl_journal/vol2/iss1/3.
15. Chris Freeland and Heather Moulaison, “Development of the Missouri Hub: Preparing for Linked Open Data by Contributing to the Digital Public Library of America,” Proceedings of the Association for Information Science and Technology 52, no. 1 (2015): 1–4, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.1450520100105/abstract.
16. A single view of an item in a digital collection.
17. Visits to the site that began from another site with an item page being the first page viewed.
18. Keywords are words visitors used to find the Library’s website when using a search engine. Google Analytics provides a list of these keywords.
19. A session is defined as a “group of interactions that take place on a website within a given time frame” and can include multiple kinds of interactions like page views, social interactions, and economic transactions.
In Google Analytics, a session by default lasts thirty minutes, though one can adjust this length to last a few seconds or several hours. “How a Session Is Defined in Analytics,” Google, Analytics Help, accessed May 20, 2016, https://support.google.com/analytics/answer/2731565?hl=en.
20. Locations were studied mostly in terms of cities and states.
21. The percentage is based on the total referral count a collection gets—for example, a 44 percent referral count for Cherokee Traditions would mean that the search engines account for 44 percent of the total referrals this collection gets.
22. Herold, “Digital Archival Image Collections,” 278.
23. Krystyna K. Matusiak, “Towards User-centered Indexing in Digital Image Collections,” OCLC Systems & Services: International Digital Library Perspectives 22, no. 4 (2006): 283–98, https://doi.org/10.1108/10650750610706998.
24. Ladd, “Access and Use in the Digital Age,” 230.
25. Fang points out that the improvements made to the Rutgers-Newark Law Library website could attract more return visitors and thus achieve loyalty. Fang, “Using Google Analytics for Improving Library Website,” 11.
26. NISO Framework Advisory Group, A Framework of Guidance for Building Good Digital Collections, 2nd ed. (Bethesda, MD: National Information Standards Organization, 2004), https://chnm.gmu.edu/digitalhistory/links/cached/chapter3/link3.2a.NISO.html.
27. Matusiak, “Towards User-centered Indexing,” 289.
28. John Walsh, “The Use of Library of Congress Subject Headings in Digital Collections,” Library Review 60, no. 4 (2011), https://doi.org/10.1108/00242531111127875.
29. Lynn Silipigni Connaway, The Library in the Life of the User: Engaging with People Where They Live and Learn (Dublin: OCLC Research, 2015), http://www.oclc.org/research/publications/2015/oclcresearch-library-in-life-of-user.html.
Editor’s Comments: Odds and Ends
Bob Gerrity

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2016

This issue marks the midpoint of Information Technology and Libraries’ fifth year as an open-access e-only journal. The move to online-only in 2012 was inevitable, as ITAL’s print subscription base was no longer covering the costs of producing and distributing the print journal. Moving to an e-only model using an open-source publishing platform (the Public Knowledge Project’s Open Journal Systems) provided a low-cost production and distribution system that has allowed ITAL to continue publishing without requiring a large ongoing investment from LITA. The move to open access, however, was not inevitable, and I commend LITA for supporting that move and for continuing to provide a base subsidy that supports the journal’s ongoing publication. I also thank the Boston College Libraries for their ongoing support in hosting ITAL along with a number of other OA journals. Since ITAL is now open, access to it can no longer be offered as an exclusive benefit that comes with LITA membership. Regardless of the publishing model, though, ITAL has always relied on voluntary contributions of the time and expertise of reviewers and editors. I’d like to acknowledge the contributions of our past and current Editorial Board members, who play a key role in ensuring the ongoing quality and vitality of the journal. We will be adding a few additional Board members shortly, to help ensure that reviews of submissions to the journal are completed as quickly and effectively as possible. Speaking of peer review, one of the recent innovative startups in the scholarly communication space is a company called publons, which tracks and verifies peer-review activity, providing a mechanism for academics to report (and possibly receive institutional credit for) their peer-review work, an undervalued part of the scholarly communication framework.
(Full disclosure: at the University of Queensland we are conducting a pilot project with publons to integrate the peer-review activities of our academics into our institutional repository.) In addition to new approaches to peer review, such as publons and Academic Karma, there are quite a few recent examples of innovations in various aspects of scholarly communication that are worth keeping an eye on. These include new collaborative authoring tools such as Overleaf, impact-measurement tools such as Impactstory, and personal digital library platforms such as Readcube. On a broader scale, initiatives such as PeerJ are building open access publishing platforms intended to dramatically improve the efficiency of and drive down the overall costs of scholarly publishing. February marked the 14th anniversary of a key trigger event in the Open Access movement—the launch of the Budapest Open Access Initiative in 2002.

Bob Gerrity (r.gerrity@uq.edu.au), a member of LITA and the Editor of Information Technology and Libraries, is University Librarian at the University of Queensland, Brisbane, Australia.

EDITOR’S COMMENTS | GERRITY doi: 10.6017/ital.v35i2.9462

Much has happened in the 14 years since the Budapest Initiative, on various fronts:
• policy—introduction and widespread adoption of funder and institutional OA mandates;
• technology—development and widespread adoption of institutional repositories, and recent development of mechanisms to facilitate the discovery of OA publications (e.g., SHARE on the library side and CHORUS on the publisher side);
• publishing—establishment of new OA megajournals (e.g., PLOS, BioMed Central) and embrace of hybrid OA models by mainstream commercial publishers.
Yet despite all the hype, acrimony, and activity triggered by the OA movement, a recent analysis in the Chronicle of Higher Education suggests the growth of OA has been slow and incremental: the percentage of research articles published annually in fully open-access format has increased at an average rate of around one percent a year, from 4.8% in 2008 to 12% in 2015. At this rate, the tipping point for OA still seems very far away. Lots of energy has been and continues to be invested by different stakeholders in different approaches, and the green vs. gold argument still predominates. Recent developments suggest momentum is gaining for a more radical shift. In December 2015, the Max Planck Institute, a key player in the launch of OA with the Berlin Declaration on Open Access in 2003, hosted the 12th version of its annual OA conference to further the discussion around open access. Ironically, unlike previous meetings and seemingly in philosophical conflict with the underpinnings of the OA movement, the meeting was by invitation only. Given the topic, though—a “Proposal to Flip Subscription Journals to Open Access”—the closed nature of the meeting is understandable.
Underpinning the proposal was a 2015 paper from the Max Planck Digital Library suggesting that the amount of money currently being spent (largely by libraries) on journal subscriptions should be sufficient to fund research publication costs if applied to a “flipped” journal publishing business model, from subscription-based to gold open access.1 In the Netherlands, the university sector has adopted a national approach in negotiating deals with several major publishers (Springer, SAGE, Elsevier, and Wiley) that allow Dutch authors to publish their papers as gold OA, without additional charges (but, depending on the publisher, with limits on total numbers and/or which journals are available within the deals).2 The so-called “Dutch Deal” by the VSNU (Association of universities in the Netherlands) and UKB (Dutch Consortium of University Libraries and Royal Library) takes a national approach to flipping the model, attempting to bundle access rights for Dutch readers with APC credits for Dutch authors. The Dutch government, which currently holds the EU presidency, is pushing hard for a Europe-wide adoption of this approach. Last month, the EU’s Competitiveness Council agreed that all scientific papers should be freely available by 2020.3 Meanwhile, in the US, the “Pay it Forward” research project at the University of California is examining what the institutional financial impact would be with a flipped model.
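At its core, such a flipped-model comparison is simple arithmetic: current subscription spend versus annual article output multiplied by an assumed average APC. The sketch below uses hypothetical placeholder figures, not numbers from the "Pay it Forward" study itself.

```python
def flipped_cost(articles_per_year, avg_apc):
    """Estimated annual cost under a fully gold-OA ("flipped") model."""
    return articles_per_year * avg_apc

def flip_delta(subscription_spend, articles_per_year, avg_apc):
    """Change in annual spend after a flip; positive means the flip
    would cost the institution more than its current subscriptions."""
    return flipped_cost(articles_per_year, avg_apc) - subscription_spend

# Hypothetical institution: $5M subscription spend, 2,000 articles/year,
# $1,800 average APC (all figures illustrative only).
delta = flip_delta(5_000_000, 2_000, 1_800)
print(f"A flip would change annual spend by ${delta:,}")
```

Whether an institution comes out ahead depends heavily on its publication intensity: research-intensive universities with high article output may pay more under an APC model, while teaching-focused institutions may pay less.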
The study is looking at existing institutional journal expenditures on subscriptions and modeling what a future, APC-based model would look like based on institutional research publication output and estimated average APC charges. Who knows when or if a global flip might occur, but it does strike me that the scholarly publishing world is overdue for a major shakeup. From the point of view of a university librarian focused on keeping journal subscription costs in line (unsuccessfully, I might add), I think there is real danger in not considering what a flip to a gold model might look like. The commercial publishers we all complain about are successfully exploiting the gold model as an additional revenue stream which, for the most part, academic libraries have been ignoring, since the individual APCs typically are paid from someone else’s budget. This has allowed the overall envelope of spending on research publication (subscriptions and APCs) to grow significantly. Perhaps a more interesting question is what the impact of a flip on libraries would be. If gold OA became the predominant model, we would no longer need all of the complex systems we’ve built to manage subscriptions and user access. To quote Homer Simpson, “Woohoo!” In the “watch this space” arena, EBSCO’s recently launched open-source library services platform (LSP) initiative is beginning to take shape.
It now has a name—FOLIO (for Future of the Libraries Is Open)—and as Marshall Breeding put it, the project “injects a new dynamic into the competitive landscape of academic library technology, pitting an open source framework backed by EBSCO against a proprietary market dominated by Ex Libris, now owned by EBSCO archrival ProQuest.”4 Publicly listed participants in the project include (in addition to EBSCO) OLE, Index Data, ByWater, BiblioLabs, and SIRSI Dynix.5 The platform release timetable calls for an initial, “technical preview” release of the code for the base platform in August 2016, and an anticipated release of the apps needed to operate a library in early 2018.6

1. Ralf Schimmer, Kai Karin Geschuhn, and Andreas Vogler, Disrupting the Subscription Journals’ Business Model for the Necessary Large-Scale Transformation to Open Access (2015), doi:10.17617/1.3.
2. Frank Huysmans, “VSNU-Wiley: Not Such a Big Deal for Open Access,” Warekennis (blog), March 1, 2016, https://warekennis.nl/vsnu-wiley-not-such-a-big-deal-for-open-access/.
3. Martin Enserink, “In Dramatic Statement, European Leaders Call for ‘Immediate’ Open Access to All Scientific Papers by 2020,” Science, May 27, 2016, doi:10.1126/science.aag0577.
4. Marshall Breeding, “EBSCO Supports New Open Source Project,” American Libraries, April 22, 2016, https://americanlibrariesmagazine.org/2016/04/22/ebsco-kuali-open-source-project/.
5. https://www.folio.org/collaboration.php.
6. https://www.folio.org/apps-timelines.php.
Accessibility of Vendor-Created Database Tutorials for People with Disabilities

Joanne Oud

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016 7

ABSTRACT

Many video, screencast, webinar, or interactive tutorials are created and provided by vendors for use by libraries to instruct users in database searching. This study investigates whether these vendor-created database tutorials are accessible for people with disabilities to see whether librarians can use these tutorials instead of creating them in-house. Findings on accessibility were mixed. Positive accessibility features and common accessibility problems are described, with recommendations on how to maximize accessibility.

INTRODUCTION

Online videos, screencasts, and other multimedia tutorials are commonly used for instruction in academic libraries. These online learning objects are time-consuming to create in-house and require a commitment to maintain and revise when database interfaces change. Many database vendors provide screencasts or online videos on how to use their databases. Should libraries use these vendor-provided instructional tools rather than spend the time and effort to create their own? Many already do: a study shows that 17.7 percent of academic libraries link to tutorials created by third parties, mainly by vendors or other libraries.1

When deciding whether to use vendor-created tutorials, one consideration is whether the tutorials meet accessibility requirements for people with disabilities. The importance of accessibility for online tutorials has been increasingly recognized and outlined in recent library literature.2 People with disabilities make up one of the largest minority groups in the United States and Canada, and studies show that about 9 percent of university or college students have a disability.3 Problems with web accessibility have been well documented.
People with disabilities are often unable to access the same online sites and resources as others, creating a digital divide.4 Even if people with disabilities can access a site, it is more difficult for many to use it.5 Assistive technologies, like screen-reading software, enable access but add an extra layer of complexity in interacting with the site, and blind or low-vision users can’t always rely on visual cues to navigate and interpret sites. A recent study of library website accessibility concluded that typical library websites are not designed with people with disabilities in mind.6

Joanne Oud (joud@wlu.ca) is Instructional Technology Librarian and Instruction Coordinator, Wilfrid Laurier University, Ontario, Canada.

ACCESSIBILITY OF VENDOR-CREATED DATABASE TUTORIALS FOR PEOPLE WITH DISABILITIES | OUD https://doi.org/10.6017/ital.v35i4.9469 8

Libraries, which are founded on a philosophy of equal access to information, should be concerned about online accessibility. Legal requirements for providing accessible online web content vary, but exist in every jurisdiction in the United States and Canada. Apart from the legal requirements, recent literature points out that equitable access to information for people with disabilities is a matter of human rights and an issue of diversity and social justice, and calls on libraries and librarians to improve their commitment to online accessibility.7 It is important for libraries to participate in creating a level playing field and to avoid creating conditions that make people feel unequal or prevent them from equitable access.

It is unclear whether librarians can assume vendor-created instructional tutorials are accessible. Studies on vendor database accessibility have been mixed, showing some commitment to and improvements in accessibility on one hand, but sometimes substantial gaps in accessibility on the other.8 The focus until now has been exclusively on the accessibility of database interfaces.
This study investigates the accessibility of online tutorials, including videos, screencasts, interactive multimedia, and archived webinars created by database and journal vendors and offered as instructional materials to librarians and patrons, to determine whether they are a viable alternative to making in-house training materials.

LITERATURE REVIEW

Although a few articles exist on how to make video tutorials accessible,9 no studies have evaluated the accessibility of already-created video or screencast tutorials. There are, however, some studies evaluating the accessibility of vendor databases. Byerley, Chambers, and Thohira surveyed vendors in 2007 and found that most felt they had integrated accessibility standards into their search interfaces, and nearly all tested for accessibility to some degree, though not always with actual users.10 These findings conflict somewhat with the results of other studies. Tatomir and Durrance evaluated the accessibility of thirty-two databases with a checklist and found that although many did contain accessibility features, 72 percent were marginally accessible or inaccessible.11 Similarly, Dermody and Majekodunmi found that students with print-related disabilities who use screen-reading software could only complete 55 percent of tasks successfully because of accessibility barriers and usability challenges.12 DeLancey surveyed vendors and examined VPATs, or product accessibility claims, and found that vendors felt they were compliant with 64 percent of US Section 508 items.13 Especially relevant to this study, only 23 percent of vendors said that the multimedia content within their products was compliant, and 46 percent admitted multimedia content was not compliant at all. Since vendor VPAT forms are completed for databases and other products only, and not the instructional tutorials created by vendors on how to use those products, vendor accessibility claims for instructional tutorials are unknown.
Although no studies have been done on the accessibility of video or screencast tutorials, some have been done on the accessibility of multimedia or other related kinds of online learning. Roberts, Crittenden, and Crittenden surveyed 2,366 students taking online courses at several US universities. A total of 9.3 percent of those students reported that they had a disability, and of those, 46 percent said their disability affected their ability to succeed in their online course, although most reasons cited were not related to technical accessibility barriers.14 Kumar and Owston studied students with disabilities using online learning units that contained videos. All students in the study reported at least one barrier to completing the learning units.15 Although this study involves student use of video tutorials, it doesn’t report on accessibility issues specific to those tutorials.

Previous studies of vendor products focus exclusively on database interfaces, and previous studies of online learning have not focused on screencast accessibility. Therefore, this study’s goal is to investigate how accessible vendor-created video tutorials are. Accessibility is defined as both technical accessibility (can people with disabilities locate, access, and use them) and usability (how easy it is for people with disabilities to use them). This study will look at what major accessibility issues exist (if any) and make recommendations on whether librarians can direct students to vendor tutorials rather than making in-house instructional videos.

METHOD

An evaluation checklist (see appendix 2) was developed for this study using criteria drawn from the Web Content Accessibility Guidelines (WCAG) 2.0. WCAG 2.0 is the most widely recognized web-accessibility standard internationally.
Much recent accessibility legislation adopts it, including the in-process revisions to Section 508 guidelines in the United States.16 WCAG 2.0 is also consistent with tutorial accessibility best-practice advice found in recent articles, which emphasize the need for accurate captions, keyboard accessibility, descriptive narration, and alternate versions for embedded objects, among other criteria.17

The checklist has twenty items and is split into two sections, “Functionality” and “Usability.” Functionality items test whether the tutorial can be used by people using screen-reading software or a keyboard only, and include whether the tutorial is findable on the page and playable, whether player controls and interactive content can be operated by keyboard, whether captions are available, and whether audio narration is descriptive enough so someone who can’t see the video can understand what is happening. Usability items test how easy the tutorial is to use. Examples include clear visuals and audio, use of visual cues to focus the viewer’s attention, and short and logically focused content.

To help prioritize the importance of checklist items, the local Accessible Learning Centre (ALC), which supports students on campus who use assistive technologies, was consulted about the difficulties most encountered by students. The ALC’s highest priority was the provision of an alternate accessible version of a tutorial, since it is difficult to make complex embedded web content accessible for everyone under every circumstance and an alternate version allows people to work with content in a way that suits their needs.

For the evaluation, major database vendors were chosen through a scan of common vendors and platforms at universities, with input from collections colleagues.
Some vendors were eliminated because they don’t provide instructional tutorials on their websites. Twenty-five vendors were included in the study (see appendix 1). A large majority of the tutorials found were screencast or video tutorials; a few vendors provided recorded webinars, and a few provided interactive multimedia tutorials, mainly text captions or visuals with clickable areas or quizzes. In total, 460 tutorials were evaluated for accessibility: 417 video, screencast, or interactive tutorials from twenty-four vendors, and 41 recorded webinars from four vendors. If tutorials were available in more than one place, most commonly on both the vendor’s website and YouTube, both locations were tested. If more than thirty tutorials were provided by a vendor, every other one was tested. If multiple formats of tutorial were available, such as screencasts and recorded webinars, each format was tested.

Testing from the perspective of people with visual impairments was a key focus. Other assistive technologies such as Kurzweil (for people who can see but have print-related disabilities) and ZoomText (for enlargement) are widely used, but if webpages work well using screen-reading software intended for people with visual impairments, they also generally work using other kinds of assistive software. Tutorials were tested with two screen-reading programs used by people with visual impairments: NVDA (with Firefox), a free open source program, and JAWS (with Internet Explorer), a widely used commercial product. Both were used to determine whether any difficulties were due to the quirks of a particular software product or a result of inherent accessibility problems. In addition, captions were evaluated to determine accessibility for people who are deaf or have hearing difficulties. People with visual or some physical impairments use the keyboard only, so all tutorials were tested without a mouse using solely the keyboard.
During testing, each task was tried three different ways within NVDA or JAWS before deciding that it couldn’t be completed. If one of the three methods worked, the task was marked as successfully completed. If a task could be completed successfully in one screen-reading program but not the other, it was marked as unsuccessful. Screen-reader support needs to be consistent across platforms, since people may be using a variety of types of assistive software.

FINDINGS AND DISCUSSION

Tutorials created by the same vendor nearly all used the same approach and had the same checklist results. This is positive, since consistency is important for accessibility and helps in navigation and ease of use.

None of the forty-one recorded webinars tested in this study were accessible. Webinars did not have player controls that were findable on the page by screen-reading software or usable by keyboard. None had captions, transcripts, or alternate accessible versions. Often webinars were quite long, with no clear structure and no cues to focus attention on the screen. Recorded webinars had almost no accessibility features and can’t be recommended for use as accessible instructional materials in their current form.

None of the screencast or video tutorials tested were completely accessible, and all failed in at least one checklist item. Tutorials from some vendors, however, came close to meeting all checklist requirements. Overall, there were many positive accessibility features in the video and screencast tutorials. Most of these tutorials were findable and playable by screen-reading software in some way, had video player controls usable by keyboard, had descriptive narration so people who can’t see the screen can tell what is happening, had clear visuals and audio narration, used simple language, and were relatively short and focused in content.
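The scoring rule described in the testing method above (a task counts as completed if any of up to three methods works within a screen reader, but only when it succeeds in every screen reader tested) can be sketched in a few lines of Python. The function and data names below are hypothetical illustrations, not part of the study.

```python
# Hypothetical sketch of the study's scoring rule: within each screen
# reader, up to three methods are tried and one success is enough (OR);
# across screen readers, the task must succeed in all of them (AND).

def task_completed(attempts_by_reader):
    """attempts_by_reader maps a screen-reader name (e.g., "NVDA",
    "JAWS") to a list of booleans, one per attempted method."""
    return all(any(methods) for methods in attempts_by_reader.values())

# A task that works in NVDA but fails all three methods in JAWS is
# marked unsuccessful, since support must be consistent across readers.
print(task_completed({"NVDA": [False, True, False], "JAWS": [False, False, False]}))  # False
print(task_completed({"NVDA": [True], "JAWS": [False, True]}))  # True
```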
The most accessible screencast or video tutorials were produced by the American Psychological Association (APA), American Theological Library Association (ATLA), Modern Language Association (MLA), and Ebsco. Their tutorials had many accessibility features and rated highly on the checklist. They included accessibility features that were much less commonly found elsewhere, especially the use of visual and/or audio cues to focus the viewer’s attention and the inclusion of accurate and properly synchronized closed captions. Visual cues are important for people with learning or attention-related disabilities, and help all viewers interpret and follow the video more easily. People who are deaf can’t access the content without captions, and captions also help people who have English as a second language or are at public computers without headphones.

Tutorials from these vendors also had an alternate version or transcript available. As mentioned earlier, the highest-priority checklist item is the presence of an alternate accessible version, since it is difficult to design multimedia that works for people with all disabilities in all circumstances. People with disabilities may also have previous negative experiences with online multimedia and prefer to use an alternate format that they have had more success with. In the case of these above-average vendors, the alternate accessible version was a transcript consisting of the video’s closed captions, auto-generated by YouTube. Since the tutorials’ narration was descriptive and the captions were accurate, the auto-generated transcripts are useful. However, the YouTube transcript is hard to find on the YouTube page. Also, most of these vendors had tutorials available both from their own websites and from YouTube, and none had alternate versions available on their own websites. Viewers requiring an alternate format would need to know to go to the YouTube site instead of the vendor site to find it.
Two other vendors also had quite accessible tutorials. IEEE’s tutorials had the same positive accessibility features already mentioned. Tutorials were done in-house and presented through the vendor’s site. While most tutorials presented on vendor sites were lacking in accessibility, IEEE’s were well thought out from an accessibility perspective and usable by screen-reading software. These were the only tutorials tested where all interactivity, including pop-up screens, was easily usable and navigable by keyboard. The one accessibility issue was the lack of an alternate accessible version.

Elsevier’s ScienceDirect tutorials took a different approach to accessibility than other vendors, or even than Elsevier’s tutorials for other Elsevier products. The ScienceDirect tutorials were not accessible, but an alternate text version was available, and people using screen-reader software were informed of this when they got to the tutorial page and were redirected to the text version. The ideal is to have one version that is accessible to everyone, but this approach is a good way to implement an alternate version if one accessible version isn’t possible.

Screencasts or video tutorials from other vendors also had some good accessibility features, but these were balanced with serious accessibility problems. The main accessibility issues discovered include the following:

Alternate accessible versions: Vendors who had captions and hosted their videos on YouTube did have auto-generated YouTube transcripts, but these were hard to find and were only useful if the captions were descriptive and accurate, which many were not. Apart from Elsevier’s ScienceDirect tutorials, no vendors provided another format deliberately as an accessible alternative.
Captions: Captions were missing or problematic in the tutorials of fourteen vendors, or 59 percent of the total. Five vendors (21 percent) provided no captions at all for their tutorials. Nine (38 percent) had unedited, auto-generated YouTube captions, which are highly inaccurate and therefore don’t provide usable access to the content for people who are deaf.

Tutorial not findable or playable on page: Twelve vendors (50 percent) had tutorials that were not findable on the webpage or playable for people using a keyboard or screen-reading software. Most of these issues were with tutorials on vendor sites, which were often Flash-based or offered through non-YouTube third-party sites like Vimeo. Four vendors (17 percent) offered access to their tutorials both through their own (inaccessible) website and YouTube, which is findable and playable by screen-reading software. Eight (33 percent), however, only provided access through their (inaccessible) webpages, which means that people using a keyboard or screen-reading software would not be able to use their tutorials.

No visual cues to focus attention: Eight vendors (33 percent) had no visual cues to focus attention in the video. Visual cues help people with certain disabilities focus on the essential part of the screen that is being discussed, help everyone more easily interpret and follow what is happening, and are known to help facilitate successful multimedia learning.18

Nondescriptive narration: Six vendors (25 percent) had tutorials with audio narration that didn’t sufficiently describe what was happening on the screen. Narration needs to describe what is happening in enough detail so people who can’t see the screen are not missing information available to sighted viewers.

Fuzzy visuals: Five vendors (21 percent) had tutorials with visuals that were fuzzy and hard to see.
This makes viewing difficult for people with low vision, and challenging even for people with normal vision.

Fuzzy audio or background music: Three vendors (13 percent) had poor-quality audio narration or background music playing during narration. Background music is distracting for those with hearing difficulties and makes it more difficult to focus on what is being said. Eliminating extraneous sound also makes it easier for people to learn from multimedia.19

Tutorials consisting only of text captions: Three vendors (13 percent) had tutorials consisting of text captions with no narration. The text captions were not readable by screen-reading software, and no alternate accessible versions were provided. Providing narration in tutorials is recommended for accessibility, since it allows people who can’t see the screen to access the content more easily, and has been shown to improve learning and recall over on-screen text and graphics alone.20

RECOMMENDATIONS AND CONCLUSIONS

This study attempted to determine how accessible vendor-created database tutorials are, and whether academic librarians can use them instead of re-creating them locally. For recorded webinars, the answer is a clear no, since none were technically accessible for people using screen-reading software. For video or screencast tutorials, however, the answer is less clear. Results showed that many vendors created tutorials with positive features such as clear visuals and audio, short and focused content, and descriptive narration. However, technical accessibility was much less successful, with 59 percent of vendors omitting usable captions and 50 percent presenting tutorials that couldn’t be found on the page or played by people using screen-reading software. These technical accessibility issues prevent people with hearing, vision, or some mobility impairments from using the tutorials at all.
Although none of the tutorials studied met all the checklist criteria, some came close and could be used by librarians depending on local requirements, policies, and priorities for accessibility.

In part, this study found that the accessibility of many tutorials depends on how they are presented. Disappointingly, 50 percent of vendors had tutorials on their websites that were not findable or playable by people with disabilities. Many vendors, however, hosted tutorials on YouTube as well as their own site. In these cases, YouTube was always a more accessible option than the vendor site. YouTube itself is relatively accessible, with both pages and players that are navigable by keyboard and by screen-reading software. There are options for accessibility settings in YouTube, such as having captions display automatically, and more accessible third-party overlays are available for the YouTube player. On vendor sites, there were more likely to be issues with Flash and an inability for people using screen-reading software or keyboards to find and play videos. Some vendors embed YouTube videos on their site. Even if the embedded videos are findable and playable, this method omits important accessibility features found on the YouTube page, such as the text transcript. The results of this study show that using YouTube where available is recommended. Further, linking to YouTube rather than embedding the video is preferred, unless a separate link to the transcript is made to provide an alternate accessible version.

Captions are another key accessibility problem identified in this study: nearly two-thirds of vendors had unusable captions. Often, auto-generated YouTube captions were present but were not usable. The presence of captions is not enough for accessibility; those captions need to be accurate and present the same content as the narration.
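One way to gauge whether captions are accurate enough is to compare the caption text against the narration script and compute an approximate word error rate. This is an illustrative sketch only, not a method used in the study; the function name and sample strings are invented for the example.

```python
# Illustrative sketch (not from the study): estimate how far auto-generated
# captions drift from the narration script using an approximate word error
# rate built on difflib from the Python standard library.
import difflib

def approx_word_error_rate(reference, hypothesis):
    """Rough fraction of words that differ between the narration script
    (reference) and the caption text (hypothesis)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(None, ref, hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    errors = max(len(ref), len(hyp)) - matched
    return errors / len(ref)

narration = "click the advanced search link to combine keywords"
captions = "click the advance search link to combine key words"
print(round(approx_word_error_rate(narration, captions), 2))  # 0.38
```

A threshold on such a score could flag tutorials whose auto-captions need manual editing before they can be considered accessible.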
YouTube auto-captioning does not generate captions that are accurate enough to be useful without manual editing. YouTube auto-generates transcripts from the captions, so if the captions are inaccurate the transcript will not be useful either. Editing YouTube auto-generated captions is necessary to ensure accessibility.

A few accessibility issues found in this study would be easy to improve with some thought during tutorial creation. Adding visual cues like arrows or highlighting to the screen to help people focus attention, or remembering that not everyone can see the screen while recording narration, can be easily achieved and would improve accessibility significantly. Other issues would require more planning and effort to improve.

Given the widespread technical accessibility problems identified in this study, it is particularly important for people creating tutorials to provide alternate formats that are accessible if the tutorials themselves are not. Almost no vendors do this currently, but it would have the most significant impact on accessibility for the broadest range of people. Adding usable captions is the second most important area for improvement. To provide access for people who are deaf, captions need to be added or auto-generated YouTube captions need to be edited for accuracy. Both alternate formats and captions require some thought and effort to implement but ensure that tutorials will meet accessibility requirements and be usable by everyone.

NOTES AND BIBLIOGRAPHY

1. Eamon Tewell, “Video Tutorials in Academic Art Libraries: A Content Analysis and Review,” Art Documentation 29, no. 2 (2010): 53–61.

2. Amanda S. Clossen, “Beyond the Letter of the Law: Accessibility, Universal Design, and Human-Centered Design in Video Tutorials,” Pennsylvania Libraries: Research & Practice 2, no.
1 (2014): 27–37, https://doi.org/10.5195/palrap.2014.43; Joanne Oud, “Improving Screencast Accessibility for People with Disabilities: Guidelines and Techniques,” Internet Reference Services Quarterly 16, no. 3 (2011): 129–44, https://doi.org/10.1080/10875301.2011.602304; Kathleen Pickens and Jessica Long, “Click Here! (And Other Ways to Sabotage Accessibility),” Imagine, Innovate, Inspire: The Proceedings of the ACRL 2013 Conference (Chicago: ACRL, 2013), 107–12.

3. Lucy Barnard-Brak, DeAnn Lechtenberger, and William Y. Lan, “Accommodation Strategies of College Students with Disabilities,” Qualitative Report 15, no. 2 (2010): 411–29.

4. Cyndi Rowland et al., “Universal Design for the Digital Environment: Transforming the Institution,” Educause Review 45, no. 6 (2010): 14–28.

5. Peter Brophy and Jenny Craven, “Web Accessibility,” Library Trends 55, no. 4 (2008): 950–72.

6. Kyunghye Yoon, Laura Hulscher, and Rachel Dols, “Accessibility and Diversity in Library and Information Science: Inclusive Information Architecture for Library Websites,” Library Quarterly 86, no. 2 (2016): 213–29.

7. Ruth V. Small, William N. Myhill, and Lydia Herring-Harrington, “Developing Accessible Libraries and Inclusive Librarians in the 21st Century: Examples from Practice,” Advances in Librarianship 40 (2015): 73–88, https://doi.org/10.1108/S0065-2830201540; Paul T. Jaeger, Brian Wentz, and John Carlo Bertot, “Libraries and the Future of Equal Access for People with Disabilities: Legal Frameworks, Human Rights, and Social Justice,” Advances in Librarianship 40 (2015): 237–53; Yoon, Hulscher, and Dols, “Accessibility and Diversity in Library and Information Science.”

8. Suzanne L. Byerley, Mary Beth Chambers, and Mariyam Thohira, “Accessibility of Web-Based Library Databases: The Vendors’ Perspectives in 2007,” Library Hi Tech 25, no.
4 (2007): 509–27, https://doi.org/10.1108/07378830710840473; Kelly Dermody and Norda Majekodunmi, “Online Databases and the Research Experience for University Students with Print Disabilities,” Library Hi Tech 29, no. 1 (2011): 149–60, https://doi.org/10.1108/07378831111116976; Jennifer Tatomir and Joan C. Durrance, “Overcoming the Information Gap: Measuring the Accessibility of Library Databases to Adaptive Technology Users,” Library Hi Tech 28, no. 4 (2010): 577–94, https://doi.org/10.1108/07378831011096240.

9. Pickens and Long, “Click Here!”; Clossen, “Beyond the Letter of the Law”; Oud, “Improving Screencast Accessibility for People with Disabilities”; Nichole A. Martin and Ross Martin, “Would You Watch It? Creating Effective and Engaging Video Tutorials,” Journal of Library & Information Services in Distance Learning 9, no. 1–2 (2015): 40–56, https://doi.org/10.1080/1533290X.2014.946345.

10. Byerley, Chambers, and Thohira, “Accessibility of Web-Based Library Databases.”

11. Tatomir and Durrance, “Overcoming the Information Gap.”

12. Dermody and Majekodunmi, “Online Databases and the Research Experience for University Students with Print Disabilities.”

13. Laura DeLancey, “Assessing the Accuracy of Vendor-Supplied Accessibility Documentation,” Library Hi Tech 33, no. 1 (2015): 103–13, https://doi.org/10.1108/LHT-08-2014-0077.

14. Jodi B. Roberts, Laura A. Crittenden, and Jason C. Crittenden, “Students with Disabilities and Online Learning: A Cross-Institutional Study of Perceived Satisfaction with Accessibility Compliance and Services,” Internet and Higher Education 14, no. 4 (2011): 242–50, https://doi.org/10.1016/j.iheduc.2011.05.004.

15. Kari L. Kumar and Ron Owston, “Evaluating E-Learning Accessibility by Automated and Student-Centered Methods,” Educational Technology Research and Development 64, no.
2 (2015): 263–83, https://doi.org/10.1007/s11423-015-9413-6.

16. US Access Board, “Draft Information and Communication Technology (ICT) Standards and Guidelines,” 36 CFR Parts 1193 and 1194, RIN 3014-AA37 (2015), https://www.access-board.gov/attachments/article/1702/ict-proposed-rule.pdf.

17. Pickens and Long, “Click Here!”; Clossen, “Beyond the Letter of the Law”; Martin and Martin, “Would You Watch It?”; Oud, “Improving Screencast Accessibility for People with Disabilities.”

18. See the Signaling Principle in Richard E. Mayer, Multimedia Learning, 2nd ed. (Cambridge: Cambridge University Press, 2009): 108–17.

19. See the Coherence Principle, ibid., 89–107.

20. See the Modality Principle, ibid., 200–220.

Appendix 1. List of Vendors

1. ACM
2. Adam Matthew
3. Alexander St Press
4. APA
5. ATLA
6. ChemSpider
7. Cochrane Library (webinars only)
8. Ebsco
9. Elsevier
10. Factiva
11. Gale
12. IEEE
13. Lexis Nexis Academic (tutorials and webinars)
14. Marketline
15. MathSciNet
16. OVID/Wolters Kluwer (tutorials and webinars)
17. Oxford
18. Proquest (tutorials and webinars)
19. Pubmed
20. Sage
21. SciFinder
22. Standard & Poor/NetAdvantage
23. Taylor and Francis
24. Web of Knowledge/Thomson Reuters
25. Zotero

Appendix 2.
Tutorial Accessibility Evaluation Checklist

Functionality

☐ Equivalent alternate format(s) are provided
  ☐ Transcript/text version
  ☐ Audio
  ☐ Other ___________________________
☐ Alternate formats provided are accessible
☐ Alternate formats provided are findable on the page by screen reader
☐ Screen-reading software can find the video on the webpage
☐ Screen-reading software can access and play the video
☐ Video-player functions can be operated by keyboard/screen-reading software
☐ Interactive content can be accessed and used by keyboard/screen-reading software
☐ User has some control over timing (pause/rewind capability)
☐ Alternate modes of presentation are available for all content, meaning presented through text, visuals, narration, color, or shape
☐ Synchronized closed captions are available for all audio
☐ Audio/narration is descriptive

Usability

☐ User controls if/when the video starts (no autoplay)
☐ Video is easy to use by screen-reading software
☐ Clear, high-contrast visuals and text
☐ Clear audio (no background noise/music)
☐ Uses visual cues to focus attention (e.g., highlighting, arrows)
☐ Is short and concise
☐ Is clearly and logically organized
☐ Has consistent navigation, look, and feel
☐ Uses simple language, avoids jargon, and defines unfamiliar terms
☐ Explicit structure with sections, headings to give viewers context
☐ Learning outcome/goal clearly outlined and content focused on outcome
Picture Perfect: Using Photographic Previews to Enhance Realia Collections for Library Patrons and Staff

Dejah T. Rubel

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017 59

ABSTRACT

Like many academic libraries, the Ferris Library for Information, Technology, and Education (FLITE) acquires a range of materials, including learning objects, to best suit our students’ needs. Some of these objects, such as the educational manipulatives and anatomical models, are common to academic libraries, but others, such as the tabletop games, are not. After our liaison to the School of Education discovered some accessibility issues with Innovative Interfaces’ Media Management module, we decided to examine all three of our realia collections to determine what our goals in providing catalog records and visual representations would be. Once we concluded that we needed photographic previews to both enhance discovery and speed circulation service, choosing processing methods for each collection became much easier. This article will discuss how we created enhanced records for all three realia collections, including custom metadata, links to additional materials, and photographic previews.

INTRODUCTION

Ferris State University’s full-time enrollment for Fall 2015 was 14,715 students. Of these students, 10,216 are Big Rapids residents and the other 4,499 are either Kendall College of Art and Design students or at other off-campus sites across Michigan.1 During the 2014-2015 school year, FLITE had 14,647 check-outs, including 2,558 check-outs of items in reserves, which is where our realia collections are located.2 However, reserves includes other items in addition to these collections, thus making analysis of circulation statistics problematic.
Another problem with conducting such an analysis is that the educational manipulative collection already had photographic previews and the tabletop game collection is a pilot project, so there is no clear before-and-after comparison. We can, however, demonstrate that enhancing the catalog records for our anatomical model collection had a dramatic impact, jumping from a handful of check-outs in 2014-2015 to almost 450 in 2016.

Dejah T. Rubel (rubeld@ferris.edu) is the Metadata and Electronic Resources Management Librarian, Ferris State University, Big Rapids, MI.
PICTURE PERFECT: USING PHOTOGRAPHIC PREVIEWS TO ENHANCE REALIA COLLECTIONS FOR LIBRARY PATRONS AND STAFF | RUBEL | https://doi.org/10.6017/ital.v36i2.9474

LITERATURE REVIEW
Although there are very few libraries using photographic previews for their realia collections, the ones that do describe similar limitations with bibliographic records and goals that only photographic previews could meet. Most realia collections that warranted this extra effort are either curriculum materials or anatomical models, which is not surprising considering how difficult they are to describe. As Butler and Kvenild noted in their article on cataloging curriculum materials, “Patrons struggled to identify which game or kit they sought based on the…information in the online catalog,” because “Discovering curriculum materials in the catalog and getting a sense of the item are not easy when using traditional catalog descriptions...”3 As they continue, “The inventory and retrieval problems…were compounded by the fact that existing catalog records were not as descriptive as they should be.”4 This was also a problem for our collections because our names and descriptions were often not intuitive or precise.
In addition, as Loesch and Deyrup discovered while cataloging their curriculum materials collection, “…there was great inconsistency among the OCLC records regarding the labeling of the format…,”5 which was another issue we needed to address. Although the General Material Designation (GMD) has since been rendered obsolete, FLITE continues to use it to highlight certain material. This choice is due to some limitations with our library management system as well as our discovery layer, namely the lack of good mapping or use of the 33X fields. Until this is rectified with a more modern system, we have found it easier to retain certain GMDs like “sound recording”, “electronic resource”, and “realia”. Thus, we needed to standardize our terms for each collection. Another problem that our predecessors indicated photographic previews might resolve was missing objects or pieces of objects.6 This becomes especially important for our tabletop games collection because most of those pieces are very small and too numerous for a piece count upon return.
Fortunately, “Previews…can aid users in making better decisions about potential relevance, and extract gist more accurately and rapidly than traditional hit lists provided by search engines.”7 Ideally, a preview will display an appropriate level of information about the object it represents in order “…to support users in making a correct judgement about the relevance of that object to the user’s information need.”8 Greene goes further by listing the main roles for previews of which the first two are the most applicable for photographic previews: aiding retrieval and aiding users in quickly making relevance decisions.9 For these uses, photographic previews of realia are ideal because users can examine the object without needing to see its details and they expect them to be abstract, not exhaustive, unlike digital surrogates that an archive would use.10 As Greene also notes, the high-level goal of any preview is to "...communicate the level and scope of objects to users so that comprehension is maximized and disorientation is minimized."11 A common finding among all the previous projects was that even a single photograph provides more readily comprehensible information than several lines of description. As Moeller states regarding their journal project, "They [previews of each issue's cover] give the researcher or student an immediate idea of the nature of the journal."12 He goes further to give the example of an innocuous journal title for a propagandist serial whose political nature is transparent once you view its imagery. From a staff perspective, photographic previews can also easily illustrate the number of pieces and an object's condition or orientation. This can be very useful in determining whether something is missing or damaged without having to do a time-consuming individual piece count upon check-in.
But as Butler and Kvenild discuss, layout within each photograph is key for illustrating missing pieces.13 Unfortunately, aside from a few small projects mentioned in Butler and Kvenild's article, there are not many examples of photographic previews for realia collections currently being used by academic libraries. One reason might be software limitations. Innovative's Media Management module is still unique among ILS/LMS software in that most vendors either provide a separate digital repository for special collections digital surrogates or they incorporate images into the catalog using third-party software like Syndetic Solutions™. Another reason for the lack of photographic previews within catalogs may simply be the rarity of realia in academic libraries. Every library certainly has a few unique pieces, like a skeleton for the pre-medical students, but often not enough to consider them an entire collection, much less a complex enough collection to warrant the extra effort to create photographic previews of each item. At FLITE, we had already crossed that threshold of complexity. Therefore, this article will start by discussing our educational manipulative collection, which provided the basis for how we would catalog and process the tabletop games and anatomical models.

Educational Manipulative Collection
Our first foray into creating photographic previews was completed by the previous cataloger, with over 300 items cataloged in 2004 and another 30-40 added to the collection over the next decade. Unlike the other realia collections, the educational manipulatives were cataloged using Innovative’s Course Reserves module, so no attempt was made to find or create OCLC records.
Nevertheless, the minimal metadata is very consistent across the collection, which supports Greene’s recommendation “…that it was important to define a set of consistent attributes at the high level of the collection if any effective browsing across the collections was to be provided.”14 In our case, we rely on a combination of the GMD ([realia]), a custom call number prefix (TOYS Box #), and a limited amount of local subject headings, as shown below, with “Manipulatives” as the common subject for the entire collection.

690 = (d) Current local subject headings in use as of 12/3/15:
• Art.
• Block props.
• Boards.
• Cognitive.
• Discovery.
• Discovery Box.
• Dramatics.
• Finger Puppets.
• Flannel Board.
• Gross Motor.
• Infant/Toddler.
• Magnets.
• Manipulatives.
• Music.
• Oversize books.
• Posters.
• Puppets.
• Story apron.
• Story props.
• Woodworking.

Due to the nature of descriptive metadata, photographic previews of the educational manipulatives made logical sense because “The images…are not the content. They are the metadata, the description of the materials.”15 As Moeller describes, Innovative’s Media Management module links images and many other file types directly to bibliographic records without requiring users to click an additional link unless they want to view a larger image of a thumbnail.16 Similar to Butler and Kvenild’s project, all of our photos were 900 pixels wide by 600 pixels tall, which is slightly smaller than their default width of 1000 pixels.17 One advantage of using the Media Management module is its ability to automatically create thumbnails 185 pixels wide by 85 pixels tall.
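The module's thumbnails are a fixed 185 by 85 pixels. As a general illustration of the underlying idea (this is a hypothetical sketch, not code from Innovative's software, and a proportional fit like this one would letterbox rather than force the exact 185 x 85 size), a fit-within-a-bounding-box calculation might look like:

```python
def thumbnail_size(width, height, max_w=185, max_h=85):
    """Scale (width, height) to fit inside (max_w, max_h),
    preserving the aspect ratio. Integer math keeps the
    result deterministic."""
    if width * max_h <= height * max_w:
        # Height is the limiting dimension.
        return (width * max_h // height, max_h)
    # Width is the limiting dimension.
    return (max_w, height * max_w // width)

# A 900 x 600 preview fit into the 185 x 85 thumbnail box:
print(thumbnail_size(900, 600))  # prints (127, 85)
```

The same calculation applies to any source image, so a batch of previews of mixed sizes would all fit the same thumbnail box.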
A bigger advantage is that the images are hosted on the same server that runs our catalog, which allows us to freely distribute the images in an intuitive manner (thumbnails instead of links) without having to worry about authentication to a shared folder from off-campus, unlike our PDF files. Unfortunately, our liaison to the School of Education recently discovered some accessibility issues with Media Management that forced us to consider whether we should change the embedded photographic previews to external links. The most significant of these problems is simply the language of the proprietary viewer software. Because it is written in Java, if you click on a thumbnail for a larger image, many browsers, like Chrome, will not run it, and those that will often require a security exception to do so. We have attempted to ameliorate some of these issues by providing an FAQ entry on which browsers are best for viewing these images and how to add a security exception for our website, but unless or until Innovative rewrites this software in a different language, these accessibility issues will persist because Java is being phased out of many browsers. Butler and Kvenild also noted its slow response time compared to their own server.18 Another issue they mentioned was that the thumbnails would not be visible in their consortial catalog, so they needed to add links in the 856 field for these users.19 This is less of an issue for us because we do not contribute any of our realia records to our consortial catalog, but Moeller’s concern that in general “…enhancements involving scanned images…will not be easily shared with other libraries,”20 is entirely valid. Unlike OCLC records, there is no way to share attached or embedded images as part of the metadata and not the content.
Contrariwise, Butler and Kvenild’s concerns regarding catalog migration are very pertinent because we are considering moving to a new LMS within the next few years.21 Although we acknowledge that “Utilizing 856 tags is an indirect method of accessing the images, as users must take the initiative to follow the links,” we will eventually have to move and link our photographic previews to ensure accessibility after migration.22

Tabletop Game Collection
Unlike the educational manipulatives, the majority of the tabletop game collection was previously cataloged in OCLC, so finding good bibliographic records was easy. Once downloaded, we decided to add a unique GMD ([game]), custom call number prefix (BOARD GAME Box #), and local subject heading “Tabletop games”. However, our Emerging Technologies Librarian who coordinated this pilot project felt that the single subject heading was not descriptive enough. So he gave us a spreadsheet with more specific subject headings such as “Deck Building”, “Historical”, and “Resource Management” that we added as genre/form subject headings in the 655_4 field. He also suggested that we add links to the rule books, which we did using the 856 field and the link text “connect to rule book (PDF)”. Because tabletop games are commercial products, finding images online was also easy. At first, we had some concerns about copyright, but we are not reselling these products or using the image as a replacement for the item. So, we concurred with Butler and Kvenild that “…the images in our project fall under copyright fair use.”23 Another plus to using commercial images is that we could use more than one to show various aspects of setup and play. The downside to this benefit is that image sizes and content photographed varied widely, so we used our best judgement in creating labels and tried to keep them as consistent as possible.
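To make the field choices concrete, the local additions described above could be assembled as follows. This is a hypothetical sketch: the field tags follow the MARC conventions named in the text, but the dictionary layout and the sample values are illustrative stand-ins, not Innovative's actual record structure.

```python
def game_record(title, box_number, genres, rulebook_url):
    """Assemble the local fields used for a tabletop game record."""
    return {
        "245": f"{title} [game]",                    # title with the custom GMD
        "099": f"BOARD GAME Box {box_number}",       # custom call number prefix
        "690": ["Tabletop games."],                  # collection-level local subject
        "655_4": [f"{g}." for g in genres],          # genre/form headings
        "856": {"u": rulebook_url,                   # link to the rule book
                "z": "connect to rule book (PDF)"},  # link text described above
    }

rec = game_record("Example Game", 12,
                  ["Deck Building", "Resource Management"],
                  "https://example.org/rules.pdf")
```

Keeping the prefix, GMD, and headings in one helper like this is one way to guarantee the consistency across records that the collection relies on.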
To ensure consistency across the collection, we decided that the first image should always be the top of the game’s box, labeled “Box Cover” or “Box Cover – Front” if there was a “Box Cover – Back” image. (We only displayed the back of the box cover if there was significant information about the game printed on it.) Then we added up to five additional images showing parts of the game like “Card Examples”, “Game Pieces”, and “Game Set-up”. Overall, this number of images worked very well in both Encore’s Attached Media Viewer and the Classic Catalog/Web OPAC, but there is a slight duplication in images by Syndetic Solutions™ for a few games. This results in a larger version of the box top image displaying to the right of the title and above the smaller thumbnails of images we added using Media Management. With regard to piece counts, we presumed that we would need photographic previews to aid in piece counting upon return of a tabletop game. However, our Emerging Technologies Librarian assured us that because we are an educational institution, we could contact the vendor for free replacement pieces at any time. He also emphasized that unlike the educational manipulatives or the anatomical models, this was a pilot collection, so extensive processing would not be a good investment of our labor. Fortunately, the anatomical model collection would require images for piece counts as well as several other cataloging customizations to increase discoverability and speed circulation.

Anatomical Model Collection
Similar to our educational manipulative collection, but not nearly as extensive, our anatomical model collection has been a part of FLITE since its inception. Unlike the manipulatives, which are used primarily by the early childhood education students, the anatomical models support a range of allied health programs including but not limited to dental hygiene, radiology, and nursing.
The majority of our two dozen models were purchased in the 20th century and, like the manipulatives, the majority were cataloged using Innovative’s Course Reserves module. Unfortunately, none of these records were very descriptive, some being so poor as to be merely a title like “Jawbones” and a barcode. So, the first task was to match objects with OCLC records. Fortunately, this task became much easier once we discovered that matching the object to the vendor’s catalog image and then searching OCLC by vendor model name or number was faster than trying to decipher written descriptions without knowing human anatomy. Once good bibliographic records were downloaded, we decided to add one of three GMDs depending on the type of model ([model], [chart], or [flash card]), a custom call number prefix (MODEL #), and one or more of the local subject headings shown below.

690 = (d)
• Anatomy model.
• Anatomy models.
• Anatomy chart.
• Anatomy charts.
• Dental hygiene model.
• Dental hygiene models.
• Dental model.
• Dental models.

Technically, all dental models could be used as anatomical models, but not vice versa. Therefore, the common subject headings for the collection are “Anatomy model” and “Anatomy models”. To make the models easier to shelve, retrieve, and inventory, we also designed numeric ranges for the call numbers, as shown below, so we would know what type of model to expect when referring to a specific model number.
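The numeric ranges, listed in full below, lend themselves to a simple lookup. This hypothetical Python sketch (not software the library used) shows how a MODEL number could be mapped back to the type of object it should represent:

```python
# Numeric call-number ranges for the anatomical model collection,
# copied from the hierarchy described in the text.
MODEL_RANGES = [
    (1, 99, "Anatomical Charts and Flash Cards"),
    (100, 199, "Articulated Skeletons"),
    (200, 299, "Disarticulated Skeletons and Bone Kits"),
    (300, 399, "Organs"),
    (400, 499, "Skulls (anatomical and dental hygiene)"),
    (500, 599, "Other Dental Models (dental studies, dental decks)"),
]

def model_category(number):
    """Return the type of object expected for a given MODEL number."""
    for low, high, label in MODEL_RANGES:
        if low <= number <= high:
            return label
    raise ValueError(f"MODEL #{number:03d} is outside the defined ranges")

print(model_category(305))  # prints Organs
```

A lookup like this is what the range scheme buys in practice: shelvers and inventory staff can tell from the number alone what kind of object should be in hand.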
099 = (c) MODEL #00X, following this hierarchy:
• 001-099 Anatomical Charts and Flash Cards
• 100-199 Articulated Skeletons
• 200-299 Disarticulated Skeletons and Bone Kits
• 300-399 Organs
• 400-499 Skulls (anatomical and dental hygiene)
• 500-599 Other Dental Models (dental studies, dental decks)

We also scanned and linked PDFs of the heavily worn model keys with the link text “connect to key PDF” before washing and rehousing all the models. Once they were clean, they were ready for their shoot with Ferris State University’s Media Production team. Due to winter break, Media Production was able to shoot the majority of the collection fairly quickly. They returned to us high-resolution TIFFs the same size as those for the manipulatives, 900 pixels by 600 pixels. In case of Java viewer failure, we requested that there be one top-level image that showcases exactly what the model contains, with images of individual pieces or drawers as the succeeding images. For example, our disarticulated skeletons are housed in small plastic carts with three drawers in each cart. Therefore, the first image would be a shot of all the pieces of the disarticulated skeleton, the second image would be the contents of the top drawer, the third image the contents of the middle drawer, and the last image the contents of the bottom drawer. In this specific example, we re-used the images that we posted in the catalog record by pasting them on top of the cart to show circulation staff what to expect in each drawer upon check-in. Overall, photographic previews for this collection appear to be working very well for both catalog users and circulation staff “…to inform users about size, extent, and availability of collections or objects.”24 In fact, they have been working so well for this collection that usage has increased dramatically compared to previous years. Figure 1.
Circulation Statistics, 2014-2016

Collection       2014   2015   2016
Manipulatives     367    317    114
Models             10      1    444
Games             n/a    n/a     24

CONCLUSIONS AND FUTURE DIRECTIONS
Although we implemented photographic previews for three realia collections, we could not define any standard workflow for the process beyond correcting or downloading the metadata first and adding the images second. Part of this is due to our working primarily with legacy collections, because we often discovered issues, like the model keys, while working through another issue. The other part is due to the nuances involved in processing realia in general. Even with good, readily available catalog records like those for the tabletop games, time still had to be spent separating, organizing, and rehousing game pieces as well as hunting down useful images. Unfortunately, any type of realia processing, even if it is just textual description, is much more time-consuming than the majority of academic library cataloging. Adding in the extra steps to create, upload, and link a photographic preview can nearly double that labor investment. Notwithstanding, as Butler and Kvenild advocate, “…not supplying images as metadata for items that most need them (i.e. kits, games, and models) is to make them nearly irretrievable. Providing bare-bones traditional metadata for these items is analogous to delegating them to the backlog shelves of yesteryear.”25 Unfortunately, neither the library management system nor the third-party catalog enhancement market currently provides a good solution to this problem. Considering how great an impact photographic previews have had in the online retail market, this lack of technical support is surprising. Yes, Syndetic Solutions™ is a great product for cover images and tables of contents for books.
However, once you go beyond traditional resources, there is a great need to allow institutions to submit their own images as part of catalog record enhancement and not to serve as separate digital surrogates in a digital repository. This could be done either within the library management system, like the Media Management module, or as an option for catalog enhancement where libraries could add images to either a shared database or their own database using standard identifiers on a third-party platform like Syndetics™. Further research on photographic previews is also sorely needed. As of this writing, we only have a handful of case studies and some guiding philosophy on the use of previews. Consultation with internet retailers and literature on online marketing might be more applicable than library science research to evaluate their impact, but research into their direct impact vs. textual descriptions on catalog use would be ideal.

REFERENCES
1. Fact Book 2015-2016 (Big Rapids, MI: Ferris State University Institutional Research & Testing, 2016), http://www.ferris.edu/HTMLS/admision/testing/factbook/FactBook15-16-2.pdf, 47.
2. Ibid, 12.
3. Marcia Butler and Cassandra Kvenild, “Enhancing Catalog Records with Photographs for a Curriculum Materials Center,” Technical Services Quarterly 31 (2014): 122-138, https://doi.org/10.1080/07317131.2014.875377, 122-124.
4. Ibid, 126.
5. Martha Fallahay Loesch and Marta Mestrovic Deyrup, “Cataloging the Curriculum Library: New Procedures for Non-Traditional Formats,” Cataloging & Classification Quarterly 34, no. 4 (2002): 79-89, https://doi.org/10.1300/J104v34n04_08, 82.
6. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 128.
7. Stephan Greene, Gary Marchionini, Catherine Plaisant, and Ben Shneiderman, “Previews and Overviews in Digital Libraries: Designing Surrogates to Support Visual Information Seeking,” Journal of the American Society for Information Science 51, no.
4 (2000): 380-393, https://doi.org/10.1002/(SICI)1097-4571(2000)51:4<380::AID-ASI7>3.0.CO;2-5, 381.
8. Ibid.
9. Ibid, 384.
10. Ibid, 385.
11. Ibid.
12. Paul Moeller, “Enhancing Access to Rare Journals: Cover Images and Contents in the Online Catalog,” Serials Review 33, no. 4 (2007): 231-237, https://doi.org/10.1016/j.serrev.2007.09.003, 235.
13. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 128.
14. Greene et al., “Previews and Overviews in Digital Libraries,” 388.
15. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 124.
16. Moeller, “Enhancing Access to Rare Journals,” 234.
17. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 129.
18. Ibid, 132.
19. Ibid, 126.
20. Moeller, “Enhancing Access to Rare Journals,” 237.
21. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 131.
22. Ibid, 135.
23. Ibid, 134.
24. Greene et al., “Previews and Overviews in Digital Libraries,” 386.
25. Butler and Kvenild, “Enhancing Catalog Records with Photographs,” 136.
President’s Message: Reflections on LITA’s Past and Future
Aimee Fifarek
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 3

When I reached out to ITAL Editor Bob Gerrity about my first President’s Column, he graciously provided copies of past LITA Presidents’ columns to get me started. It reminded me once again of the illustrious company I am in, starting with Stephen R. Salmon, the first president of the Information Science and Automation Division, as we were known until 1977. I am proud to be at the head of LITA as it begins to celebrate its 50th Anniversary year. A half century ago when LITA was founded, the world was experiencing an era of profound technological change. The US and Soviet Union were battling to be first in the Space Race, and an increasing number of world powers were engaging in nuclear testing. While Civil Rights demonstrations and the fighting in Vietnam dominated the news, we were imagining peace via the technologically driven future depicted in a new TV series called Star Trek. With TV focused on the stars, we were able to go to the movies and explore the strange new world of inner space in Fantastic Voyage. Technology was poised to enter our daily lives as well, with Diebold demonstrating the first ATM1 and Ralph H. Baer writing the 4-page paper that would lay the foundation for the video game industry.2 Heady times for technology indeed, and the fact that libraries were sufficiently advanced to require an association dedicated to supporting technologists is hardly surprising. By the time of LITA’s founding at the 1966 Midwinter Meeting in Chicago, library automation had been in development for over a decade.3 MARC was just being invented, with the first tapes from the Library of Congress scheduled to go to the sixteen pilot libraries later that year.
Membership in the only organization that existed, the Committee on Library Automation (COLA), was restricted to the handful of professionals who either developed or managed existing library systems. But technology was beginning to impact many more librarians than just those rarified few. According to President Salmon, “It was clear that large numbers of librarians who didn't meet COLA's standards for membership were in need of information on library automation and wanted leadership.”4 The first meeting of our Division on July 14, 1966 at the ALA Annual Conference in New York was attended by several hundred librarians interested in information sharing, technology standards, and technology training for library staff. This group created the first mission, vision, and bylaws that set us on a 50-year path of success. LITA is well positioned to take the first steps into our next 50 years. Thanks to the efforts of last year’s LITA Board, we are on the verge of adopting a new two-year strategic plan that is designed to guide us through the current transitional period. It will be accompanied by a tactical plan that will allow us to document our accomplishments and set the stage for an ongoing culture of continuous planning. Also, Jenny Levine has proven to be extremely capable as she completes her first year as LITA Executive Director. She has just the right combination of ALA experience, technology know-how, and calm competence to guide us through the retooling and reimagining that is required to take a middle-aged Association into the next phase of its life.

Aimee Fifarek (aimee.fifarek@phoenix.gov) is LITA President 2016-17 and Deputy Director for Customer Support, IT and Digital Initiatives at Phoenix Public Library, Phoenix, AZ.
PRESIDENT’S MESSAGE | FIFAREK doi: 10.6017/ital.v35i3.9526
The four areas of focus in the new strategic plan will help us to balance our efforts between preserving the strengths of our past and adapting our organization for a successful future. The first area of focus, Member Engagement, shows that our primary commitment needs to be to LITA members. Without you, LITA would not exist. One of the key efforts is to increase the value of LITA for members who are unable to travel to conferences. With travel budgets down and staying low, online member engagement is an area all of ALA needs to improve, and who better to lead in this area than LITA? The next area, Organizational Sustainability, is all about keeping the infrastructure of the organization strong, much of which happens in the domain of LITA staff. Budgeting, quality communication, and strategic planning all live here. The section on Education and Professional Development recognizes the important role that webinars, online courses, our online journal, and print publications play in allowing LITA members to share their knowledge on both cutting-edge and practical topics with the rest of the Association and ALA in general. We are already doing great work here, and we need to better support and expand these efforts. The last focus area, Advocacy and Information Policy, represents a future growth area for LITA. Now that everyone in the library world "does" technology to a certain extent, LITA needs to think about how we will differentiate ourselves as outside competencies increase. Our advantage is that we have been doing and thinking about technology for much longer than anyone else. With our vast wealth of experience, it's appropriate that we work to become thought leaders and implementers in the information policy realm. In this, as always, we return to where we started: our members. LITA has thrived over the last 50 years because of this, our most important resource.
LITA was founded on the concept of sharing information about technology through conversation, publications, and knowledge creation. We endure because you, the committed, passionate information professionals, are willing to share what you know with those who come after. And like our founders, there are always individuals who are willing to take on the mantle of leadership, whether through getting elected to the LITA Board, becoming a Committee or Interest Group Chair, serving in key editorial roles for our monographs, journal, and blog, or joining the all-important LITA Staff. Thanks to all of you who make LITA’s future happen every day. I am proud to be in your company.

REFERENCES
1. Alan Taylor, “50 years ago: a look back at 1966,” The Atlantic Photo, March 23, 2016, http://www.theatlantic.com/photo/2016/03/50-years-ago-a-look-back-at-1966/475074/, Photo 46.
2. “Take me back to August 30, 1966,” http://takemeback.to/30-August-1966#.V8SzItLrtaQ.
3. “Library Technology Timeline,” http://web.york.cuny.edu/~valero/timeline_reference_citations.htm.
4. Stephen R. Salmon, “LITA’s First 25 Years, a Brief History,” http://www.ala.org/lita/about/history/1st25years.
Editorial Board Thoughts: Requiring and Demonstrating Technical Skills for Library Employment
Emily Morton-Owens
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2016 6

Recently I’ve been involved in a number of conversations about technical skills for library jobs, sparked by an ITAL article by Monica Maceli1 and a code4lib presentation by Jennie Rose Halperin.2 Maceli performed a text analysis of job postings on code4lib to reveal what skills are co-occurring and most frequent. Halperin problematized the expense of the MLS credential in comparison to the qualifications actually required by library technology jobs and the salaries offered for technical versus nontechnical work. This work has inspired many conversations about the shift in skills required for library work, the value placed on different kinds of labor, and how MLS programs can teach library technology. During a period of hiring at my institution and through teaching a library school course in which many of the students are on the brink of graduation, my attention has been called particularly to one point in the library employment process: job postings. These advertisements are the first step in matching aspiring library staff with the real-life needs of libraries—where the rubber meets the road between employer expectations and new-grad experience. Most libraries already use the practice of distinguishing between required and preferred qualifications, which is a good start, especially for technology jobs where candidates may offer strong learning proficiency yet lack a few particular tools. Although there have been conflicting interpretations of the Hewlett-Packard research suggesting that men are more likely than women to apply to jobs when they don’t meet all the requirements,3 I observe a general tendency among graduating students to err on the side of caution because they’re not sure which qualifications they can claim.
Among my students, for example, constant confusion attends the years of experience required. Is this library experience? General job experience? Experience at the same type of library? Paid or unpaid? Postings are often ambiguous, and students may choose to apply or not. Similarly, there are questions about what extent of experience qualifies someone to know a technology: mastering it through creating new projects at a paid job, experience maintaining it, or merely basic familiarity? Not knowing who has been hired, and on the basis of what kind of experience, is a gap for researchers trying to close the loop on job advertisements.

Emily Morton-Owens (egmowens@upenn.edu), a member of the ITAL Editorial Board, is Director of Digital Library Development and Systems, University of Pennsylvania Libraries, Philadelphia, Pennsylvania.
EDITORIAL BOARD THOUGHTS | MORTON-OWENS doi: 10.6017/ital.v35i3.9527

Even when a job posting has avoided an overlong list of required technical skills, it might still be expressing a narrow sense of what’s required to qualify. Someone who understands Subversion will be capable of understanding Git, so we see plenty of job advertisements that ask for experience with “a version control system (e.g. Git, Subversion, or Mercurial).” I recently polled staff in our department and found very few of us with bachelor’s degrees in technical subjects. More of us had come to working in library technology through work experience or graduate programs. And yet, our job postings contained long statements that conflated education and experience, such as “Bachelor’s degree in Computer Science, Information Science, or other relevant field and at least 3 years of experience application development in Object Oriented and scripting languages or equivalent combination of education and experience.
Master’s desirable.” I edited our statement to more clearly allow a combination of factors that would show sufficient preparation: “Bachelor’s degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a Master’s degree is preferred,” followed by a separate description of the technical skills needed. This increased the number and quality of our applications, so I’ll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind.

Meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. First, they’re wondering how to show clearly enough that they meet requirements like “three years of work experience” or “experience with user testing” so that their application is seriously considered. Second, they ask about possibilities to formalize skills. Recently, I’ve gotten questions about a certificate program in UX and whether there is any formal certification to be a systems librarian. Surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master’s degrees to learning scripting as a technical services librarian to pre-MLS work experience—doesn’t suggest any standard method for substantiating technical knowledge. Once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don’t necessarily signal that. Some advice from the tech industry about how to be more inviting to candidates applies to libraries too: for example, avoiding “rockstar” or “ninja” descriptions, emphasizing the problem space over years of experience,4 and designing interview processes that encourage discussion rather than “gotcha” technical tasks.
At Penn Libraries, for example, we’ve been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. This gives us concrete code to discuss in a far more realistic and relaxed context.

While it may be helpful to express requirements better so that applicants can see more clearly whether they should respond to a posting, this is a small part of the question of preparing new MLS grads for library technology jobs. The new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. Others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they’ve just completed a whole new degree and may not have the time or resources to pursue additional training. Even if we make efforts to narrow the gap between employers and job-seekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment. Library school provides a relatively brief window to instill in students the fundamentals and values of the profession, and it can’t be repurposed as a coding academy. There persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies.

REFERENCES

1. Monica Maceli, “What Technology Skills Do Developers Need? A Text Analysis of Job Listings in Library and Information Science (LIS) from Jobs.code4lib.org,” Information Technology and Libraries 34, no. 3 (2015): 8-21, doi:10.6017/ital.v34i3.5893.

2. Jennie Rose Halperin, “Our $50,000 Problem: Why Library School?” code{4}lib, http://code4lib.org/conference/2015/halperin.

3.
Tara Sophia Mohr, “Why Women Don’t Apply for Jobs Unless They’re 100% Qualified,” Harvard Business Review, August 25, 2014, https://hbr.org/2014/08/why-women-dont-apply-for-jobs-unless-theyre-100-qualified.

4. Erin Kissane, “Job Listings That Don’t Alienate,” https://storify.com/kissane/job-listings-that-don-t-alienate.
Technology Skills in the Workplace: Information Professionals’ Current Use and Future Aspirations

Monica Maceli and John J. Burke

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

ABSTRACT

Information technology serves as an essential tool for today’s information professional, and ongoing research is needed to assess the technological directions of the field over time. This paper presents the results of a survey of the technologies used by library and information science practitioners, with attention to the combinations of technologies employed and the technology skills that practitioners wish to learn. The most common technologies employed were email, office productivity tools, web browsers, library catalog- and database-searching tools, and printers, with programming topping the list of most-desired technology skills to learn. Similar technology usage patterns were observed for early- and later-career practitioners. Findings also suggested the relative rarity of emerging technologies, such as the makerspace, in current practice.

INTRODUCTION

Over the past several decades, technology has rapidly moved from a specialized set of tools to an indispensable element of the library and information science (LIS) workplace, and today it is woven throughout all aspects of librarianship and the information professions. Information professionals engage with technology in traditional ways, such as working with integrated library systems, and in new innovative activities, such as mobile-app development or the creation of makerspaces.1 The vital role of technology has motivated a growing body of research literature exploring the application of technology tools in the workplace, as well as within LIS education, to effectively prepare tech-savvy practitioners. Such work is instrumental to the progression of the field and, with the rapidly changing technological landscape, requires ongoing attention from the research community.
One of the most valuable perspectives in such research is that of the current practitioner. Understanding current information professionals’ technology use can help in understanding the role and shape of the LIS field, provide a baseline for related research efforts, and suggest future directions. The practitioner perspective is also valuable in separating the hype that often surrounds emerging technologies from the reality of their use and application within the LIS field.

Monica Maceli (mmaceli@pratt.edu) is Assistant Professor, School of Information, Pratt Institute, New York. John J. Burke (burkejj@miamioh.edu) is Library Director and Principal Librarian, Gardner-Harvey Library, Miami University Middletown, Middletown, Ohio.

TECHNOLOGY SKILLS IN THE WORKPLACE: INFORMATION PROFESSIONALS’ CURRENT USE AND FUTURE ASPIRATIONS | MACELI AND BURKE | https://doi.org/10.6017/ital.v35i4.9540

This paper presents the results of a survey of LIS practitioners, oriented toward understanding the participants’ current technology use and future technology aspirations. The guiding research questions for this work are as follows:

1. What combinations of technology skillsets do LIS practitioners commonly use?
2. What combinations of technology skillsets do LIS practitioners desire to learn?
3. What technology skillsets do newer LIS practitioners use and desire to learn as compared to those with ten-plus years of experience in the field?

LITERATURE REVIEW

The growth and increasing diversity of technologies used in library settings has been matched by a desire to explore how these technologies impact expectations for LIS practitioner skill sets.
Triumph and Beile examined the academic library job market in 2011 by describing the required qualifications for 957 positions posted on the ALA JobLIST and ARL Job Announcements websites.2 The authors also compared their results with similar studies conducted in 1996 and 1988 to see if they could track changes in requirements over a twenty-three-year period. They found that the number of distinct job titles increased in each survey because of the addition of new technologies to the library work environment that require positions focused on handling them. The comparison also found that computer skills as a position requirement increased by 100 percent between 1988 and 2011, with 55 percent of 2011 announcements requiring them.

Looking more deeply at the technology requirements specifically, Mathews and Pardue conducted a content analysis of 620 job ads from the ALA JobLIST to identify skills required in those positions.3 The top technology competencies required were web development, project management, systems development, systems applications, networking, and programming languages. They found a significant overlap of librarian skill sets with those of IT professionals, particularly in the areas of web development, project management, and information systems. Riley-Huff and Rholes found that the most commonly sought technology-related job titles were systems/automation librarian, digital librarian, emerging and instructional technology librarian, web services/development librarian, and electronic resources librarian.4 A few years later, Maceli added to this list with newly popular technology-related titles, including emerging technologies librarian, metadata librarian, and user experience/architect librarian.5

Beyond examining which specific technologies librarians should be able to use, researchers have also pondered whether a list of skills is even possible to create.
Crawford synthesized a series of blog posts from various authors to discuss which technology skills are essential and which are too specialized to serve as minimum technology requirements for librarians.6 He questioned whether universal skill sets should be established given the variety of tasks within libraries and the unique backgrounds of each library worker. Crawford also questioned the expectation that every librarian will have a broad array of technology skills, from programming to video editing to game design and device troubleshooting.

Partridge et al. reported on a series of focus groups held with 76 librarians that examined the skills required for members of the profession, especially those addressing technology.7 In the questions they asked the focus groups, the authors focused on the term “library 2.0” and attempted to gather suggestions on skills that current and future librarians need to assist users. They concluded that the groups identified a change in attitudes by librarians as more important to future library service than the acquisition of skills with specific technology tools. Importance was given to librarians’ abilities to stay aware of technological changes, to be resilient and reflective in the face of them, and to communicate regularly and clearly with the members of their communities.

Another area examined in the studies is where the acquisition of technology skills should and does happen for librarians. Riley-Huff and Rholes reported on a dual approach to measuring librarians’ preparation for performing technology-related tasks.8 The authors assessed course offerings for LIS programs to see if they included sufficient technology preparation for new graduates to succeed in the workplace. They then surveyed LIS practitioners and administrators to learn how they acquired their skills and how difficult it is to find candidates with enough technology preparation for library positions.
Their findings suggest that while LIS programs offer many technology courses, they lack standardization, and graduates of any given program cannot be expected to have a broad education in library technologies. Further research confirmed this troubling lack of consistency in technology-related curricula. Singh and Mehra assessed a variety of stakeholders, including students, employers, educators, and professional organizations, finding widespread concern about the coverage of technology topics in LIS curricula.9 Despite inconsistencies between individual programs, several studies provided a holistic view of the popular technology offerings within LIS curricula. Programs commonly offered one or more introductory technology courses, as well as courses in database design and development, web design and development, digital libraries, systems analysis, and metadata.10,11,12

As researchers have emphasized from a variety of perspectives, new graduates could not realistically be expected to know every technology with application to the field of information.13 There was widespread acknowledgement that learning in this area can, and must, continue in a lifelong fashion throughout one’s career. Riley-Huff and Rholes reported that LIS practitioners saw their own experiences as involving continuing skill development on the job, both before and after taking on a technology role.14 However, literature going back many decades suggests that the increasing need for continuing education in information technology has generally not been matched by increasing organizational support for these ventures.
Numerous deterrents to continuing technology education were noted, including lack of time,15 organizational climate, and the perception of one’s age.16 While studies in this area have primarily focused on MLS-level positions, Jones reported on academic library support staff members and their perceptions of technology use over a ten-year period and found that increased technology responsibilities added to workloads and increased workplace stress.17 Respondents noted that the increasing use of technology in their libraries had increased their individual workloads along with the range of responsibilities that they hold.

METHOD

To build an understanding of the research questions stated above, which focus on the technologies currently used by information professionals and those they desire to learn, we designed and administered a thirteen-question anonymous survey (see appendix) to the subscribers of thirty library-focused electronic discussion groups between February 25 and March 13, 2015. The groups were chosen to target respondents employed in multiple types of libraries (academic, public, school, and special) with a wide array of roles in their libraries (public services librarians, systems staff members, catalogers, and so on). We solicited respondents with an email sent to the groups asking for their participation in the survey and with the promise to post initial results to the same groups. The survey included closed and open-ended questions oriented toward understanding current technology use and future aspirations, as well as capturing demographics useful in interpreting and generalizing the results.
The survey questions have been previously used and iteratively expanded over time by the second author, first in the fall of 2008 and then in the spring of 2012, with summative results presented in the last three editions of the Neal-Schuman Library Technology Companion. We obtained a total of 2,216 responses to the question, “Which of the following technologies or technology skills are you expected to use in your job on a regular basis?” Of these responses, 1,488 (67 percent) of the respondents answered the question regarding technologies they would like to learn: “What technology skill would you like to learn to help you do your job better?”

We conducted basic reporting of response frequency for closed questions to assess and report the demographics of the respondents. To analyze the open-ended survey question results in greater depth, we conducted a textual analysis using the R statistical package (https://www.r-project.org/). We used the tm (text mining) package in R (http://CRAN.R-project.org/package=tm) to calculate frequency and correlation of terms, generate plots, and cluster terms.

RESULTS

The following section first presents an overview of survey responses and respondents, and then explores results as related to the three stated research questions. The LIS practitioners who responded to the survey reported that their libraries are located in forty US states, eight Canadian provinces, and forty-three other countries. Academic libraries were the most common type of library represented, followed by public, school, special, and other (see table 1).

Library Type   Number of Respondents   Percentage of All Respondents
Academic       1,206                   54.4
Public         545                     24.6
School         266                     12.0
Special        138                     6.2
Other          61                      2.8

Table 1. The types of libraries in which survey respondents work

Respondents also provided their highest level of education.
A total of 77 percent of responding LIS practitioners have earned a library-related or other master’s degree, dual master’s degrees, or a doctoral degree. From these reported levels of education, it is likely that more respondents are in librarian positions than in library support staff positions. However, individuals with master’s degrees serve in various roles in library organizations, so the percentage of graduate degree holders may not map exactly to the percentage of individuals in positions that require those degrees. Significantly fewer respondents (16 percent) reported holding a high school diploma, some college credit, an associate degree, or a bachelor’s degree as their highest level of education.

Another aspect we measured in the survey was the tasks that respondents performed on a regular basis. The range of tasks provided in the survey allowed for a clearer analysis of job responsibilities than broad categories of library work such as “public services” or “technical services.” Some respondents appeared to be employed in solo librarian environments where they are performing several roles. Even respondents who might have more focused job titles such as “reference librarian” or “cataloger” may be performing tasks that overlap traditional roles and categories of library work. The tasks offered in the survey and the responses to each are shown in table 2.
Task                             Number of Respondents   Percentage of Respondents
Reference                        1,404                   63.4
Instruction                      1,296                   58.5
Collection development           1,260                   56.9
Circulation                      917                     41.4
Cataloging                       905                     40.8
Electronic resource management   835                     37.7
Acquisitions                     789                     35.6
User experience                  775                     35.0
Library administration           769                     34.7
Outreach                         758                     34.2
Marketing/public relations       722                     32.6
Library/IT systems               672                     30.3
Periodicals/serials              659                     29.7
Media/audiovisuals               566                     25.5
Interlibrary loan                518                     23.4
Distance library services        474                     21.4
Archives/special collections     437                     19.0
Other                            209                     9.4

Table 2. Tasks performed on a regular basis by survey respondents

While public services-related activities lead the list, with reference, instruction, collection development, and circulation as the top four task areas, technical services-related activities are well represented; the next three in rank are cataloging, electronic resource management, and acquisitions. The overall list of tasks shows the diversity of work LIS practitioners engage in, as each respondent chose an average of six tasks. The results also suggest that the survey respondents are well acquainted with a wide variety of library work rather than only having experience in a few areas, making their uses of technology more representative of the broader library world.

The survey also questioned the barriers LIS practitioners face as they try to add more technology to their libraries, and 2,161 respondents replied to the question, “Which of the following are barriers to new technology adoption in your library?” Financial considerations proved to be the most common barrier, with “budget” chosen by 80.7 percent of respondents, followed by “lack of staff time” (62.4 percent), “lack of staff with appropriate skill sets” (48.5 percent), and “administrative restrictions” (36.7 percent).
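The frequency reporting used throughout these results amounts to counting, per option, how many respondents selected it and dividing by the number of respondents. The published analysis was done in R; the following is a minimal, analogous Python sketch using an invented three-respondent sample in place of the real survey data:

```python
from collections import Counter

# Hypothetical multi-select responses: each list holds the technologies
# one respondent reported using regularly (stand-ins for the survey data).
responses = [
    ["email", "word processing", "web browser", "printers"],
    ["email", "web browser", "library database searching"],
    ["email", "word processing", "spreadsheets"],
]

# Tally how many respondents selected each option.
counts = Counter(tech for r in responses for tech in r)
n = len(responses)

# Report frequency and percentage, most common first.
for tech, c in counts.most_common():
    print(f"{tech}: {c} ({100 * c / n:.1f}%)")
```

`Counter.most_common` sorts options by descending frequency, which matches the ordering used when reporting top and bottom skills in figures 1 and 2.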
What Combinations of Technology Skillsets Do LIS Practitioners Commonly Use?

Responses from survey question 8, “Which of the following technologies or technology skills are you expected to use in your job on a regular basis?,” were analyzed to build an understanding of this research question. A total of 2,216 responses to this question were received. Survey respondents were asked to select from a detailed list of technologies/skills (visible in question 8 of the appendix) that they regularly used. The top answers respondents chose for this question were email, word processing, web browser, library catalog (public side), and library database searching. The full list of the top twenty-five technology skills and tools used is detailed in figure 1, with the bottom fifteen technology skills used presented in figure 2.

Figure 1. Top twenty-five technology skills/tools used by respondents (N = 2,216): email, word processing, web browser, library catalog (public side), library database searching, spreadsheets, printers, web searching, teaching others to use technology, presentation software, Windows OS, laptops, scanners, library management system (staff side), downloadable ebooks, web-based ebook collections, cloud-based storage, technology troubleshooting, teaching using technology, online instructional materials/products, tablets, web video conferencing, educational copyright knowledge, library website creation or management, and cloud-based productivity apps

Figure 2. Bottom fifteen technology skills/tools used by respondents (N = 2,216)

Text analysis techniques were then used to determine the frequent combinations of technology skills used in practice.
First, a clustering approach was taken to visualize the most popular technologies that were commonly used in combination (figure 3). Clustering helps in organizing and categorizing a large dataset when the categories are not known in advance and, when plotted in a dendrogram chart, assists in visualizing commonly co-occurring terms. The authors numbered the clusters identified in figure 3 for ease of reference. From left to right, the first cluster focuses on communication and educational tools, the second emphasizes devices and software, the third contains web and multimedia creation tools, the fourth contains office productivity and public-facing information retrieval tools, and the fifth cluster has a diverse collection of responsibilities, including systems-oriented responsibilities (from operating systems to specific hardware devices), working with ebooks, teaching with technology, and teaching technology to others.

(The fifteen least-used skills, shown in figure 2, were Mac OS, audio recording and editing, technology equipment installation, computer programming or coding, assistive/adaptive technology, RFID, Chromebooks, network management, server management, statistical analysis software, makerspace technologies, Linux, 3D printers, augmented reality, and virtual reality.)

Figure 3. Cluster analysis of most frequent technology skills used in practice, with red outlines on each numbered cluster

Notably, the list of top skills used (figure 1) falls more on the end-user side of technology; skills more oriented toward systems work (e.g., Linux, server management, computer programming or coding) were less frequently mentioned, and several were among the lowest reported (figure 2). Of the 2,216 respondents, 15 percent used programming or coding skills regularly in their job (which is of interest, as programming or coding was the skill most desired to learn by respondents; this will be discussed further in the context of the next research question).
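A dendrogram like figure 3 is built by hierarchically clustering skills according to how often they co-occur in the same responses. The sketch below is a toy stand-in rather than the authors’ R code: it applies single-linkage agglomerative clustering over Jaccard distances between the sets of (invented) respondents reporting each skill, and stops at two clusters for brevity:

```python
from itertools import combinations

# Hypothetical 0/1 usage matrix: which of six respondents reported each skill.
usage = {
    "email": [1, 1, 1, 1, 1, 1],
    "word processing": [1, 1, 1, 0, 1, 1],
    "server management": [0, 0, 0, 1, 0, 1],
    "linux": [0, 0, 0, 1, 0, 1],
}

def jaccard_distance(a, b):
    # 1 - |A ∩ B| / |A ∪ B| over the sets of respondents using each skill.
    sa = {i for i, v in enumerate(usage[a]) if v}
    sb = {i for i, v in enumerate(usage[b]) if v}
    return 1 - len(sa & sb) / len(sa | sb)

# Single-linkage agglomerative clustering: repeatedly merge the two clusters
# whose closest members are nearest to each other.
clusters = [[skill] for skill in usage]
while len(clusters) > 2:
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: min(
            jaccard_distance(a, b)
            for a in clusters[ij[0]]
            for b in clusters[ij[1]]
        ),
    )
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

print(clusters)
```

Skills used by exactly the same respondents (here, “server management” and “linux”) merge at distance zero, which illustrates why systems-oriented skills group together in the full analysis while the ubiquitous end-user tools form their own clusters.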
Plotting the correlations between the more advanced technology skillsets can provide a picture of the work such systems-oriented positions are commonly responsible for, particularly as they are less well represented in the responses as a whole. Figure 4 plots the correlated terms for those tasked with “server management.” It is fair to assume someone with such responsibilities falls on the highly technical end of the spectrum.

Figure 4. Terms correlated with “server management,” indicating commonly co-occurring workplace technologies for highly technical positions

The more common task of “library website creation or management,” which fell to those with a broad level of technological expertise, had numerous correlated terms. Figure 5 demonstrates a wide array of technology tools and responsibilities.

Figure 5. Terms correlated with “library website creation or management,” indicating commonly co-occurring technologies used on the job

Lastly, teaching using technology and teaching technology to others is a long-standing responsibility of librarians and library staff. Figure 6 presents the skills correlated with “teaching others to use technology.”

Figure 6. Terms correlated with “teaching others to use technology,” indicating commonly co-occurring technologies used on the job

What Combinations of Technology Skillsets Do LIS Practitioners Desire to Learn?

We analyzed responses to survey question 10, “What technology skill would you like to learn to help you do your job better?,” to explore this research question. As summarized in Burke18—and consistent with the prior year’s findings—coding or programming remained the most desired technology skillset, mentioned by 19 percent of respondents.
The raw text analysis yielded a fuller list of the top terms mentioned by participants (table 3 and visualized in figure 7).

Technology Term                                Number of Respondents   Percentage of Respondents
Coding or programming (combined for reporting) 292                     19.59
Web                                            178                     11.96
Software                                       158                     10.62
Video                                          112                     7.53
Apps                                           106                     7.12
Editing                                        105                     7.06
Design                                         85                      5.71
Database                                       76                      5.11

Table 3. Terms mentioned by 5 percent or more of survey respondents

Figure 7. Wordcloud of responses to “What technology skill would you like to learn to help you do your job better?”

We then explored the deeper context of responses and individually analyzed responses specific to the more popular technology desires. First, we assessed the responses mentioning the desire to learn coding or programming. Of these responses, the most common specific technologies mentioned were HTML, Python, CSS, JavaScript, Ruby, and SQL, listed in decreasing order of interest. Although most participants did not describe what they would like to do with their desired coding or programming skills, of those that did, the responses indicated interest in

● becoming more empowered to solve their own technology problems (e.g., “I would like to learn the [programming languages] so I don't have to rely on others to help with our website,” “I’m one of the most tech-skilled people at my library, but I’d like to be able to build more of my own tools and manage systems without needing someone from IT or outside support.”);

● improving communication with IT (e.g., “how to speak code, to aid in communication with IT,” “to better identify problems and work with IT to fix them”);

● creating novel tools and improving system interoperability (e.g.,
“coding for app and API creation”); and

● bringing new technologies to their library and patrons (e.g., “coding so that I can incorporate a hackerspace in my library”).

Next, we took a clustering approach to visualize the terms commonly desired in combination. Figure 8 describes the clustered terms that we found within the programming or coding responses. The terms “programming” and “coding” form a distinct cluster to the right of the diagram, indicating that many responses contained only those two terms.

Figure 8. Clustering of terms present in responses indicating the desire to learn coding or programming

The remaining portion of the diagram illustrates the specific technologies mentioned by those respondents who answered in greater detail or expanded on their general answer of programming or coding. Other related desired technology-skill areas become apparent: database management, HTML and CSS (as well as the more general “web design,” which appeared in the top terms in table 3), PHP and JavaScript, Python and SQL, and XML creation, among others. The bulleted list presented in the previous paragraph illustrates some of the potential applications participants envisioned these skills being useful in, but the majority did not provide this level of detail in their responses.

Editing was another prominent term that appeared across participant responses and was largely meant in the context of video editing. Because of the vagueness of the term “editing,” a closer look was necessary to determine other technology desires. Looking at terms highly correlated with “editing” revealed both video and photo editing to be important to respondents.
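Table 3 counts, for each term, the number of respondents whose free-text answer mentioned it at least once. A rough, stdlib-only Python sketch of that tokenize-and-count step follows; the answers and stoplist are invented for illustration, and the published analysis used R’s tm package rather than this code:

```python
import re
from collections import Counter

# Invented free-text answers standing in for the survey's 1,488 responses to
# "What technology skill would you like to learn to help you do your job better?"
answers = [
    "coding, maybe Python",
    "web design and video editing",
    "coding for our website",
    "video editing",
    "database management",
]

STOPWORDS = {"and", "for", "our", "maybe", "the", "a"}  # toy stoplist

# Count, per term, the number of respondents mentioning it at least once
# (a set() per answer, so a word repeated in one answer counts only once).
mentions = Counter()
for a in answers:
    terms = set(re.findall(r"[a-z]+", a.lower())) - STOPWORDS
    mentions.update(terms)

# Keep terms mentioned by at least 5 percent of respondents, as in table 3.
n = len(answers)
top = {t: c for t, c in mentions.items() if c / n >= 0.05}
for term, count in sorted(top.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term}: {count} ({100 * count / n:.1f}%)")
```

Counting respondents rather than raw occurrences is what lets the percentages in table 3 be read as “share of respondents who mentioned the term.”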
Several of the top-appearing terms were used more generally: “database” and mobile “apps” were mentioned without specifying the technology tool or scenario of use, such that a more contextual analysis could not be conducted. These responses can be particularly difficult to interpret, as the term “databases” can have a technical meaning (e.g., working with SQL) or can refer to the use of library databases from an end-user perspective.

What Technology Skillsets Do Newer LIS Practitioners Use and Desire to Learn as Compared to Those with Ten-Plus Years of Experience in the Field?

Of the 2,216 survey responses, 877 stated they had worked in libraries for ten or fewer years. We analyzed these responses separately from the remaining 1,334 respondents who had worked in libraries for more than ten years. Of this latter group, 644 had worked in libraries for twenty-plus years (figure 9). A handful of participants did not answer the question and were omitted from the analysis.

Figure 9. Number of survey responses falling into the various categories (0-2, 3-5, 6-10, 11-15, 16-20, and 21+ years) for number of years working in libraries

The top technology skills used in the workplace did not differ significantly between the groups. The top skills, as discussed earlier and presented in figure 1, were well represented and similarly ordered. A few small percentage points of difference were noted in a handful of the top skills (figure 10). Those newer to the field were slightly more likely to teach others to use technology, use cloud-based storage, and use cloud-based productivity apps. More experienced practitioners regularly used the library management system (on the staff side) more than those newer to the field.

Figure 10.
Top twenty-five technology skills used by respondents in the zero to ten years’ experience (dark blue) and eleven-plus years’ experience (light blue) groups

For the question regarding technologies they would like to learn, 69 percent of the participants with zero to ten years’ experience answered the question, compared to a slightly smaller 65 percent of the participants with more than ten years’ experience. Top terms for both groups were very similar, including coding or programming, software, web, video, design, and editing. These terms were not dissimilar to the responses taken as a whole (table 3), indicating that respondents were generally interested in learning the same sorts of technology skills regardless of how long they had been in the field.

A few noticeable differences between the two groups emerged. The most popular skills, coding or programming, were mentioned by 28 percent of the respondents with zero to ten years’ experience and by 15 percent of the respondents with eleven-plus years’ experience. There was slightly more interest (by a few percentage points) in databases, design, Python, and Ruby in the zero to ten years’ experience group.
Taking a closer look at the different year ranges within the zero to ten years’ experience group revealed that those with three to five years of experience were most likely to be interested in learning coding or programming skills.

Figure 11. Percentage of respondents interested in learning coding or programming in the groups with ten or fewer years’ experience

Of the participants who answered the question at all, several stated that there were no technology skills they would need or like to learn for their position, either because they were comfortable with their existing skills or were simply open to learning more as needed (but nothing specific came to mind). Combined with those who did not answer the question (and so presumably did not have a particular technology they were interested in learning), 28 percent of the zero to ten years’ experience group and 31 percent of the eleven-plus years’ experience group did not have any technologies that they desired to learn at the moment.

DISCUSSION

As detailed earlier, the most common technologies employed by LIS practitioners were email, office productivity tools, web browsers, library catalog and database searching tools, and printers. Generally similar technology usage patterns were observed for early- and later-career practitioners, and programming topped the list of most-desired technology skills to learn.

The cluster analysis presented in figure 3 suggests that a relatively small percentage of practitioners have technology-intensive roles that would require skills such as programming, working with databases, systems administration, etc. Rather, the cluster analysis showed common technology skillsets focused on the end-user side of technology tools.
In fact, most of the top ten skills used—email, office productivity tools (word processing, spreadsheets, and presentation software), web browsers, library catalog and database searching, printers, and teaching others to use technology—are fairly nontechnical in nature. A potential exception is that of teaching technology. Figure 6 suggests that teaching others to use technology entails several hardware devices (for example, laptops, tablets, smartphones, and scanners) as well as online and digital resources, such as ebooks. However, most of the popular skills used would be considered baseline skills for information workers in any domain. As suggested by Tennant, programming and other advanced technical skills do not necessarily need to be core skills for all information professionals, but knowledge of the potential applications and possibilities of such tools is required.19 This idea was echoed by Partridge et al., whose findings emphasized the need for awareness and resilience in tackling new technological developments.20 These skills alone would obviously be insufficient for LIS practitioners explicitly seeking a high-tech role, as discussed in Maceli.21 However, further research directed toward exploring the mental models and general technological understanding of information professionals would be helpful in understanding the true level of practitioner engagement with technology, to complement the list of relatively low-tech tools employed. Programming has been a skill of great interest within the information professions for many years, and the respondents’ enthusiasm and desire to learn in this area were readily apparent from the survey results, with nearly 20 percent of participants citing either “programming” or “coding” as a skill they desired to learn. In the context of their current responsibilities, 15 percent of respondents overall mentioned “computer programming or coding” as a regular technological skill they employed (figure 2).
There was a slight difference: 19 percent of the librarians with fewer than eleven years of experience coded regularly, compared to 13 percent of those with eleven or more years of experience. Within the years-of-experience divisions, the newer practitioners were more interested in learning programming, with the peak of interest at three to five years in the workplace (figure 11). The relatively low interest or need to learn programming among the newest practitioners potentially indicates a hopeful finding—that their degree program was sufficient preparation for the early years of their career. Prior research, however, contradicts this finding. For example, Choi and Rasmussen’s 2006 survey found that, in the workplace, librarians frequently felt unprepared in their knowledge of programming and scripting languages.22 In the intervening years, curricula have shifted to more heavily emphasize technology skills, including web development and other topics covering programming,23 perhaps better preparing early-career practitioners. Overall, programming remains a popular skill in continuing education opportunities as well as in job listings,24 which aligns well with the respondents’ strong interest in this area. The skills commonly co-occurring with programming in practice included working with Linux, database software, managing servers, and webpage creation (figure 4). Taken as a whole, these skills indicate job responsibilities falling toward the systems side, with webpage creation a skill that bridged intensely technical and more user-focused work (as also evident in figure 4). This indicates that, though programming may be perceived as highly desirable for communicating and extending systems, as a formal job responsibility it may still fall to a relatively small number of information professionals in any significant manner.
Makerspace technologies and their implementation possibilities within libraries have garnered a great deal of excitement and interest in recent years, with much literature highlighting innovative projects in this area (such as American Library Association25 and Bagley26). Fourie and Meyer provided an overview of the existing makerspace literature, finding that most research efforts focus on the needs and construction of the physical space.27 Given the general popularity of the topic (as detailed in Moorefield-Lang),28 it is interesting to note that such technologies were infrequently mentioned by survey participants, both by those desiring to learn these tools and by those who were currently using them. The most infrequently used skills (figure 2) included makerspace technologies, 3D printers, and augmented and virtual reality. Only a small number of respondents currently used this mix of makerspace-oriented and emerging technologies, and only 3 percent of respondents mentioned interest in learning makerspace-related skills. Despite many research efforts exploring the particulars of unique makerspaces in a case-study approach (for example, Moorefield-Lang),29 little data exists on the total number of makerspaces within libraries, and the skillset is largely absent from prior research describing LIS curricula and job listings. This makes it difficult to determine whether the low number of participants who reported working with makerspace technologies reflects the small number of such spaces in existence or simply that few practitioners are assigned to work in this area, regardless of the spaces’ popularity. In either case, these findings provide a useful baseline with which to track the growth of makerspace offerings over time and librarian involvement in such intensely technological work. Despite the interest and clear willingness to learn and use technology, several workplace challenges became apparent from participant responses.
As prior research has explored (notably Riley-Huff and Rholes),30 practitioners assumed they would be continually learning and building skills on the job throughout their careers to stay current technologically. As described in the earlier results section, many participants mentioned that, although they were highly willing and able to learn, the necessary organizational resources were lacking. As one participant noted, “I’d like to learn anything but the biggest problem seems to be budget (time and monetary).” Several participants expressed feeling overwhelmed with their current workload. New learning opportunities, technological or otherwise, were simply not feasible. Although the survey results indicated that practitioners of all ages were roughly equally interested in learning new technologies, a handful of responses mentioned that ageist attitudes were creating barriers. Though few, these respondents described being dismissed as technologists because of their age. These themes have long been noted in the large body of continuing-education-related literature going back several decades. Stone’s study ranked lack of time as the top deterrent to professional development for librarians, and it appears little has changed.31 Chan and Auster noted that organizational climate and the perception of one’s age may impair the pursuit of professional development, among other impediments.32 However, research has noted a generally strong drive in older librarians to continue their education; Long and Applegate found a preference in later-career librarians for learning outlets provided by formal library schools and related professional organizations, but a lower interest in generally popular topics such as programming.33 These findings were consistent with the participant responses gathered in this survey.
Finally, as detailed in the results section, a significant percentage of respondents (33 percent) did not answer the question regarding what technologies they would like to learn. As is a limitation with survey research, it is difficult to know the respondents’ intentions in not answering the question: are they comfortable with their current technology skills? Do they lack the time or interest to pursue further technology education? And of those who did answer, many did not specify their intended use of the technologies they desired to learn, so a deeper exploration of what technologies LIS practitioners desire to learn, and why, would be of value as well. These questions are worth pursuing in more depth through further research efforts.

CONCLUSION

This study provides a broad view into the technologies that LIS practitioners currently use and desire to learn, across a variety of types of libraries, through an analysis of survey responses. Despite a marked enthusiasm toward using and learning technology, respondents described serious organizational limitations impairing their ability to grow in these areas. The LIS practitioners surveyed have interested patrons, see technology as part of their mission, and are not satisfied with the current state of affairs, but they seem to lack money, time, skills, and a willing library administration. Though respondents expressed a great deal of interest in more advanced technology topics, such as programming, the majority typically engaged with technology on an end-user level, with a minority engaged in deeply technical work. This study suggests future work in exploring information professionals’ conceptual understanding of and attitudes toward technology, and a deeper look at the reasoning of those who did not express a desire to learn new technologies.

REFERENCES

1.
Marshall Breeding, “Library Technology: The Next Generation,” Computers in Libraries 33, no. 8 (2013): 16–18, http://librarytechnology.org/repository/item.pl?id=18554.
2. Therese F. Triumph and Penny M. Beile, “The Trending Academic Library Job Market: An Analysis of Library Position Announcements from 2011 with Comparisons to 1996 and 1988,” College & Research Libraries 76, no. 6 (2015): 716–39, https://doi.org/10.5860/crl.76.6.716.
3. Janie M. Mathews and Harold Pardue, “The Presence of IT Skill Sets on Librarian Position Announcements,” College & Research Libraries 70, no. 3 (2009): 250–57, https://doi.org/10.5860/crl.70.3.250.
4. Debra A. Riley-Huff and Julia M. Rholes, “Librarians and Technology Skill Acquisition: Issues and Perspectives,” Information Technology and Libraries 30, no. 3 (2011): 129–40, https://doi.org/10.6017/ital.v30i3.1770.
5. Monica Maceli, “Creating Tomorrow’s Technologists: Contrasting Information Technology Curriculum in North American Library and Information Science Graduate Programs against Code4lib Job Listings,” Journal of Education for Library and Information Science 56, no. 3 (2015): 198–212, https://doi.org/10.12783/issn.2328-2967/56/3/3.
6. Walt Crawford, “Making it Work Perspective: Techno and Techmusts,” Cites and Insights 8, no. 4 (2008): 23–28.
7. Helen Partridge et al., “The Contemporary Librarian: Skills, Knowledge and Attributes Required in a World of Emerging Technologies,” Library & Information Science Research 32, no. 4 (2010): 265–71, https://doi.org/10.1016/j.lisr.2010.07.001.
8. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
9. Vandana Singh and Bharat Mehra, “Strengths and Weaknesses of the Information Technology Curriculum in Library and Information Science Graduate Programs,” Journal of Librarianship and Information Science 45, no. 3 (2013): 219–31, https://doi.org/10.1177/0961000612448206.
10. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
11.
Sharon Hu, “Technology Impacts on Curriculum of Library and Information Science (LIS)—A United States (US) Perspective,” LIBRES: Library & Information Science Research Electronic Journal 23, no. 2 (2013): 1–9, http://www.libres-ejournal.info/1033/.
12. Singh and Mehra, “Strengths and Weaknesses of the Information Technology Curriculum.”
13. See, for example, Crawford, “Making it Work Perspective”; Partridge et al., “The Contemporary Librarian.”
14. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
15. Elizabeth W. Stone, Factors Related to the Professional Development of Librarians (Metuchen, NJ: Scarecrow, 1969).
16. Donna C. Chan and Ethel Auster, “Factors Contributing to the Professional Development of Reference Librarians,” Library & Information Science Research 25, no. 3 (2004): 265–86, https://doi.org/10.1016/S0740-8188(03)00030-6.
17. Dorothy E. Jones, “Ten Years Later: Support Staff Perceptions and Opinions on Technology in the Workplace,” Library Trends 47, no. 4 (1999): 711–45.
18. John J. Burke, The Neal-Schuman Library Technology Companion: A Basic Guide for Library Staff, 5th edition (New York: Neal-Schuman, 2016).
19. Roy Tennant, “The Digital Librarian Shortage,” Library Journal 127, no. 5 (2002): 32.
20. Partridge et al., “The Contemporary Librarian.”
21. Monica Maceli, “What Technology Skills Do Developers Need? A Text Analysis of Job Listings in Library and Information Science (LIS) from Jobs.code4lib.org,” Information Technology and Libraries 34, no. 3 (2015): 8–21, https://doi.org/10.6017/ital.v34i3.5893.
22. Youngok Choi and Edie Rasmussen, “What Is Needed to Educate Future Digital Libraries: A Study of Current Practice and Staffing Patterns in Academic and Research Libraries,” D-Lib Magazine 12, no. 9 (2006), http://www.dlib.org/dlib/september06/choi/09choi.html.
23. See, for example, Maceli, “Creating Tomorrow's Technologists.”
24. Elías Tzoc and John Millard, “Technical Skills for New Digital Librarians,” Library Hi Tech News 28, no. 8 (2011): 11–15, https://doi.org/10.1108/07419051111187851.
25. American Library Association, “Manufacturing Makerspaces,” American Libraries 44, no. 1/2 (2013), https://americanlibrariesmagazine.org/2013/02/06/manufacturing-makerspaces/.
26. Caitlin A. Bagley, Makerspaces: Top Trailblazing Projects, A LITA Guide (Chicago: American Library Association, 2014).
27. Ina Fourie and Anika Meyer, “What to Make of Makerspaces: Tools and DIY Only or is there an Interconnected Information Resources Space?,” Library Hi Tech 33, no. 4 (2015): 519–25, https://doi.org/10.1108/LHT-09-2015-0092.
28. Heather Moorefield-Lang, “Change in the Making: Makerspaces and the Ever-Changing Landscape of Libraries,” TechTrends 59, no. 3 (2015): 107–12, https://doi.org/10.1007/s11528-015-0860-z.
29. Heather Moorefield-Lang, “Makers in the Library: Case Studies of 3D Printers and Maker Spaces in Library Settings,” Library Hi Tech 32, no. 4 (2014): 583–93, https://doi.org/10.1108/LHT-06-2014-0056.
30. Riley-Huff and Rholes, “Librarians and Technology Skill Acquisition.”
31. Stone, Factors Related to the Professional Development of Librarians.
32. Chan and Auster, “Factors Contributing to the Professional Development of Reference Librarians.”
33. Chris E. Long and Rachel Applegate, “Bridging the Gap in Digital Library Continuing Education: How Librarians Who Were Not ‘Born Digital’ Are Keeping Up,” Library Leadership & Management 22, no. 4 (2008), https://journals.tdl.org/llm/index.php/llm/article/view/1744.

Appendix. Survey Questions

1. What type of library do you work in?
2.
Where is your library located (state/province/country)?
3. What is your job title?
4. What is your highest level of education?
5. Which of the following methods have you used to learn about technologies and how to use them? Please mark all that apply.
• Articles
• As part of a degree I earned
• Books
• Coworkers
• Face-to-face credit courses
• Face-to-face training sessions
• Library patrons
• Online credit courses
• Online training sessions (webinars, etc.)
• Practice and experiment on my own
• Web resources I regularly check (sites, blogs, Twitter, etc.)
• Web searching
• Other:
6. Which of the following skill areas are part of your responsibilities? Please mark all that apply.
• Acquisitions
• Archives/special collections
• Cataloging
• Circulation
• Collection development
• Distance library services
• Electronic resource management
• Instruction
• Interlibrary loan
• Library administration
• Library IT/systems
• Marketing/public relations
• Media/audiovisuals
• Outreach
• Periodicals/serials
• Reference
• User experience
• Other:
7. How long have you worked in libraries?
• 0–2 years
• 3–5 years
• 6–10 years
• 11–15 years
• 16–20 years
• 21 or more years
8. Which of the following technologies or technology skills are you expected to use in your job on a regular basis? Please mark all that apply.
• Assistive/adaptive technology
• Audio recording and editing
• Augmented reality (Google Glass, etc.)
• Blogging
• Cameras (still, video, etc.)
• Chromebooks
• Cloud-based productivity apps (Google Apps, Office 365, etc.)
• Cloud-based storage (Google Drive, Dropbox, iCloud, OneDrive, etc.)
• Computer programming or coding
• Computer security and privacy knowledge
• Database creation/editing software (MS Access, etc.)
• Dedicated e-readers (Kindle, Nook, etc.)
• Digital projectors
• Discovery layer/service/system
• Downloadable e-books
• Educational copyright knowledge
• E-mail
• Facebook
• Fax machine
• Image editing software (Photoshop, etc.)
• Laptops
• Learning management system (LMS) or virtual learning environment (VLE)
• Library catalog (public side)
• Library database searching
• Library management system (staff side)
• Library website creation or management
• Linux
• Mac operating system
• Makerspace technologies (laser cutters, CNC machines, Arduinos, etc.)
• Mobile apps
• Network management
• Online instructional materials/products (LibGuides, tutorials, screencasts, etc.)
• Presentation software (MS PowerPoint, Prezi, Google Slides, etc.)
• Printers (public or staff)
• RFID (radio frequency identification)
• Scanners and similar devices
• Server management
• Smart boards/interactive whiteboards
• Smartphones (iPhone, Android, etc.)
• Software installation
• Spreadsheets (MS Excel, Google Sheets, etc.)
• Statistical analysis software (SAS, SPSS, etc.)
• Tablets (iPad, Surface, Kindle Fire, etc.)
• Teaching others to use technology
• Teaching using technology (instruction sessions, workshops, etc.)
• Technology equipment installation
• Technology purchase decision-making
• Technology troubleshooting
• Texting, chatting, or instant messaging
• 3D printers
• Twitter
• Using a web browser
• Video recording and editing
• Virtual reality (Oculus Rift, etc.)
• Virtual reference (text, chat, IM, etc.)
• Word processing (MS Word, Google Docs, etc.)
• Web-based e-book collections
• Web conferencing/video conferencing (Webex, Google Hangouts, Goto Meeting, etc.)
• Webpage creation
• Web searching
• Windows operating system
• Other:
9.
Which of the following are barriers to new technology adoption in your library? Please mark all that apply.
• Administrative restrictions
• Budget
• Lack of fit with library mission
• Lack of patron interest
• Lack of staff time
• Lack of staff with appropriate skill sets
• Satisfaction with amount of available technology
• Other:
10. What technology skill would you like to learn to help you do your job better?
11. What technologies do you help patrons with the most?
12. What technology item do you circulate the most?
13. What technology or technology skill would you most like to see added to your library?
Up Against the Clock: Migrating to LibGuides v2 on a Tight Timeline

Brianna Buljung and Catherine Johnson

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017 68

ABSTRACT

During Fall semester 2015, librarians at the United States Naval Academy were faced with the challenge of migrating to LibGuides version 2 and integrating LibAnswers with LibChat into their service offerings. Initially, the entire migration process was anticipated to take almost a full academic year, giving guide owners considerable time to update and prepare their guides. However, with the acquisition of the LibAnswers module, library staff shortened the migration timeline considerably to ensure both products went live on the version 2 platform at the same time. The expedited implementation timeline forced the ad hoc implementation teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. This paper provides an overview of the process the staff at the Nimitz Library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. Consistent communication of expectations with stakeholders and prioritization of tasks were essential to the successful completion of the project.

INTRODUCTION

Academic libraries all over the United States have migrated from LibGuides version 1 to the new, sleeker, responsive design of version 2. Approaches to the migration can differ vastly depending on library size, staff capabilities, and the time frame available for completing the project. In 2015, the Nimitz Library at the United States Naval Academy began planning both to upgrade LibGuides to version 2 and to acquire LibAnswers with LibChat. The Web Team and Reference Department partnered to migrate the LibGuides platform and integrate LibAnswers into the Library’s web presence. The Library first adopted Springshare’s LibGuides in 2009.
By 2015, the subscription had grown to 61 published guides with 10,601 views. The LibGuides collection was modified and expanded during two web site upgrades and several staffing changes. Throughout 2014 and 2015, Library staff periodically discussed the possibility of upgrading to the version 2 interface, but timing, staffing vacancies, and the priority of other projects kept the migration from taking place. In late summer 2015, with the acquisition of Springshare’s LibAnswers with LibChat pending, staff determined that it was finally time to migrate to the new LibGuides interface. Initially, the migration team planned to spend nearly a full academic year completing the migration process. This timeline would provide guide owners with ample time for staff training, revising guides, conducting usability testing, and preparing the migrated guides to go live without distracting from their other duties. However, right before starting the project, the Library finalized the acquisition of Springshare’s LibAnswers with LibChat, which they decided to launch with the version 2 interface. The team pushed up the LibGuides migration by several months to keep from confusing patrons with multiple interfaces and launch dates. The migration of LibGuides and the implementation of LibAnswers would take place during the Fall semester, and both products would go live in the version 2 interface before the start of the Spring semester.

Brianna Buljung (bbuljung@mines.edu) is Instruction & Research Librarian, Colorado School of Mines, Golden, CO. Catherine Johnson (cjohnson@usna.edu) is Head of Reference & Instruction at the United States Naval Academy, Annapolis, MD.

UP AGAINST THE CLOCK: MIGRATING TO LIBGUIDES V2 ON A TIGHT TIMELINE | BULJUNG AND JOHNSON https://doi.org/10.6017/ital.v36i2.9585 69
This paper provides an overview of the process that the staff at Nimitz Library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. The authors also include a post-implementation reflection on the process.

LITERATURE REVIEW

Much of the currently available literature on migration of platforms, especially the LibGuides platform, is published informally. Librarians from universities across the country have created help guides, checklists, and best practices for surviving the migration. Most migration help guides are tailored to each specific institution, but they can still provide helpful suggestions that can be adapted by another.1 Springshare also provides extensive help content and checklists, including a list of the most important steps for administrators to complete.2 However, little of the available literature discusses the minimally acceptable amount of work needed to be completed by guide authors. This type of information was crucial to the Nimitz Library team after drastically shortening the migration timeline. A clearly delineated list of required and optional tasks was needed for guide owners, given time constraints and other job duties. In addition to the informally published help materials, several articles have been published on various aspects of research guide design and evaluation. A few articles examine the migration process. Hernandez and McKeen offer advice for libraries contemplating migration, including setting goals and performing usability testing against the new guides.3 Duncan et al. provide a case study of the implementation process at the University of Saskatchewan.4 Some articles discuss the basics of guide design and usage in the library. These best practices can be adapted to different platforms, web sites, and user populations.
They discuss the importance of various web design elements such as word choice and page layout.5 Another aspect the literature exposes is student use of the guides.6 Finally, usability of research guides is one of the most important and widely discussed topics in the literature. Creating and maintaining guide content depends on the user’s ability to locate and use the guides in their research.7 Most often, research guides are designed with the student in mind: to assist them in beginning a project, researching when a librarian is unavailable, or as a reference for follow-up after an instruction session.8 As Pittsley and Memmott discuss, navigation elements can impact a student’s use of research guides.9

The Process

As preparations for the migration began, it became immediately apparent that the Web Team and Reference Department would have to divide the project into manageable segments to complete the work without overwhelming guide owners. Three ad hoc teams, made up of librarians from several different departments, were created to take the lead on different elements of the project. The migration team was responsible for researching, organizing, and supervising the migration of LibGuides to version 2. The LibAnswers team learned about LibAnswers and how to effectively integrate the product into the Library’s web site. The LibChat team tested the functionality of LibChat and determined how it would fit into the Library’s reference desk staffing model. Dividing the project into manageable segments allowed each team to focus on the execution of its area of responsibility. The team approach allowed the Library to draw on individual strengths and staff willingness to participate without depending on a single staff member to manage the entire migration and implementation process on such a short timeline.
Migration Team

The migration team was responsible for determining the tasks that were mandatory for guide owners to complete, the amount of training they would need to use the new interface, and how each product should be incorporated into the Library’s web site. The LibGuides migration team relied heavily on advice from other libraries and the documentation from Springshare to guide them in determining mandatory tasks. The Engineering and Computer Science librarian reached out to the ASEE Engineering Libraries Division listserv for advice from peer libraries that had already completed migration. The team also made use of the Springshare help guides and best practices guides posted by other universities. Ultimately, the migration team created checklists and spreadsheets to help guide owners prepare their guides for migration. A pre-migration checklist (Appendix A) was shared with guide owners, containing all of the required and optional tasks that needed to be completed before the migration took place in early November. Tasks such as deleting outdated or unused images and evaluating low-use guides for possible deletion were required for guide owners to complete. Other tasks, such as checking each guide for a friendly URL or checking database descriptions for brevity and jargon-free language, were encouraged but considered optional. The team determined that items directly related to the ability of post-migration guides to function properly made the required list, while more cosmetic or stylistic tasks could be completed on a time-allowed basis. A post-migration checklist (Appendix B) was created for guide owners following the migration. This list included portions of the guides that had to be checked to ensure widgets, links, and other assets had migrated properly.
Both checklists were accompanied by tips, screenshots, and deadlines, and indicated which team member to contact with questions. Clear explanation of the expectations for the project and accommodation of the guide owners' busy schedules made the migration more successful.

The migration team gave the new, more robust A-Z list significant attention. LibGuides version 2 allows the A-Z list to be sorted by subject, type, and vendor. It also allows a library to tag "Best Bets" databases in each subject area. The databases categorized as Best Bets display more prominently in the list of databases by subject. Using Google Sheets, the Electronic Resources Librarian quickly and easily solicited feedback from liaison librarians about which databases to tag as Best Bets for each subject area. Google Sheets also made it easy for librarians to edit the list of databases related to their subject expertise. Some databases had been incorrectly categorized, and in some subjects newer subscriptions did not appear on the list. LibGuides version 2 allows users to sort databases by type but does not provide a predetermined list of types. To create the list of material types into which all databases would be sorted, the migration team examined lists found on other library web sites. Several lists were combined, and duplicate or irrelevant types were removed. An additional military-specific type was added to address the most common research conducted by midshipmen. The liaison librarians were then asked for input on the language used to describe each type and on which databases should be tagged with each type. Name choices are a matter of local preference, such as having a single type category for both dictionaries and encyclopedias or two separate categories. To keep the list of material types to a manageable length, the team decided that each type must contain more than one or two databases. It takes time to develop well-defined lists of subjects and types.
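The consolidation step described above (merging candidate type lists gathered from several library sites, removing duplicates, and dropping types tagged on too few databases) can also be scripted. The sketch below is our illustration, not a tool the teams used; the function and variable names are hypothetical.

```python
from collections import defaultdict

def consolidate_types(candidate_lists, db_type_assignments, min_count=2):
    """Merge candidate material-type lists case-insensitively, then keep only
    types assigned to more than `min_count` databases, following the team's
    rule that a type must contain more than one or two databases."""
    canonical = {}
    for type_list in candidate_lists:
        for name in type_list:
            key = name.strip().lower()
            canonical.setdefault(key, name.strip())  # first spelling wins

    counts = defaultdict(int)
    for types in db_type_assignments.values():
        for name in types:
            counts[name.strip().lower()] += 1

    return sorted(name for key, name in canonical.items()
                  if counts[key] > min_count)
```

With this approach, "Dictionaries" and "dictionaries" drawn from two different source lists collapse to a single entry, and a type assigned to only one database is dropped from the final list.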
Staff working with patrons are able to gather informal feedback about the categorizations in their current form and make suggestions, corrections, or additions based on patron feedback.

The migration of LibGuides and acquisition of LibAnswers provided the Reference Department and Web Team with an opportunity to update policies and establish new best practices for guide owners. One important cosmetic update was stronger encouragement for guide owners to use a photo in their profiles. Profile pictures had been used inconsistently in the first LibGuides interface, and several guide owners used the default grey avatar. Guide owners who were reluctant to have a headshot on their profile were encouraged to take advantage of stock photos made available through the Naval Academy's Public Affairs Office. A photo shoot was also organized for guide owners. On a voluntary basis, guide owners spent about an hour helping each other take pictures in and around the Library. The event helped build a collection of more professional photos for guide owners to choose from. Another important update was the re-evaluation of LibGuides policies in light of the new functionality available in version 2. The guide owners gathered for a meeting midway through the pre-migration guide cleanup process to troubleshoot problems and consider best practices for the new interface. Guide owners discussed the standardization of tab names in the guides, the information important to include in author profile boxes, and potential categories for the "types" dropdown in the A-Z database list. The meeting provided a great opportunity to discuss the options available to guide owners and to solicit feedback on interface mock-ups and guide templates created by the Systems Librarian.
Many items from the discussion were incorporated into the updated LibGuides policies for guide owners.

LibAnswers and LibChat Teams

Integrating LibAnswers with LibChat, an additional Springshare product, at the same time as the migration to LibGuides version 2 was not necessary. But because the acquisition of LibAnswers coincided with the need to upgrade to version 2, the Library staff determined that the two should be done at the same time in order to minimize disruption for patrons. The ad hoc teams tasked with implementing LibAnswers and LibChat met regularly to learn about the new products and to consider how they would fit into the Library's existing points of service. While the LibAnswers and LibChat teams began as two distinct groups, it became increasingly clear that the functionality of the two systems is so closely interwoven that they had to be reviewed and discussed together. The teams spent considerable time learning the functionality of the new systems, considering how the new service points would integrate into the existing offerings, and creating draft policies to provide guidance to staff. The teams developed a set of tips and guidelines to address staff concerns and provide guidance on how the new system should be used (see Appendix C). The teams also held training sessions focused on providing opportunities for staff to explore and practice using the new products. Although the implementation of LibAnswers with LibChat was not necessary to upgrade to LibGuides version 2, undertaking all of these upgrades at once allowed the ad hoc groups to collaborate with ease, to define policies and procedures that would help the products integrate seamlessly with existing services, and to prevent change fatigue within the Library.

Updating the Library Website

The final element of migration and implementation the teams had to consider was integration into the Library's existing web site.
Many elements of the Library's site are dictated by the broader university web policy and content management system. However, working within those guidelines, the teams were able to take advantage of the new LibGuides interface, especially the more robust A-Z list of databases, to provide users with multiple ways of accessing the new tools. The Library makes use of a tabbed box to provide entry to Summon, the catalog, the list of databases, and LibGuides. The new functionality of LibGuides version 2 enabled the team to provide easier access directly to the alphabetical listing of databases. The LibGuides tab was also updated to provide a drop-down list of all the guides and a link to browse by guide owner, subject, or type of guide. These enhancements saved time for the user and cut down on the number of clicks needed to access database content licensed by the library. Integrating the LibAnswers product into the site was achieved by providing several different ways for patrons to access it. An FAQ tab was added to the main tabbed box to provide quick access to LibAnswers, complete with a link to submit questions. The "Contact Us" section on the site home page was updated to include a link to LibAnswers as well as newer, more modern icons for the different contact methods. All guide owners were instructed to update the contact information on their guides to include a LibAnswers widget. A great source of inspiration on integrating the tools into the Library site came from looking at other library web sites. The teams worked from the list of LibGuides community members provided on the Springshare help site and viewed the sites of known peer libraries. Working through an unfamiliar web site can be a quick way to find design ideas and workflows that are successful and attractive.
Team members found wording, icons, and placement ideas that could be adapted for use on the Nimitz Library site.

Advice for Managing a Short Migration Timeline

When working on a short implementation timeline, or with a small staff that must accomplish a project like this in addition to its regular duties, a few strategies can make the process simpler and less stressful. First, communicate expectations with everyone involved in the project at every step of the process. Determine which stakeholders need to know about the various checklists and upcoming deadlines. Communicating needs and expectations throughout the entirety of the project reduces confusion and enables teams and individual guide owners to complete the project on time. Although LibGuides had predominantly been the domain of the Nimitz Reference Department, a project of this scale also impacted other parts of the library, from systems to the Electronic Resources librarian. Email and short notices in the Library's weekly staff update were the primary means of communication with stakeholders. Documents were shared via Google Drive to provide guide owners with a centralized file of help materials. In addition, the point of contact for questions about each element of the migration was clearly identified on each checklist and tip sheet. This single addition to the checklists helped guide owners get questions and technical issues addressed quickly and easily. On a short timeline it is also important to distinguish the elements that are crucial for completion from those that can be delayed. Critical needs in a LibGuides migration include deleting guides that are no longer being used, checking for boxes that will not migrate, and deleting bad links. These tasks must be completed by guide owners or administrators to ensure that the migrated data formats properly.
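Broken-link reports were supplied to guide owners during this migration, but the bad-link check can also be run as an independent audit. The following is a minimal sketch using only the Python standard library; it is our illustration, not the process the teams used, and the function name is hypothetical.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def find_broken_links(urls, timeout=10):
    """Issue a HEAD request for each URL and collect the ones that fail.

    Returns a list of (url, reason) pairs; an empty list means every link
    responded with a success status."""
    broken = []
    for url in urls:
        request = Request(url, method="HEAD",
                          headers={"User-Agent": "guide-link-audit/0.1"})
        try:
            with urlopen(request, timeout=timeout) as response:
                if response.status >= 400:  # defensive; errors usually raise
                    broken.append((url, str(response.status)))
        except HTTPError as err:   # 4xx/5xx responses
            broken.append((url, str(err.code)))
        except URLError as err:    # DNS failures, timeouts, refused connections
            broken.append((url, str(err.reason)))
    return broken
```

In practice some database vendors reject HEAD requests, so a production checker would fall back to a GET request before flagging a link as broken.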
Careful attention to these tasks also saves the staff unnecessary work updating and fixing the new guides before going live. Other elements of guide design and migration are merely nice to have. They complement the user's experience with the final product, but neglecting them will not affect basic functionality. These secondary tasks can be completed as time allows. For guide owners, optional tasks included shortening link descriptions, checking for a guide description and friendly URL, and other general updates to the guides. The migration was broken into manageable tasks by giving guide owners a clear list of required and optional items. Team leaders will also need to manage expectations. It can be difficult to remember that web pages, especially LibGuides, are living documents. They can be updated fairly easily after the system has gone live. On a short timeline, in the midst of other duties and responsibilities, it is acceptable for a guide to be just good enough. There is rarely enough time for each guide to reach a state of perfection prior to going live. A guide that is spell-checked and contains accurate information can be edited and made more aesthetically pleasing as time allows after the entire site has gone live. While additional edits are taking place, students still have access to the information they need for their academic work. Lists, such as the subjects and material types in the A-Z list, are always a work in progress based on feedback from service points and usability testing. Updates and edits should be made as patrons interact with the products. Regular use can help library staff identify problems with, or confusion about, the products that might not be anticipated prior to going live. Stress on guide owners can be greatly reduced by communicating expectations throughout the process.
Post-Implementation

Nimitz Library successfully went live with both LibGuides version 2 and LibAnswers with LibChat in early January 2016, right before midshipmen returned to campus for the spring semester. LibAnswers with LibChat was introduced to the campus community with a soft launch at the beginning of the spring semester due to staffing levels and shifts at the reference desk. The librarian on duty at the reference desk was also responsible for answering any chats or LibAnswers questions initiated during their shift. The volume of questions remained fairly low during the semester. On average, the Library received two synchronous and 1.5 asynchronous incoming questions per week via LibAnswers with LibChat. The low volume was beneficial in that it allowed librarians to become familiar with answering questions and editing FAQs. They were able to handle both face-to-face interactions with patrons in the library and the web traffic. However, the volume was so low that it became apparent more marketing of the service was needed. At the start of the fall 2016 semester, the library made an effort to increase awareness of the new LibAnswers products by emailing all students, mentioning the service in every instruction session, and creating and distributing fliers advertising the service around the library. Though the data are preliminary, statistics show that use of these services more than tripled in the first month of the new semester. As discussed above, the expedited implementation timeline forced the ad hoc teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. This meant other necessary, but not urgent, updates to guides were left untouched during the migration.
Given the amount of effort needed to prepare the guides for migration, it is understandable that guide owners had grown tired of making LibGuides updates and found it necessary to move on to other projects. With this fatigue in mind, the team leaders will continue to remind guide authors that LibGuides are living pages in need of constant attention. The team leaders will also take advantage of user feedback to promote continued updates to LibGuides. Throughout the migration process team leaders solicited feedback from staff and users in a variety of ways. First, reference staff were informed of design and implementation changes made throughout the migration. They were given time to view and evaluate the master guide template prior to the migration. The team solicited feedback on the names and organization of categories in the A-Z list. After the products went live, the team gathered informal feedback through reference desk interviews, information literacy instruction sessions, and conversations with faculty and students. Student volunteers participated in usability testing during the spring semester. They were asked to complete a series of tasks related to different aspects of the new interface. Their feedback, especially from thinking aloud while completing the tasks, revealed to librarians how students actually use the guides. Both formal and informal feedback helped librarians adapt and improve the guides. Based on the feedback, the Systems Librarian made global changes to improve system functionality. In one instance, users were having difficulty submitting a new LibAnswers question when they could not find an appropriate FAQ response. The Systems Librarian made the "Submit Your Question" link more prominent for users in that situation. The LibGuides continue to be evaluated by staff for currency and ease of use.
In discussing the first round of usability test results, it was determined that more testing during the fall semester of 2016 would be helpful. During the upgrade to version 2 and the implementation of LibAnswers with LibChat, librarians focused on the functions in the system that were most essential or most desired. All of these products contain additional functionality that was not implemented during the upgrade. After a brief rest, the reference department and library web team explored the products' additional functionality and determined what avenues to explore next.

CONCLUSIONS

Migration of any platform can be an extensive and time-consuming task for library staff. Preparations and post-migration cleanup can interrupt staff workflows and strain limited resources. Using migration teams was a successful strategy on a short timeline because it helped spread the workload by delegating specific learning and tasks to specific people. Those people, in turn, became experts in their area of focus and served as a resource for others in the library. This model cultivated a sense of ownership in the migration across many stakeholders that might not otherwise have existed. That sense of ownership in the project, coupled with checklists and spreadsheets full of discrete tasks in need of completion, made it possible for a small staff to complete the migration quickly and successfully. Migrating on a short timeline can be especially stressful, but careful planning and good communication of expectations help stakeholders focus on the end goal. Upon completion of the project there was a very real sense of fatigue with it. As a result, tasks that were listed as optional because they were not critical for migration went unattended for quite some time after the migration.
Slowly, months later, guide owners are ready to revisit guides and continue making improvements. Given more time, this migration might have been completed more methodically, with the intent of having everything perfect before moving on to the next step. Instead, working on a tight timeline forced us to keep moving forward, making necessary changes and noting changes to be made in the future. Ultimately, it was a constant reminder that our online presence is, and should be, a constant work in progress, not the subject of a big, occasional update.

REFERENCES

1. Luke F. Gadreau, "Migration Checklist for Guide Owners," last modified April 3, 2015, https://wiki.harvard.edu/confluence/display/lg2/Migration+Checklist+for+Guide+Owners; Leeanne Morrow et al., "Best Practice Guide for LibGuides," accessed November 17, 2016, http://libguides.ucalgary.ca/c.php?g=255392&p=1703394; Rebecca Payne, "Updating LibGuides & Preparing for LibGuides v2," last modified November 18, 2014, https://wiki.doit.wisc.edu/confluence/pages/viewpage.action?pageId=85630373; Julia Furay, "Libguides Presentation: Migrating from v1 to v2 (Julia)," last modified September 29, 2015, http://guides.cuny.edu/presentation/migration.

2. Anna Burke, "LibGuides 2: Content Migration is Here!" last modified April 30, 2014, http://blog.springshare.com/2014/04/30/libguides-2-content-migration-is-here/; Springshare, "On Your Checklist: Five Tips & Tricks for Migrating to LibGuides v2," last modified February 18, 2016, http://buzz.springshare.com/springynews/news-27/springytips; Springshare, "Migrating to LibGuides v2 (and Going Live!)," last modified November 7, 2016, http://help.springshare.com/libguides/update/whyupdate.

3. Lauren McKeen and John Hernandez, "Moving Mountains: Surviving the Migration to LibGuides 2.0," Online Searcher 39 (2015): 16-21, http://www.infotoday.com/OnlineSearcher/Articles/Features/Moving-Mountains-Surviving-the-Migration-to-LibGuides--102367.shtml.

4.
Vicky Duncan et al., "Implementing LibGuides 2: An Academic Case Study," Journal of Electronic Resources Librarianship 27 (2015): 248-258, https://dx.doi.org/10.1080/1941126X.2015.1092351.

5. Jimmy Ghaphery and Erin White, "Library Use of Web-Based Research Guides," Information Technology and Libraries 31 (2012): 21-31, http://dx.doi.org/10.6017/ital.v31i1.1830; Danielle A. Becker, "LibGuides Remakes: How to Get the Look You Want Without Rebuilding Your Website," Computers in Libraries 34 (2014): 19-22, http://www.infotoday.com/cilmag/jun14/index.shtml; Michal Strutin, "Making Research Guides More Useful and More Well Used," Issues in Science and Technology Librarianship 55 (2008), https://dx.doi.org/10.5062/F4M61H5K.

6. Ning Han and Susan L. Hall, "Think Globally! Enhancing the International Student Experience with LibGuides," Journal of Electronic Resources Librarianship 24 (2012): 288-297, https://dx.doi.org/10.1080/1941126X.2012.732512; Gabriela Castro Gessner et al., "Are You Reaching Your Audience? The Intersection Between LibGuide Authors and LibGuide Users," Reference Services Review 43 (2015): 491-508, http://dx.doi.org/10.1108/RSR-02-2015-0010.

7. Luigina Vileno, "Testing the Usability of Two Online Research Guides," Partnership: The Canadian Journal of Library and Information Practice and Research 5 (2012), https://dx.doi.org/10.21083/partnership.v5i2.1235; Rachel Hungerford et al., "LibGuides Usability Testing: Customizing a Product to Work for Your Users," http://hdl.handle.net/1773/17101; Alec Sonsteby and Jennifer DeJonghe, "Usability Testing, User-Centered Design, and LibGuides Subject Guides: A Case Study," Journal of Web Librarianship 7 (2013): 83-94, https://dx.doi.org/10.1080/19322909.2013.747366.

8.
Mardi Mahaffy, "Student Use of Library Research Guides Following Library Instruction," Communications in Information Literacy 6 (2012): 202-213, http://www.comminfolit.org/index.php?journal=cil&page=article&op=view&path%5B%5D=v6i2p202.

9. Kate A. Pittsley and Sara Memmott, "Improving Independent Student Navigation of Complex Educational Web Sites: An Analysis of Two Navigation Design Changes in LibGuides," Information Technology and Libraries 31 (2012): 52-64, https://dx.doi.org/10.6017/ital.v31i3.1880.

Appendix A: LibGuides Pre-Migration Checklist

If there are issues, contact the Head of Reference & Instruction.

Required before migration:

● 26 October 2015: Review the attached report of guides that have not been updated in the last year. Delete or consolidate unneeded, practice, or backup guides.*
● 26 October 2015: Review the attached report of guides with fewer than 500 hits. Delete or consolidate unneeded, practice, or backup guides.*
● 26 October 2015: Review all links to all databases included on your guides and make sure the links are mapped to the A-Z list.
● 26 October 2015: Review all guides for links not included in the current A-Z list. List any links that you think should be included in the A-Z list moving forward on the shared spreadsheet (A-Z Additions and Best Bets). Be sure to include all necessary information, including subject and type.
● Mid-October 2015 & 28 October 2015: Review the forthcoming reports about broken links. Anticipate one report on October 13 and one on October 26.
● 26 October 2015: Review the Databases by Subject page of the A-Z list and make sure everything that should be included in your subject is there.
Add anything you'd like removed from your subject to the shared spreadsheet (tab 2). Identify 3 "best bets" databases for each of your subject areas on the shared spreadsheet (tab 3).
● 26 October 2015: Ensure all images have an alt tag.
● 26 October 2015: Delete outdated or unused images in your image collection.
● 26 October 2015: Convert all tables to percentages, not pixels.
● 26 October 2015: Review the attached report of boxes that will not migrate into version 2. (This won't apply to everyone.)
● 26 October 2015: Email the chair of the Web Team if you have guides with boxes containing custom formatting or code (this is only necessary if you manually adjusted the HTML or CSS, or use tabs within a box on your guide). We are keeping a master list to double-check after migration.
● 26 October 2015: Check all links to the catalog in your guides to make sure they are accurate.
● 26 October 2015: Check all widgets (like catalog search boxes) to ensure they function properly, delete any widgets you don't need, and keep a list of widgets to check post-migration to make sure they still function.

Optional before migration:

● Consider turning links in "rich text" boxes into a "links and list" box.
● Review all guides to ensure they have a friendly URL, are assigned to a subject, have assigned tags, and have a brief guide description.
● Shorten database descriptions to one to two sentences. Consider including dates of coverage and why the database is useful for this particular subject.

Helpful hints:

*If you'd like to hold on to content from guides you plan to delete, create an unpublished "master guide" where you can store content you plan to use in the future.
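The alt-tag requirement in the checklist above can be audited mechanically rather than box by box. This sketch, using Python's standard html.parser module, is our illustration rather than a tool from the migration; it flags img tags whose alt attribute is missing or empty.

```python
from html.parser import HTMLParser

class AltTagAuditor(HTMLParser):
    """Record the src of every <img> lacking a non-empty alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            if not attr_map.get("alt"):  # alt absent or empty
                self.missing_alt.append(attr_map.get("src", "(no src)"))

def images_missing_alt(html_text):
    """Return the src values of images in html_text that need alt text."""
    auditor = AltTagAuditor()
    auditor.feed(html_text)
    return auditor.missing_alt
```

Run against the exported HTML of each guide box, a report like this would let guide owners fix accessibility gaps before the migration deadline instead of hunting for them manually.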
Appendix B: LibGuides Post-Migration Checklist and Guide Cleanup

NOTE: Now that migration is complete, if you make an update to your version 1 guides, your change will not transfer to version 2. This means broken links will need to be fixed in both versions. If there are issues or questions, contact the Head of Reference & Instruction (general questions), the Systems Librarian (technical issues), or the Electronic Resources Librarian (database assets and A-Z list).

CLEAN UP AND CHECK CONTENT

1) Check boxes to make sure content is correctly displayed on all your guides. Check all boxes closely, as some had the header end up below the first bullet point. To fix an issue like this, click the Add/Reorder button at the bottom of the box you are working on, then click "Reorder Content." You can move the links down and the text up.

2) Ensure all guides have a friendly URL, are assigned to a subject, and have assigned tags, if you didn't do this pre-migration. See the pre-migration handout for help. In version 2 this information will display at the TOP of our guides in edit mode and at the BOTTOM of our guides on the public interface.

3) Ensure images are resized to fit general web guidelines. See this guide for help: http://guidefaq.com/a.php?qid=12922

4) Check all your widgets to ensure they still function properly.

5) Add a guide type to each of your guides. This is a new feature in LibGuides version 2. It is under the gear on the right side of your guide while in edit mode. This will help us sort and organize the guides in the list of guides.

ADD NEW LIBGUIDES 2 CONTENT

1) Make a box pointing to related guides. Research has shown that a box on the guide home page pointing to related guides can be very helpful to students.
Link to other subject guides that would be of interest and any course guides for that subject. For example, the box on the Mechanical Engineering guide contains links to EM215 and Nuclear Engineering (which is part of the Mechanical Engineering department). To do this, go to the bottom of your welcome box, click the Add/Reorder button, and then click Guide List; the first option is to manually choose guides to add to the list.

2) Add a tab to every guide that is named Citing Your Sources and redirects to the Citing Your Sources LibGuide. To do this:
a. Create a blank page named Citing Your Sources at the bottom of your left-side navigation.
b. On your blank page, click the page options icon to open the options for editing the page.
c. Click on Redirect URL and paste the link to the Citing Sources guide in the box.
d. It is also a good idea to mark the "open in a new window" box.
e. If you've completed it successfully, your Citing Your Sources tab will appear as a redirect in edit mode. Since the Citing Your Sources guide is still a work in progress, it is unpublished and you will get an error when you preview it.
f. Finally, REMOVE the plagiarism and citing sources box from your guides.

3) Now is a good time to take advantage of new functionality and to update the content of your guides. You can now combine multiple types of information in the same box, and you can also take advantage of tabbed boxes. See this LibGuide for further assistance: http://support.springshare.com/libguides/migration/v2cleanup-regular

4) Create your new profile box. At the meeting on October 20, the Reference & Instruction department agreed that the following elements should be consistent in the profile box:
Box Name: Librarian
Image: A stock photo or a personal photo (picture day coming soon)
In the Contact box:
Title
Nimitz Library
XXXX Dept.
Office # XXX
410-293-XXXX
EMAIL ADDRESS
Your subjects will be displayed below.

Appendix C: Tips & Guidelines for LibAnswers with LibChat

WHAT MODES OF INQUIRY WILL BE AVAILABLE TO USERS?

Using the LibAnswers platform, users will be able to submit questions via chat or by using the question form within LibAnswers. Users will also be able to ask questions as they did before: at the reference desk, via askref@usna.edu, and by calling 410-293-2420.

WHAT ARE "BEST PRACTICES" OR GUIDELINES FOR LIBANSWERS W/ LIBCHAT?

See the tips for responding to tickets, the tips for creating/maintaining FAQs, and the tips for responding to chat questions at the bottom of this document.

WHAT PRIORITY SHOULD I GIVE RESPONSES COMING THROUGH VARIOUS MODES OF INQUIRY?

Reference staff will have to use their professional judgement when deciding what priority to give questions coming in through various modes of inquiry. While the addition of chat and tickets may seem overwhelming at first, the same rules you've applied in the past will work. If a chat comes in while you're helping someone face-to-face, use that as an opportunity to advertise the chat service. Explain to the patron that you also help users via chat and that you're going to let the chatter know you'll be with them shortly. The same can apply if you're finishing up a chat when a face-to-face user walks up. Simply explain that the library also offers a chat service and you're just finishing up a question. Remember to get comfortable with, and take advantage of, the canned messages in chat; let the phone go to voicemail if necessary; and explain to face-to-face users what's happening.
During the pilot phase you should also keep track of strategies that worked well for you, or times when the various modes of inquiry became too overwhelming. We'll take all of that into consideration when we reexamine this service. Chat, phone, and face-to-face interactions are synchronous modes of communication, so users expect responses immediately. Tickets are an asynchronous mode of communication and should be dealt with on a first-come, first-served basis. Respond to tickets when you have time. When responding to tickets, respond to the oldest tickets first, as that user has been waiting the longest for an answer. However, feel free to use your judgement and, if you choose, respond to questions with quick answers right away.

HOW SHOULD I PRIORITIZE QUESTIONS FROM USNA VS. NON-USNA USERS?

Priority should be given to midshipmen, faculty, and staff. If an outside user makes use of the chat or ticket service, feel free to explain that this service is primarily for faculty, staff, and students and that they should direct their question to askref@usna.edu. If you are free and have time, feel free to assist outside patrons via the chat or ticket system.

HOW SHOULD I HANDLE REMAINING QUESTIONS DURING A CHANGE IN SHIFTS?

Handle them in the same manner that you would a face-to-face question from a student, faculty, or staff member. Finish up quickly if you can, advise the patron that you need to leave and offer to handle the question when you return, or transfer the chat to another librarian. If there are remaining tickets in the queue, simply notify the next librarian on duty.

WHAT ARE THE EXPECTED TURNAROUND TIMES FOR RESPONDING TO PATRON INQUIRIES?

Chat, face-to-face, and phone inquiries should be responded to as immediately as possible. Tickets should be responded to within a business day.

WHO CAN I CONTACT FOR HELP AND TROUBLESHOOTING?
If you have questions, your first stop should be the LibAnswers FAQ provided by Springshare (available in the “Help” section when logged into LibApps). If you can’t find the answer to your question there, feel free to contact the Head of Reference and Instruction, who will work with you to resolve the problem.

GUIDELINES FOR RESPONDING TO LIBANSWERS TICKETS*:
● Keep in mind that when you are responding to tickets, you are a jack of all trades. Even if the question is outside your subject area, do your best to provide information that will get the user started; in that email you may also suggest that the user contact the subject specialist.
● Respond to LibAnswers tickets in the same way you would respond to an email inquiry from a user.
● If you provide a factual response, be sure to include the source the information came from.

GUIDELINES FOR CREATING/MAINTAINING FAQS*:
● The FAQ database is a public-facing, searchable collection of questions and answers intended to empower our users to find their own answers. Any question that might be considered frequently asked should be included: questions about the library, the collections, how to find specific types of information, how to start research on specific and recurring assignments, etc.
● When creating an FAQ from a ticket, remember that you can edit the question. Do your best to phrase it in a way that is applicable and relevant to the most users.
● When creating an FAQ from a response you’ve already written, be sure to edit out any personally identifiable information (PII) about the person who initially asked the question, and check both the question and the response for any remaining PII.
● If a member of the staff notices incomplete or incorrect information in an FAQ response, he/she should use professional judgement in deciding how to handle the situation. If the error looks like a typo, he/she may choose to edit the response immediately; however, if the edit affects the substantive content of the response, he/she may choose to consult the librarian who initially wrote it.

GUIDELINES FOR LIBCHAT*:
● If you refer a question, alert the librarian to whom the user is being referred.
● Remember that the person you’re chatting with can’t see you, so if you step away (to conduct a search, to check a book, to help someone else, etc.), let them know you’ll be right back.
● Chat questions can seem rushed, so it may be tempting to answer only the initial question. As in face-to-face interactions, though, clarifying questions save time for both the user and the librarian and allow more accurate and efficient answers.
● When providing responses, remember that as an academic library our mission is both to provide the information needed and to instruct our users so they may become self-reliant; chat challenges us to balance answers and instruction. Do your best to find an appropriate balance.
● As the transaction is ending, remain courteous, check that all of the user’s questions have been addressed, and encourage the user to use the service again.

* Note: These guidelines are drafts and will evolve as the staff learns more about this system throughout the pilot phase.
Identifying Emerging Relationships in Healthcare Domain Journals via Citation Network Analysis
Kuo-Chung Chu, Hsin-Ke Lu, and Wen-I Liu
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2018

Kuo-Chung Chu (kcchu@ntunhs.edu.tw) is Professor, Department of Information Management, and Dean, College of Health Technology, National Taipei University of Nursing and Health Sciences; Hsin-Ke Lu (sklu@sce.pccu.edu.tw) is Associate Professor, Department of Information Management, and Dean, School of Continuing Education, Chinese Culture University; Wen-I Liu (wenyi@ntunhs.edu.tw, corresponding author) is Professor, Department of Nursing, and Dean, College of Nursing, National Taipei University of Nursing and Health Sciences.

ABSTRACT
Online e-journal databases enable scholars to search the literature in a research domain or to cross-search an interdisciplinary field, so the key literature can be efficiently mapped. This study builds a web-based citation analysis system consisting of four modules: (1) literature search; (2) statistics; (3) article analysis; and (4) co-citation analysis. The system focuses on the PubMed Central dataset and facilitates specific keyword searches in each research domain for authors, journals, and core issues. In addition, we use data mining techniques for co-citation analysis. The results can help researchers develop an in-depth understanding of a research domain. An automated system for co-citation analysis promises to facilitate understanding of the changing trends that affect the journal structure of research domains. The proposed system has the potential to become a value-added database of the healthcare domain, which will benefit researchers.

INTRODUCTION
Healthcare is a multidisciplinary research domain of medical services provided both inside and outside a hospital or clinical setting.
Article retrieval for systematic reviews in the domain is much more elusive than retrieval for reviews in clinical medicine because of the interdisciplinary nature of the field and the lack of a significant body of evaluative literature. Other connecting research fields consist of the respective research fields of the application domain (i.e., the health sciences, including medicine and nursing).1 In addition, valuable knowledge and methods can be taken from the fields of psychology, the social sciences, economics, ethics, and law. Further, the integration of those disciplines is attracting increasing interest.2 Researchers may use bibliometrics to evaluate the influence of a paper or to describe the relationship between citing and cited papers. Citation analysis, one of several possible bibliometric approaches, is more popular than the others because of the advent of information technologies.3 Citation analysis counts the frequency of cited papers from a set of citing papers to determine the most influential scholars, publications, or universities in a discipline. It can be classified into two basic types: the first counts only the citations in a paper that are authored by an individual, while the second analyzes co-citations to identify intellectual links among authors in different articles. This paper focuses on the second type of citation analysis.

IDENTIFYING EMERGING ISSUES IN THE HEALTHCARE DOMAIN | CHU, LU, AND LIU https://doi.org/10.6017/ital.v37i1.9595

Small defined co-citation analysis as “the frequency with which two items of earlier literature are cited together by the later literature.”4 It is not only the most important type of bibliometric analysis, but also the most sophisticated and popular method.
Many other methods originate from citation analysis, including document co-citation analysis, bibliographic coupling,5 author co-citation analysis,6 and co-word analysis.7 Co-citation analysis can be conducted at three levels: document, author, and journal. Co-citation can be used to establish a cluster or “core” of earlier literature.8 The pattern of links between documents can establish a structure that highlights the relationships among research areas. Citation patterns change when previously less-cited papers are cited more frequently, or when old papers are no longer cited. Changing citation patterns imply the possibility of new developments in research areas; furthermore, we can investigate changing patterns to understand the scientific trends within a research domain.9 Co-citation analysis can help obtain a global overview of research domains.10 The aim of this paper is to detect emerging issues in the healthcare research domain via citation network analysis. Our results can provide a basis of knowledge that researchers can use to construct a search strategy. Structural knowledge is intrinsic to problem solving. Because of the interdisciplinary nature of the healthcare domain and the broadness of the term, research is performed in several research fields, such as nursing, nursing informatics, long-term care, medical informatics, geriatrics, information technology, telecommunications, and so forth. Although electronic journals enable searching by author, article, and journal title using keywords or full text, the results are limited to article content and references and therefore do not provide an in-depth understanding of the knowledge structure in a specific domain. The knowledge structure includes the core journals, the core issues, the analysis of research trends, and the changes in focus of researchers.
For a novice researcher, however, the literature survey remains a troublesome process in terms of precisely identifying the key articles that convey the overview concept of a specific domain. The process is complicated and time-consuming, and it limits the number of articles collected for retrospective research. The objective of this paper is to provide information about the challenges and methodology of relevant literature retrieval by systematically reviewing the effectiveness of healthcare strategies. To this end, we build a platform for automatically gathering the full text of e-journals offered by the PubMed Central (PMC) database.11 We then analyze the co-citation results to understand the research themes of the domain.

METHODS
This paper builds a value-added literature database system for co-citation analysis of healthcare research. The results of the analysis are presented visually to convey the structure of the domain knowledge and to increase the productivity of researchers.

Dataset
For co-citation analysis, a data source of related articles on healthcare is required. For this paper, the articles were retrieved from the PMC database using search terms related to the healthcare domain. To build the article analysis system, we used bibliometrics to locate the relevant references, while the analysis techniques were implemented with the association rule algorithm of data mining. The PMC database, produced by the US National Institutes of Health and implemented and maintained by the US National Center for Biotechnology Information of the US National Library of Medicine, provides electronic articles from more than one thousand full-text journals for free. Publication status can be determined from the Open Access Subset (OAS), which is accessible through the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) and includes the full text in XML and PDF.
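The OAI-PMH access mentioned above boils down to simple parameterized HTTP requests. A minimal Python sketch of building such a request follows; note that the base URL and the `pmc-open` set name are assumptions about PMC's publicly documented OAI service, not details taken from this article, and should be verified against current NCBI documentation:

```python
from urllib.parse import urlencode

# Assumed PMC OAI-PMH endpoint; verify against current NCBI documentation.
PMC_OAI_BASE = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"

def list_records_url(metadata_prefix="pmc", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for harvesting PMC metadata."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # Per OAI-PMH, a resumptionToken must be the only argument besides the verb.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return PMC_OAI_BASE + "?" + urlencode(params)

print(list_records_url(set_spec="pmc-open"))
```

Harvesting then consists of fetching the URL, parsing the returned XML, and following resumption tokens until the list is exhausted.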
Regarding access permission, PMC offers a dataset of many open access journal articles. This paper used a dedicated XML-formatted dataset (https://www.ncbi.nlm.nih.gov/pmc/tools/oai/). The XML-formatted dataset follows the specification of DTD (document type definition) files, which are sorted by journal title. Each article has a PMCID (PMC identifier), which is useful for data analysis. In addition to the dataset, PMC also provides several web services to help widely disseminate articles to researchers.

Figure 1. The system architecture of citation analysis with four subsystems.

System Architecture
Our development environment consisted of the following four subsystems: front-end, middle-end, back-end, and pre-processing. The front-end creates a “web view,” a visualization of the results for our web-based co-citation analysis system. The system architecture is shown in figure 1.

Front-End Development Subsystem
We used Adobe Dreamweaver CS5 as a visual development tool for the design of web templates. The PHP programming language was chosen to build the co-citation system used to access and analyze the full-text articles. For the data mining technique, we implemented the Apriori algorithm in PHP.12 The results were exported as XML to a charting process, using amCharts (https://www.amcharts.com/), to create stock charts, column charts, pie charts, scatter charts, line charts, and so forth.
Middle-End Server Subsystem
The system architecture was a Microsoft Windows-based environment with a XAMPP 2.5 web server platform (https://www.apachefriends.org/download.html). XAMPP is a cross-platform web development kit that consists of Apache, MySQL, PHP, and Perl. It runs on several operating systems, such as Linux, Windows, macOS, and Oracle Solaris, and provides SSL encryption, the phpMyAdmin database management system, the Webalizer traffic management and control suite, a mail server (Mercury Mail Transport System), and a FileZilla FTP server.

Back-End Database Subsystem
To speed up co-citation analysis, the back-end database system used MySQL 5.0.51b with phpMyAdmin 2.11.7 as an interface for easy management of the database. MySQL includes the following features:
• It is coded in C and C++, and users can develop applications against its API from Visual Basic, C, C++, Eiffel, Java, Perl, PHP, Python, Ruby, and Tcl, with multithreading capability that can exploit multi-CPU systems and link easily to other databases.
• Query performance is fast because SQL commands are optimally implemented, and many additional commands and functions make the database user-friendly and flexible to operate. An encryption mechanism is also offered to improve data confidentiality.
• MySQL can handle a large-scale dataset: storage capacity is up to 2TB on Win32 NTFS systems and up to 4TB on Linux ext3 systems.
• It provides MyODBC as an ODBC driver for connecting from many programming languages, and it supports several languages and character sets for localization and internationalization.

Pre-processing Subsystem
PMC provides access to articles via OAS, OAI services, e-utilities, and FTP. On October 28, 2012, we used FTP (ftp://ftp.ncbi.nlm.nih.gov/pub/pmc) to download compressed archives packaged with filenames following the pattern “articles?-?.xml.tar.gz”, where “?-?” is “0-9” or “A-Z”.
After downloading, the compressed files totaled approximately 6.17GB; after extraction, the articles totaled approximately 10GB. The 571,890 articles from 3,046 journals were grouped and sorted by journal title in folders labeled with abbreviated titles. An XML file would, for example, be named “AAPSJ-10-1-2751445.nxml,” where “AAPSJ” is the abbreviated title of the journal (American Association of Pharmaceutical Scientists Journal), “10” is the volume, “1” is the issue number, and “2751445” is the PMCID. We used related technologies, including the PHP language, arrays, and the Apriori algorithm, to analyze the articles and build the co-citation system.13 Finally, several analysis modules were created to build an integrated co-citation system.

RESEARCH PROCEDURE
The following seven-step research procedure realizes the integrated co-citation system:
1. Parse the XML files: select the tags used to construct the database and choose the fields needed for co-citation analysis.
2. Present web-based articles: design the webpage and CSS style, and present the web-based XML files indexed by a key variable.
3. Build an abstract database consisting of the fields selected in step 1.
4. Develop the search module: pass the keyword with the “POST” method into an SQL query and present the search results on the webpage.
5. Develop the statistical module: the statistical results include the number of articles and cited articles, the journals and authors cited in all articles, and the number of cited articles.
6. Develop the citation module: visually present the statistical results in several formats; rank the searched journals; rank the searched and cited journals across all articles.
7. Develop the co-citation module: analyze the associations between articles with the Apriori algorithm.
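The file-naming convention described above can be unpacked with a small regular expression. A Python sketch follows; the pattern is inferred from the single example given (“AAPSJ-10-1-2751445.nxml”) and may not cover every file in the dataset:

```python
import re

# Inferred naming convention: <journal abbreviation>-<volume>-<issue>-<PMCID>.nxml
NXML_NAME = re.compile(r"^(?P<journal>.+)-(?P<volume>\d+)-(?P<issue>\d+)-(?P<pmcid>\d+)\.nxml$")

def parse_nxml_name(filename):
    """Split a PMC article filename into journal abbreviation, volume, issue, and PMCID."""
    m = NXML_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized filename: {filename}")
    parts = m.groupdict()
    parts["volume"], parts["issue"] = int(parts["volume"]), int(parts["issue"])
    return parts

print(parse_nxml_name("AAPSJ-10-1-2751445.nxml"))
```

Parsing the filename up front lets each record be grouped by journal before the XML itself is opened.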
Association Rule Algorithms
An association rule (AR), usually written A→B, states that a transaction containing item A also contains item B. Most datasets yield many such rules, some of them useless. To validate the rules, two indicators, support and confidence, can be applied. Support, which measures usefulness, is the number of times the rule features in the transactions, whereas confidence measures certainty: the probability that B occurs whenever A occurs. We chose the rules for which the values of both support and confidence were greater than a predefined threshold. For example, a rule “toast→jam” with support of 1.2 percent and confidence of 65 percent implies that 1.2 percent of the transactions contain both “toast” and “jam,” and that 65 percent of the transactions containing “toast” also contain “jam.” The principle for generating ARs rests on two steps: (1) find the high-frequency item sets, whose supports are greater than the threshold; (2) for each item set X and each of its subsets Y, check the rule X→Y (meaning that an occurrence of X is accompanied by an occurrence of Y) and keep it if its support is greater than the threshold. Most studies focus on searching for high-frequency item sets.14 The most popular approach for identifying these item sets is the Apriori algorithm, shown in figure 2.15 The rationale of the algorithm is that if the support of an item set I is less than or equal to the threshold, I is not a high-frequency item set, and no item set formed by inserting an additional item A into I can be a high-frequency item set either. Accordingly, the Apriori algorithm is an iteration-based approach. First, it generates the candidate item set C1 by counting the occurrences of each attribute and finds the high-frequency item set L1, whose members have support greater than the threshold.
Second, it generates candidate set C2 from L1, iteratively finds L2, generates C3, and so on.

1: L1 = {large 1-item sets};
2: for (k = 2; Lk-1 ≠ ∅; k++) do begin
3:   Ck = Candidate_gen(Lk-1);
4:   for all transactions t ∈ D do begin /* generate candidate k-item sets */
5:     Ct = subset(Ck, t);
6:     for all candidates c ∈ Ct do
7:       c_count = c_count + 1;
8:   end
9:   Lk = {c ∈ Ck | c_count ≥ minsupport}
10: end
11: return L = ∪k Lk;

Figure 2. The Apriori algorithm.

The Apriori algorithm is one of the most commonly used methods for AR induction. The Candidate_gen algorithm, shown in figure 3, uses join and prune operations to generate candidate sets.16 Steps 1 to 4 generate all possible candidate item sets c from Lk-1; steps 5 to 8 delete any item set that cannot be a frequent item set; step 9 returns the candidate set Ck to the main algorithm.

1: for each item set X1 ∈ Lk-1
2:   for each item set X2 ∈ Lk-1
3:     c = join(X1[1], X1[2], …, X1[k-2], X1[k-1], X2[k-1])
4:       where X1[1] = X2[1], …, X1[k-2] = X2[k-2], X1[k-1] < X2[k-1];
5: for all item sets c ∈ Ck do
6:   for all (k-1)-subsets s of c do
7:     if (s ∈ Lk-1) then add c to Ck;
8:     else delete c from Ck;
9: return Ck;

Figure 3. The Candidate_gen algorithm.

RESULTS
We searched the PMC database with the keywords “healthcare,” “telecare,” “ecare,” “ehealthcare,” and “telemedicine” and located 681 articles with a combined 14,368 references. Values were missing from the year field for 4 of the references; this was also the case for 635 of a total of 52,902 authors. In the keyword search for the healthcare domain, a pie chart of the journal citation analysis (figure 4) shows that the top-ranked journal in terms of citations was the British Medical Journal (BMJ), cited approximately 439 times, or 18.89 percent of the total, followed by the Journal of the American Medical Association (JAMA), cited approximately 344 times, or 14.80 percent of the total.
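At its core, the journal-level co-citation counting described in this study reduces to a frequency count of journal pairs over reference lists (the "support" of each pair). A minimal Python sketch of that count, using invented reference lists rather than the paper's data:

```python
from itertools import combinations
from collections import Counter

def cocitation_support(reference_lists):
    """Count, for every pair of journals, the number of citing articles
    whose reference lists contain both journals (the pair's support)."""
    pair_counts = Counter()
    for refs in reference_lists:
        # Count a pair once per citing article, regardless of how many
        # times each journal is cited within that article.
        for pair in combinations(sorted(set(refs)), 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical citing articles, each reduced to the journals it cites.
articles = [
    ["BMJ", "Lancet", "JAMA"],
    ["BMJ", "Lancet"],
    ["BMJ", "JAMA", "N Engl J Med"],
]
support = cocitation_support(articles)
print(support[("BMJ", "Lancet")])  # → 2
```

The Apriori machinery above speeds up the same computation for larger item sets by pruning pairs (and triples, and so on) that cannot reach the support threshold.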
The trend of healthcare citations from 1852 to 2009 peaked in 2006 at approximately 1,419 citations, with more than half of the total occurring in this year.

Figure 4. Top-cited journals in the healthcare domain by percentage of total citations (N = 2324).

Figure 5 shows a pie chart of the author citations for the keyword search in the healthcare domain. The most-cited author was J. W. Varni, professor of pediatric cardiology at the University of Michigan Mott Children’s Hospital in Ann Arbor, cited approximately 149 times, equivalent to 23.24 percent of the total, followed by D. N. Herndon, professor at the Department of Plastic and Hand Surgery, Friedrich-Alexander University of Erlangen in Germany, cited approximately 73 times, or 11.39 percent of the total. By identifying the affiliations of the top-ranked authors, researchers can access related information in their field of interest. The co-citation analysis was conducted using the Apriori algorithm. The relationships of co-citation journals with a supporting degree greater than 38 from 1852 to 2009 are shown in figure 6. Each journal is denoted by a node; a node with a double circle indicates that the journal is co-cited with another in a citing article. BMJ, which covers the fields of evidence-based nursing care, obstetrics, healthcare, nursing knowledge and practices, and others, is the core journal of the healthcare domain.

Figure 5. Top-cited authors in journals of the healthcare domain by percentage of total citations (N = 641).

Figure 6. The relationship of co-citation journals with BMJ.

To identify the focus of the journals, we analyzed the co-citations in three periods.
In 1852–1907, journals were not in co-citation relationships; in 1908–61, five candidates had a supporting degree greater than 1 (see table 1); and in 1962–2009, twenty-eight candidates had a supporting degree greater than 14 (see table 2; for example, BMJ and Lancet had sixty-eight co-citations).

Table 1. Candidates in co-citation analysis with a supporting degree greater than 1 (1908–61).
No. | Journals | No. of journals co-cited | Support
1 | Publ Math Inst Hung Acad Sci, Publ Math | 2 | 3
2 | JAOA, J Osteopath | 2 | 1
3 | Antioch Rev, J Abnorm Soc Psychol | 2 | 1
4 | N Engl J Med, Am Surg | 2 | 1
5 | Arch Neurol Psychiatry, J Neurol Psychopathol, Z Ges Neurol Psychiat | 3 | 1

Table 2. Candidates in co-citation analysis with a supporting degree greater than 14 (1962–2009).
No. | Journals | No. of journals co-cited | Support
1 | BMJ, Lancet | 2 | 68
2 | BMJ, JAMA | 2 | 65
3 | JAMA, Med Care | 2 | 64
4 | BMJ, Arch Intern Med | 2 | 61
5 | Lancet, JAMA | 2 | 52
6 | Soc Sci Med, BMJ | 2 | 52
7 | JAMA, Arch Intern Med | 2 | 51
8 | Lancet, Med Care | 2 | 50
9 | Crit Care Med, Prehospital Disaster Med | 2 | 49
10 | N Engl J Med, BMJ | 2 | 49
11 | N Engl J Med, Lancet | 2 | 49
12 | N Engl J Med, JAMA | 2 | 47
13 | N Engl J Med, Med Care | 2 | 47
14 | Qual Saf Health Care, BMJ | 2 | 47
15 | BMJ, Crit Care Med | 2 | 42
16 | Med Care, BMJ | 2 | 38
17 | N Engl J Med, J Bone Miner Res | 2 | 33
18 | N Engl J Med, J Pediatr Surg | 2 | 26
19 | Lancet, J Pediatr Surg | 2 | 25
20 | JAMA, Nature | 2 | 25
21 | Lancet, JAMA, BMJ | 3 | 24
22 | N Engl J Med, Lancet, BMJ | 3 | 21
23 | Intensive Care Med, BMJ | 2 | 21
24 | BMJ, N Engl J Med, JAMA | 3 | 20
25 | N Engl J Med, JAMA, Lancet | 3 | 20
26 | JAMA, Med Care, Lancet | 3 | 14
27 | JAMA, Med Care, N Engl J Med | 3 | 14
28 | BMJ, JAMA, Lancet, N Engl J Med | 4 | 14

The links of co-citation journals across the three periods from 1852 to 2009 can be summarized as follows: (1) three journals were highly cited but were not in a co-citation relationship in 1852–1907 (see figure 7); (2) five clusters of the healthcare journals in co-citation
relationships were found for the years 1908–61 (see figure 8); and (3) 1962–2009 had a distinct cluster of four journals within the healthcare domain (see figure 9).

Figure 7. The relationship of co-citation journals for the healthcare domain in 1852–1907.

Figure 8. The relationship of co-citation journals for the healthcare domain in 1908–61. Journals with double circles are co-cited with one other journal in a citing article; journals with triple circles are co-cited with two others in a citing article.

Figure 9. The relationship of co-citation journals for the healthcare domain in 1962–2009. The thick lines and circles indicate journals that are co-cited in a citing article.

CONCLUSIONS
This paper presented an automated literature system for co-citation analysis to facilitate understanding of the citation structure of journal articles in the healthcare domain. The system visually presents the results of its analysis to help researchers quickly identify the key articles that provide an overview of the healthcare domain. Using keywords related to healthcare, the analysis found that BMJ is a core journal of the domain. The co-citation analysis found a single cluster within the healthcare domain comprising four journals: BMJ, JAMA, Lancet, and the New England Journal of Medicine. This paper focused on a co-citation analysis of journals; the authors, articles, and issues featured in the co-citation analysis can be further studied in an automated way. A period analysis of publication years is also important. Further analyses can facilitate understanding of the changes in a research domain and the trends in research issues. In addition, the automatic generation of a map would be a worthwhile topic for future study.
ACKNOWLEDGEMENTS
This article was funded by the Ministry of Science and Technology of Taiwan (MOST), formerly the National Science Council (NSC), under grant NSC 100-2410-H-227-003. All of the authors made significant contributions to the article and agree with its content, and there is no known conflict of interest in this study.

REFERENCES
1 A. Kitson et al., “What Are the Core Elements of Patient-Centered Care? A Narrative Review and Synthesis of the Literature from Health Policy, Medicine and Nursing,” Journal of Advanced Nursing 69 (2013): 4–8, https://doi.org/10.1111/j.1365-2648.2012.06064.x.
2 S. J. Brownsell et al., “Future Systems for Remote Health Care,” Journal of Telemedicine and Telecare 5 (1999): 145–48, https://doi.org/10.1258/1357633991933503; B. G. Celler, N. H. Lovell, and D. K. Chan, “The Potential Impact of Home Telecare on Clinical Practice,” Medical Journal of Australia 171 (1999): 518–20; R. Walker et al., “What It Will Take to Create New Internet Initiatives in Health Care,” Journal of Medical Systems 27 (2003): 95–98, https://doi.org/10.1023/A:1021065330652.
3 I. Marshakova-Shaikevich, “The Standard Impact Factor as an Evaluation Tool of Science Fields and Scientific Journals,” Scientometrics 35 (1996): 283–85, https://doi.org/10.1007/BF02018487; I. Marshakova-Shaikevich, “Bibliometric Maps of Field of Science,” Information Processing & Management 41 (2005): 1536–45, https://doi.org/10.1016/j.ipm.2005.03.027; A. R. Ramos-Rodríguez and J. Ruíz-Navarro, “Changes in the Intellectual Structure of Strategic Management Research: A Bibliometric Study of the Strategic Management Journal, 1980–2000,” Strategic Management Journal 25, no. 10 (2004): 982–1000, https://doi.org/10.1002/smj.397.
4 H. Small, “Co-citation in the Scientific Literature: A New Measure of the Relationship between Two Documents,” Journal of the American Society for Information Science 24 (1973): 266–68.
5 M. M. Kessler, “Bibliographic Coupling between Scientific Papers,” American Documentation 14 (1963): 10–25, https://doi.org/10.1002/asi.5090140103; B. H. Weinberg, “Bibliographic Coupling: A Review,” Information Storage and Retrieval 10 (1974): 190–95.
6 H. D. White and B. C. Griffith, “Author Cocitation: A Literature Measure of Intellectual Structure,” Journal of the American Society for Information Science 32 (1981): 164–70, https://doi.org/10.1002/asi.4630320302.
7 Y. Ding, G. G. Chowdhury, and S. Foo, “Bibliometric Cartography of Information Retrieval Research by Using Co-word Analysis,” Information Processing & Management 37, no. 6 (November 2001): 818–20, https://doi.org/10.1016/S0306-4573(00)00051-0.
8 Small, “Co-citation,” 266.
9 D. Sullivan et al., “Understanding Rapid Theoretical Change in Particle Physics: A Month-by-Month Co-citation Analysis,” Scientometrics 2 (1980): 312–16, https://doi.org/10.1007/BF02016351.
10 N. Shibata et al., “Detecting Emerging Research Fronts Based on Topological Measures in Citation Networks of Scientific Publications,” Technovation 28 (2008): 762–70, https://doi.org/10.1016/j.technovation.2008.03.009.
11 Weinberg, “Bibliographic Coupling.”
12 White and Griffith, “Author Cocitation.”
13 R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rules in Large Databases” (paper, International Conference on Very Large Databases [VLDB], Santiago de Chile, September 12–15, 1994).
14 R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases” (paper, ACM SIGMOD International Conference on Management of Data, Washington, DC, May 25–28, 1993).
15 Agrawal and Srikant, “Fast Algorithm,” 3.
16 Ibid., 4.
Reference Rot in the Repository: A Case Study of Electronic Theses and Dissertations (ETDs) in an Academic Library
Mia Massicotte and Kathleen Botter
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017

ABSTRACT
This study examines ETDs deposited during the period 2011–2015 in an institutional repository to determine the degree to which the documents suffer from reference rot, that is, linkrot plus content drift. The authors converted and examined 664 doctoral dissertations in total, extracting 11,437 links, and found overall that 77% of links were active and 23% exhibited linkrot. A stratified random sample of 49 ETDs produced 990 active links, which were then checked for content drift based on mementos found in the Wayback Machine. Mementos were found for 77% of links, and approximately half of these, 492 of 990, exhibited content drift. The results serve not only to emphasize the necessity of broader awareness of this problem, but also to stimulate action on the preservation front.

INTRODUCTION
A significant proportion of the material in institutional repositories is comprised of electronic theses and dissertations (ETDs), providing academic librarians with a rich testbed for deepening our understanding of new paradigms in scholarly publishing and their implications for long-term digital preservation. While academic libraries have long collected and preserved hard-copy theses and dissertations of the parent institution, the shift to mandatory electronic deposit of this material has conferred new obligations and curatorial functions not previously incorporated into library workflows.
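The link extraction described in the abstract can be approximated with a URL regular expression over the converted dissertation text. A rough Python sketch follows; the pattern is a simplification of what a production pipeline would use and will miss or over-match some URLs:

```python
import re

# A deliberately simple URL pattern; real extraction pipelines need more
# care with line breaks, encodings, and unusual trailing punctuation.
URL_RE = re.compile(r'https?://[^\s<>"\')\]]+')

def extract_links(text):
    """Return HTTP(S) URLs found in a block of extracted ETD text,
    with common trailing punctuation stripped."""
    return [u.rstrip(".,;") for u in URL_RE.findall(text)]

sample = ("See the project site (http://example.org/project) and the archived copy "
          "at https://web.archive.org/web/2015/http://example.org/project.")
print(extract_links(sample))
```

Each extracted URL can then be requested over HTTP and classified as active or dead, which is the linkrot half of the reference-rot measurement.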
By highlighting ETDs as a susceptible collection deserving of specific preservation actions, we draw attention to some unique responsibilities for libraries housing university-produced content, particularly as scholarly information continues its shift away from commercial production and distribution channels. As Teper and Kraemer point out in their discussion of ETD program goals, “without preservation, long-term access is impossible; without long-term access, preservation is meaningless.”1

Mia Massicotte (Mia.Massicotte@concordia.ca) and Kathleen Botter (Kathleen.Botter@concordia.ca) are Systems Librarians, Concordia University Library, Montreal, Quebec, Canada. https://doi.org/10.6017/ital.v36i1.9598

What Is Reference Rot, and Why Study It?

In addition to linkrot (where a link sends the user to a webpage which is no longer available), there are webpages that remain available but whose contents have undergone change over time, known as content drift. This dual phenomenon of linkrot plus content drift has been characterized as reference rot by the Hiberlink project team,2 and has important implications for digital preservation. Since theses and dissertations are original works born digital by virtue of mandatory deposit programs, a university’s ETD program is effectively a digital publishing initiative, accompanied by a new universe of responsibility for its digital preservation. Due to the specialized nature of graduate-level research, ETDs frequently include links to resources on the open web, for example, personal blogs, project websites, and commercial entities. Digital Object Identifiers (DOIs), useful in the context of published literature, do not apply to URLs on the free web, which are DOI-indifferent.
Open web links also fall outside the scope of preservation initiatives such as LOCKSS (Lots of Copies Keep Stuff Safe),3 which aim to safeguard the published literature. With increasing frequency, researchers are citing newer forms of scholarship which do not readily fall under the rubric of published literature. Moreover, since thesis preparation is conducted over a period of time typically measured in years, links cited therein are likely to be more vulnerable to linkrot and content drift by the time of manuscript submission. Yet despite the surfeit of daily anecdotal evidence that URLs vanish and result in dead links, Phillips, Alemneh, and Ayala point out that “by and large academic libraries are not capturing and maintaining collections of web resources that provide context and historical reference points to the modern theses and dissertations held in their collections.”4 Since an ETD comprises a unique form of scholarly output produced by universities, one that simultaneously satisfies the parent institution's degree-granting apparatus and reflects its academic stature on the international stage, the presence of reference rot in this body of literature is of particular concern and worthy of immediate attention.

Smoking Guns

There has been no shortage of evidence reporting on the linkrot phenomenon over the last two decades. Koehler, whose initial study on linkrot appeared in JASIS in 1999, periodically revisited, analyzed, and reported on the same set of 360 URLs collected in his original study.5,6,7 In 2015, upon the twenty-year benchmark of the original data collection, Oguz and Koehler reported in JASIS that only two of the original links remained active.8 A number of foundational studies, including Casserly and Bird,9 Spinellis,10 Sellitto,11 Falagas, Karveli, and Tritsaroli,12 and Wagner et al.,13 have reported on linkrot occurring in professional literature.
Sanderson, Phillips, and Van de Sompel provide a table of 17 well-known linkrot studies, comparing overall benchmarks and supplying a succinct summary of the scope of each study.14 Linkrot also gained further important exposure with the Harvard Law School study by Zittrain, Albert, and Lessig, which found that 70% of references in three Harvard law journals, and 49.9% of URLs in Supreme Court opinions examined, no longer pointed to their originally cited sources.15 Members of the Hiberlink project, which set out to examine “a vast corpus of online scholarly publication in order to assess what links still work as intended and what web content has been successfully archived using text mining and information extracting tools,” have been pivotal in making the case for reference rot.16 Hiberlink demonstrated that failure to link to cited sources was due not only to linkrot, but also to web page content which changed over time.17 A new dimension of the digital preservation universe was thrown into sharp relief with a follow-up study by Klein et al. (2014), which examined one million web references extracted from 3.5 million Science, Technology, and Medicine (STM) articles published in Elsevier, PubMed Central, and arXiv between the years 1997 and 2012. The study concluded that one in five articles suffers from reference rot.18 Though the study focused on STM articles, its authors drew attention to theses and dissertations as a susceptible class of material. Analyzing the same set of links extracted from this large STM corpus, Jones et al. (2016) recently reported that 75% of referenced open web pages demonstrated changes in content.19

ETDs — A Susceptible Collection

The digital preservation part of institutionally mandated ETD deposit has yet to have its dots fully connected to the rest of the diagram.
After four years of research into academic institutions’ ETD programs, Halbert, Skinner, and Schultz reported that close to 75% of respondents surveyed had no preservation plan for their ETD collections.20 Despite the prevalence of linkrot studies, linkrot in ETDs has not been subjected to similar scrutiny, and the implications of disappearing content are underappreciated. While mandatory deposit programs have become relatively commonplace, focus has largely remained on policy and implementation aspects, metadata quality, interoperability, and conformance to standards.21,22 There are few studies which focus on institutional repository link content. The study conducted by Sanderson, Phillips, and Van de Sompel (2011) was a large-scale examination of two repositories: 400,144 papers deposited in arXiv and 3,595 papers in the University of North Texas (UNT) digital library repository were studied, and more than 160,000 URLs examined.23 Links were analyzed for persistence and the availability of mementos, that is, whether prior versions of the page existed in a public web archive, such as the Internet Archive's Wayback Machine. For 72% of UNT URLs, either mementos were available, or the resource still existed at its original location, or both. Although 54% (9,880) were available in one or more international web archives, 28% (5,073) of UNT's ETD links were found to no longer exist, nor had they been archived by the international archival community. Phillips, Alemneh, and Ayala looked at overall general patterns and trends of URL references in repository ETDs, examining 4,335 ETDs between the years 1999-2012 in the UNT repository.24 The team analyzed 26,683 unique URLs in 2,713 ETDs containing one or more links, finding an overall average of 10.58 unique URLs per ETD with one or more links.
The UNT team provided a breakdown of domain and subdomain occurrence frequency, and indicated areas of future investigation into content-based URL linking patterns of ETDs. ETD link decay was studied by Sife and Bernard, who performed a citation analysis on URLs in 83 theses published between 2007 and 2011 at Tanzania's Sokoine National Agricultural Library.25 Of the 15,468 citations examined, 9.6% (1,487) were open web citations. URLs were considered active if found at the original location, or available after a URL redirect. The authors manually tested URLs over a period of seven days to record their accessibility, noting the error messages and domains of inaccessible URLs and analyzing the types of errors encountered. The authors calculated that it took only 2.5 years for half of the web citations to disappear. At the ETD2014 conference,26 an important study of 7,500 ETDs in five U.S. universities was presented. Of 6,400 ETDs defended between 2003 and 2010, approximately 18% of open web link content was confirmed as lost, and a further 34% was at risk of loss, that is, live links which lacked an archived copy.27 Though the results of that particular study have not been formally published, it was briefly summarized in a session held at the 38th UKSG Annual Conference in Glasgow, Scotland in March 2015, an account of which was subsequently published by Burnhill, Mewissen, and Wincewicz in Insights.28 Given the scarcity of published literature on link content as found in ETDs, this present study, which examines reference rot in ETDs in an academic institutional repository, is unique: it draws attention to an important digital collection which is vulnerable to loss and highlights the need for action.
BACKGROUND AND CONTEXT

Concordia University is a comprehensive university located in Montreal, with a student population of 43,903 full-time equivalents in 2015, of which 7,835 were graduate students. In 2015 the University offered 27 PhD programs29 and 43 programs at the Master's level. The Faculties of Arts and Science, Engineering and Computer Science, Fine Arts, and Business have a thesis requirement, and produce upwards of 350 Master's and 150 PhD dissertations annually. The broad disciplines and the departmental clusters used in this study are shown in Table 1. Prior to the thesis deposit mandate, Concordia University Library housed hard-copy versions of theses and dissertations in the collection. In 2009, the Library launched Spectrum, Concordia’s EPrints institutional repository, playing a leadership role in Spectrum's implementation and policy development, and providing training and support to the School of Graduate Studies regarding submission and management of theses for deposit. Following a successful pilot project, the Graduate Studies Office ceased accepting paper manuscripts, and mandated electronic deposit of all theses and dissertations into Spectrum as of spring 2011.

Arts: Applied Linguistics, Communication, Economics, Educational Technology, History, Hist and Phil of Religion, Humanities, Philosophy, Sociology, Political Science, Psychology, Religion
Business*: Decision Sciences and MIS, Finance, Management, Marketing
Engineering**: Building Engineering, Civil Engineering, Computer Science, Comp Sci & Software Eng, Electrical and Comp Eng, Industrial Engineering, Info Systems Security, Mechanical Engineering
Fine Arts: Art Education, Art History, Film and Moving Image Studies, Industrial Design, Fine Arts, Performing Arts
Science: Biology, Chemistry, Mathematics, Physics, Exercise Science

Table 1.
Summary of departmental clusters used in this study
* John Molson School of Business
** Engineering & Computer Science

METHODOLOGY

We concentrated on PhD dissertations (henceforth ETDs) in Spectrum in order to limit the scope of the project; Master's theses were excluded. A 5-year period was chosen, beginning with the first semester of mandatory deposit, spring 2011, through fall 2015, a total of 720 ETDs. Since Concordia ETDs are released for publication immediately following convocation, the University's official convocation dates were used to identify the set of documents to be downloaded and examined. We proceeded in phases: first downloading ETDs from Spectrum and converting them to a text format that could be examined for patterns; then extracting links from each and testing programmatically for linkrot; then drawing a stratified random sample of active URLs and visiting them to determine if content drift had taken place. Our methodology for link extraction was similar to those described by Klein et al.30 and Zhou, Tobin, and Grover.31 During the dissertation download stage, 36 ETDs with embargoed content were encountered and eliminated. ETDs were then converted from the existing PDF/A format to XML. A further 20 documents failed to convert due to nonstandard or complex formatting which resulted in unreadable, garbled characters. These documents resisted multiple conversion attempts, and since they could not be mined, had to be eliminated. A final total of 664 ETDs were successfully converted using three different tools: 97% (644) were converted using PDFtoHTML,32 the remaining 3% by either givemetext (14)33 or Adobe Acrobat (3). A spot check of documents provided sufficient evidence that many links occurred throughout the text body.
Since we intended to extract URLs to the open web, we wanted to err on the side of detecting more links, rather than capturing only easily identifiable, well-formed URLs. Links were mined from the body of the text in a manner similar to the study carried out at UNT.34 We wanted a regular expression which would catch as many URLs as possible, expecting to manually clean the link output before further processing. We tested multiple regular expressions35 against a small sample of our converted ETDs and compared the results. We selected one which seemed well-suited for our purpose, as it was liberal in detecting links throughout the text, and was able to extract links which contained obvious omissions and problems, for example, those that lacked http:// prefixes, but also caught non-obvious errors, such as ellipses in long URLs. We considered how de-duplication of extracted links might affect the outcome, and opted to count each link as an individual instance. Manual cleanup included catching URLs that broke across new lines, identifying false hits such as titles containing colons and DOIs, and adding escape encoding characters for "&" and "%" in order to generate a clean URL for use in the next step of the process.

METHODOLOGY — Linkrot Collection

A script programmatically used the cURL command line tool to visit each link and fetch the HTTP response code in return.36 An output listing was produced for each doctoral dissertation, comprising the original URLs, the final URLs, and the HTTP response codes. Link output for each of the 664 converted ETDs was collected from December 2015 to January 2016, with the fall 2015 semester checked in March 2016. 76% (504 of 664) of ETDs contained one or more links, the highest number of links (5,946) falling into the Arts group; 24% (160 of 664) of ETDs contained no links. For the 5-year period, the broad discipline breakdown of documents examined, the number of ETDs with links, and the number of links extracted are shown in Table 2.
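The kind of deliberately liberal matching described above can be sketched in Python. The pattern below is an illustration only (the study does not reproduce its actual regular expression): it accepts links that lack an http:// prefix, at the cost of false hits that would then be cleaned up manually, and it counts each occurrence as an individual instance, as the study did.

```python
import re

# A deliberately liberal pattern (an illustration, not the study's actual
# regular expression): matches http(s) URLs and bare www.-prefixed links,
# stopping at whitespace and common trailing punctuation.
URL_PATTERN = re.compile(r'(?:https?://|www\.)[^\s<>"\'),;\]]+', re.IGNORECASE)

def extract_links(text):
    """Return every candidate URL in the text, one instance per occurrence."""
    return URL_PATTERN.findall(text)

sample = (
    "See http://example.org/a and www.example.com/b; "
    "also http://example.org/a again."
)
print(extract_links(sample))
# ['http://example.org/a', 'www.example.com/b', 'http://example.org/a']
```

Note that the repeated URL is returned twice, mirroring the study's decision not to de-duplicate extracted links.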
Converted ETDs by publication year, broken out by broad disciplines, are shown in Figure 1.

Discipline | PhD ETDs in Spectrum | ETDs converted* | Contain no links | Contain links | Links extracted
Arts | 210 | 195 | 31 | 164 | 5,946
Business | 45 | 43 | 12 | 31 | 210
Engineering | 351 | 326 | 82 | 244 | 3,259
Fine Arts | 28 | 25 | 2 | 23 | 1,728
Science | 86 | 75 | 33 | 42 | 294
Total | 720 | 664 | 160 | 504 | 11,437

Table 2. 5-year period, 2011-2015, summary of documents examined and links extracted
* 56 documents in total eliminated (36 embargoed, plus 20 which failed to convert)

Figure 1. Converted ETDs by publication year and broad discipline

The 11,437 links extracted were checked for linkrot, each link accessed and its HTTP response code recorded. 77% (8,834 of 11,437) of links returned an active 2xx HTTP response code; 23% (2,603) of links could not be reached, returning a response code outside the 2xx range. This includes 102 links in the 3xx range which failed to reach a destination after 50 redirects and were considered linkrot. Numbers of links, total link response, and link response by year, broken down by broad discipline, are shown in Figure 2, with accompanying data provided in Table 3 and discussed in the findings section.

Figure 2.
Link HTTP response codes, by broad discipline and year

Discipline | Response | 2011 | 2012 | 2013 | 2014 | 2015 | Total | % Active & Rotten**
Arts | 2xx | 691 | 864 | 800 | 1,108 | 1,093 | 4,556 | 77%
Arts | all other* | 320 | 428 | 131 | 293 | 218 | 1,390 | 23%
Business | 2xx | 14 | 52 | 17 | 22 | 50 | 155 | 74%
Business | all other | 9 | 19 | 5 | 9 | 13 | 55 | 26%
Engineering | 2xx | 302 | 702 | 638 | 482 | 404 | 2,528 | 78%
Engineering | all other | 134 | 172 | 180 | 196 | 49 | 731 | 22%
Fine Arts | 2xx | 165 | 143 | 504 | 467 | 94 | 1,373 | 79%
Fine Arts | all other | 74 | 56 | 118 | 98 | 9 | 355 | 21%
Science | 2xx | 77 | 34 | 58 | 39 | 14 | 222 | 76%
Science | all other | 25 | 23 | 10 | 11 | 3 | 72 | 24%
Subtotal | 2xx | 1,249 | 1,795 | 2,017 | 2,118 | 1,655 | 8,834 | 77% active
Subtotal | all other | 562 | 698 | 444 | 607 | 292 | 2,603 | 23% rotten
% Rotten | | 31% | 28% | 18% | 22% | 15% | 23% |
Total | | 1,811 | 2,493 | 2,461 | 2,725 | 1,947 | 11,437 | 100%

Table 3. Breakdown by year and discipline showing active (2xx) and rotten (all others) response codes
* All other = 0, 1xx, 3xx (unresolved after 50 redirects), 4xx, and 5xx response codes combined
** Active and rotten rates based on total links per discipline

METHODOLOGY — Content Drift

For the content drift phase, we wanted to sample documents from each of the five disciplines. ETDs which did not contain any links were excluded from the sample. Using only documents with one or more active links, a stratified random sample of 10% was drawn for a final sample of 49 ETDs containing a total of 990 links. A snippet of text surrounding each link was then also extracted from each ETD, along with any "date accessed" or "date viewed" information if present. Each link was manually visited, assessed for content drift, and observations recorded. The breakdown of the content drift sample is shown in Table 4.
Discipline | ETDs with links | ETDs with active links (2xx) | ETDs sampled for content drift* | Links extracted for sample
Arts | 164 | 156 | 16 | 668
Business | 31 | 28 | 3 | 12
Engineering | 244 | 235 | 24 | 154
Fine Arts | 23 | 23 | 2 | 136
Science | 42 | 40 | 4 | 20
Total | 504 | 482 | 49 | 990

Table 4. Breakdown of sample pool of ETDs for content drift analysis
* 10% sample drawn from each discipline’s pool of ETDs; only ETDs with URLs relevant for content drift assessment

Visited links were benchmarked against the existence of a memento, an archived snapshot of that page located in the Wayback Machine.37 Since the University sets a strict thesis submission deadline of three months prior to convocation, mementos prior to the submission deadline were sought. Based on the occurrences of "date accessed" and discursive information found in the snippets, we arrived at the supposition that links were likely to have been checked as the student approached the final stages of manuscript preparation, although this is not verifiable. We set ourselves a soft window for locating an archived snapshot, using a date six months prior to the convocation date as the benchmark; that is, for each semester's deadline date, an additional three months was added, arriving at a six-months-prior-to-publication marker. Since programmatic analysis of 990 links required time, expertise, and resources not available to us, we approached the problem heuristically. Assuming that online consultations are not linear, active links occurring multiple times in a document were given equal weight. Each link was manually checked in the Wayback Machine using "date viewed" if provided; if no date was provided (the majority of cases), Wayback was checked to see if an archived version existed as close to our six-month soft marker as possible.
If a memento was not found within a month earlier or later than the soft marker, then the nearest neighboring older memento was selected, if one existed. The original URL, the date the URL was visited, and whether a snapshot was located in Wayback were recorded. All links were checked during July-August 2016. If the initial web browser failed to access a link, a second and sometimes third browser was tried, using Safari, Chrome, and Internet Explorer (IE) in that order. Unsuccessful attempts to reach Wayback were rechecked in September. The question as to whether, and to what degree, content drift had occurred was assessed, and is discussed in the next section.

FINDINGS AND DISCUSSION

Linkrot Findings

Of the 664 ETDs examined for linkrot, 77% of links tested returned an active HTTP response code in the 2xx range, roughly three-quarters overall. Numbers of links by broad discipline varied greatly, as shown in Figure 2 (healthy links in green, linkrot shown in red). Linkrot rates ranged from 21% in Fine Arts to 26% in Business, as seen in the last column of Table 3. It should be noted that 2xx response codes are also returned for pages that disguise themselves as active links. For example, a URL returns an active status code when a domain has been parked (e.g., purchased to reserve the space), or when a customized 404-page-not-found is encountered. Since we had no mechanism in place to treat false positives, these were flagged during the linkrot phase as candidates for subsequent content drift analysis. 23% (2,603 of 11,437) of all links returned a response code outside the 2xx range and were considered linkrot, roughly one-quarter. Response codes in the 4xx range alone, including 404-page-not-found errors, comprised 17% (1,916 of 11,437) of all links. Table 5 shows the breakdown of the total number of links that were visited in the spring of 2016 for linkrot determination.
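The activity rule used throughout the study (a link counts as active only if its final HTTP response is in the 2xx range; empty responses, 1xx, 3xx chains unresolved after 50 redirects, 4xx, and 5xx all count as linkrot) can be sketched in Python. The dictionary below is a hypothetical layout using one representative code per category; the counts themselves are the study's figures:

```python
def is_active(status_code: int) -> bool:
    # Active means a final 2xx response; everything else (0 = timed out,
    # 1xx, 3xx unresolved after 50 redirects, 4xx, 5xx) counts as linkrot.
    return 200 <= status_code < 300

# One representative code per response-code category (hypothetical layout;
# the counts are the study's own figures).
links_by_code = {0: 507, 100: 2, 200: 8834, 300: 102, 404: 1916, 500: 76}

active = sum(n for code, n in links_by_code.items() if is_active(code))
rotten = sum(n for code, n in links_by_code.items() if not is_active(code))
print(active, rotten)  # 8834 2603 -- the study's 77%/23% split
```

Summing the non-2xx categories this way reproduces the 8,834 active / 2,603 rotten totals reported above.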
HTTP response code category | Meaning of HTTP response code* | Number of links | Percent of total links
0 | Empty response** | 507 | 4%
1xx | Informational | 2 | 0%
2xx | Successful | 8,834 | 77%
3xx | Redirection† | 102 | 1%
4xx | Client error | 1,916 | 17%
5xx | Server error | 76 | 1%
Total | | 11,437 | 100%

Table 5. Breakdown of HTTP response codes received
* We used the HTTP protocol definitions at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
** Unofficial HTTP response code due to the request timing out
† Failure to resolve after 50 redirects

HTTP responses ranged from a high of 85% active in 2015 to a low of 69% active in 2011, the oldest publication year. To put it differently, the most recent year exhibited a linkrot rate of 15%. Consistent with other studies, linkrot manifests itself quickly after publication and increases over time, as indicated by the percentages shown in Figure 2.

Content Drift Findings

Of the 990 links visited to check for the presence of content drift, 764 (400 + 364), or 77%, had a Wayback memento, compared with 226 (92 + 134), or 23%, which did not. Slightly more than half of links with mementos, 52% (400 of 764), demonstrated some level of content drift when the memento was compared to the current active link, while 48% (364 of 764) with mementos did not exhibit content drift. The presence of content drift by discipline, with/without mementos, showing numbers of links tested, appears in Table 6.

Discipline | Links tested | Drift: memento found | Drift: memento not found | Drift total | No drift: memento found | No drift: memento not found | No drift total
Arts | 668 | 261 | 60 | 321 | 254 | 93 | 347
Business | 12 | 5 | 0 | 5 | 4 | 3 | 7
Engineering | 154 | 74 | 10 | 84 | 55 | 15 | 70
Fine Arts | 136 | 55 | 22 | 77 | 38 | 21 | 59
Science | 20 | 5 | 0 | 5 | 13 | 2 | 15
Total | 990 | 400 | 92 | 492 | 364 | 134 | 498

Table 6.
Presence of Content Drift by Discipline, with/without mementos

For links that had no memento in Wayback, content drift assessment was based on the presence of an observable date in the current active link, including copyright, and/or other details which positively correlated with our extracted snippet information. For example, some links retrieved a .pdf or other static file which correlated with the snippet, there being no reason to conclude its content had undergone change since publication, despite the lack of a memento. Snippets were also used in cases where a robots.txt file at the target URL had prevented Wayback from creating a memento. Occasional examination of the dissertation text was conducted to validate information extracted in the snippet. The 23% (226) which lacked mementos remain at significant risk and will fall prey to further drift as time passes. As seen in Table 7, of the 492 URLs manifesting content drift, 11% (54 of 492) were completely lost, linking to web domains that had been sold or were currently up for sale, or to webpages that had been replaced or removed. 9% (42 of 492) of web pages exhibited major change such that there was little correlation with snippets, or where website overhauls made assessment difficult, but not impossible. 36% (179 of 492) of web links exhibited minor drift, primarily pages that differed somewhat from a memento in visual appearance, such as header and footer differences, changes in background theme or style, or changes in navigation or search functionality which did not represent a high degree of impairment. 7% (34 of 492) linked to continually updating websites, such as Wikipedia and news organizations, and 7% (35 of 492) were customized 404-page-not-found pages, each group distinctive enough to warrant its own category.
A full 30% (148 of 492) exhibited a multiplicity of changes of uncertain nature which we grouped together, such as pages where graphic or audio components had been removed or could not be retrieved, broken JavaScript that impeded access, browser failure, and mementos not accessible after repeated attempts, indicative of a range of issues affecting the quality of web archives and hence preservation.38 The types of content drift encountered, broken down by broad discipline, numbers of links, and percentage, are shown in Table 7.

Type of content drift | Arts | Business | Engineering | Fine Arts | Science | Total | % of type
Lost | 45 | 0 | 3 | 6 | 0 | 54 | 11%
Major but findable | 22 | 0 | 9 | 9 | 2 | 42 | 9%
Minor – redesigned but recognizable | 128 | 2 | 30 | 17 | 2 | 179 | 36%
Ongoing updating website | 25 | 3 | 5 | 0 | 1 | 34 | 7%
Custom 404 | 23 | 0 | 4 | 8 | 0 | 35 | 7%
Other | 78 | 0 | 33 | 37 | 0 | 148 | 30%
Total | 321 | 5 | 84 | 77 | 5 | 492 | 100%

Table 7. Types of content drift encountered, number of links by broad discipline

Though difficulties encountered during content drift assessment made further extrapolation problematic, the presence of reference rot was confirmed. Our 10% stratified random sample examined 990 active links, finding that roughly half (492 of 990) manifested some degree of content drift. For 364 links, or 36% overall, a benchmark memento was found and no content drift detected. Although many content drift changes can arguably be characterized as minor, it is not possible to ascertain where the content drift scale tips irremediably for any particular reader. What can be said with certainty is that 11% of active links, links which did not exhibit linkrot and were quite live and accessible, fell into a small but unsettling group where the context of the cited web source is irrevocably lost. Of the 498 links which did not exhibit any evidence of content drift, 134, approximately one-third, have no memento archived and continue to remain at high risk.
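The memento-selection rule described in the methodology section (seek a snapshot near a soft marker six months before convocation; accept one within roughly a month either side, otherwise fall back to the nearest older snapshot, if any) can be sketched as follows. The dates are hypothetical examples, not figures from the study:

```python
from datetime import date, timedelta

def select_memento(memento_dates, soft_marker):
    """Pick a Wayback snapshot per the study's heuristic: prefer a memento
    within about one month of the soft marker, otherwise fall back to the
    nearest older memento, if one exists; return None when neither exists."""
    window = timedelta(days=31)
    near = [d for d in memento_dates if abs(d - soft_marker) <= window]
    if near:
        # closest snapshot inside the one-month window
        return min(near, key=lambda d: abs(d - soft_marker))
    older = [d for d in memento_dates if d < soft_marker]
    return max(older) if older else None

# Hypothetical example: convocation in June 2013 gives a soft marker
# six months prior, i.e. December 2012.
marker = date(2012, 12, 1)
snaps = [date(2011, 5, 2), date(2012, 11, 20), date(2014, 1, 8)]
print(select_memento(snaps, marker))  # 2012-11-20 (within the window)
```

A link whose only snapshots postdate the marker and fall outside the window would yield no usable memento, mirroring the at-risk group discussed above.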
A focused and deeper analysis of active links which might lead to a typology of content drift types would be a possible area of future study, though even the well-resourced study by Jones et al., which utilized a strict "ground truth" for comparing textual mementos over time, points out that classifying links would certainly be challenging.39 A larger sample size might also allow closer analysis of disciplinary differences, which may lead to a better understanding of these types of content drift variations.

CONCLUSION

Reference rot in the form of linkrot and content drift was observed in ETDs in Spectrum, our institutional repository, and this confirmation should give pause to those charged with stewardship of ETD collections. Theses and dissertations have long been viewed as material that contributes to overall academic scholarly output and carries unique status within the academy. In August 2016, OpenDOAR registered 1,600 institutional repositories with ETDs,40 up from 1,100 institutions as reported in 2012 by grey literature specialist Schoepfel.41 Academic libraries have, in large part, facilitated the transition from paper to ETD with widespread adoption of institutional repository deposit programs, and along with that adoption comes a range of long-term preservation issues.
Yet as Ohio State’s Strategic Digital Initiatives Working Group pointed out, “Even in digital library communities, preservation all too often stands in for or is used interchangeably with byte level backup of content.”42 For long-term access, focus can productively be shifted to offset the immediate threat of incompleteness and inadequate capture.43 Not much has changed since Hedstrom wrote back in 1997: “With few exceptions, digital library research has focussed on architectures and systems for information organization and retrieval, presentation and visualization, and administration of intellectual property rights … The critical role of digital libraries and archives in ensuring the future accessibility of information with enduring value has taken a back seat to enhancing access to current and actively used materials.”44 Our understanding and discussion of digital preservation must be broadened, and attention turned to this key area of responsibility in the preservation life-cycle. The authors maintain that ETD content and link preservation is an editorial, not individual, imperative. Encouraging individual authors to perform their own archiving is doomed to fall short of even reasonable expectations. Measures such as Perma, a distributed, redundant method of capturing and archiving website content as part of the citation process, must be pro-actively pursued and built into library, and hence repository, workflows.45 Browser plugins and automated solutions which use the Memento protocol for capturing and archiving website content as part of the citation process do exist,46 but naturally have to be implemented before they can take effect. Either way, efforts to operationalize existing mechanisms which are designed to reduce future loss would be extremely productive. Responsibility for ensuring not only current but continuing future access to ETD content rests with those who maintain the curatorial function of the repository.
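For readers unfamiliar with the Memento protocol mentioned above (RFC 7089), it retrieves archived snapshots by datetime negotiation: a client asks a TimeGate for the version of a URL nearest a given time via the Accept-Datetime header. A minimal sketch, constructed but not sent here, against the Internet Archive's TimeGate endpoint:

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

def timegate_request(url: str, when: datetime) -> Request:
    """Build a Memento TimeGate request (RFC 7089): the Accept-Datetime
    header asks the archive for the snapshot closest to `when`. The
    Internet Archive's Wayback TimeGate is used here for illustration."""
    req = Request("https://web.archive.org/web/" + url)
    req.add_header("Accept-Datetime", format_datetime(when))
    return req

req = timegate_request(
    "http://example.org/", datetime(2013, 6, 1, tzinfo=timezone.utc)
)
print(req.get_header("Accept-datetime"))  # Sat, 01 Jun 2013 00:00:00 +0000
```

Sending such a request would redirect to the memento nearest the requested datetime, which is the mechanism the browser plugins and automated capture tools cited above build on.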
Academic librarians have assumed a prominent and de facto role as curators, facilitating the role of university publication and emphasizing its break away from previous ties with commercial entities. We collectively bear greater responsibility for this body of scholarly work, and need to move forward from a position of benign neglect to one of informed curation and pro-active preservation of an important collection of scholarly output which is at risk.

REFERENCES

1. Thomas H. Teper and Beth Kraemer, “Long-Term Retention of Electronic Theses and Dissertations,” College & Research Libraries 63, no. 1 (January 1, 2002), 64, https://doi.org/10.5860/crl.63.1.61.
2. The term “reference rot” was introduced by the Hiberlink team. “Hiberlink – About,” accessed March 31, 2016, http://hiberlink.org/about.html.
3. LOCKSS: Lots of Copies Keep Stuff Safe, accessed December 6, 2016, http://www.lockss.org/about/what-is-lockss/.
4. Mark Edward Phillips, Daniel Gelaw Alemneh, and Brenda Reyes Ayala, “Analysis of URL References in ETDs: A Case Study at the University of North Texas,” Library Management 35, no. 4/5 (June 3, 2014), 294, https://doi.org/10.1108/LM-08-2013-0073.
5. Wallace Koehler, “An Analysis of Web Page and Web Site Constancy and Permanence,” Journal of the American Society for Information Science 50, no. 2 (January 1, 1999): 162–80, https://doi.org/10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B.
6. Wallace Koehler, “Web Page Change and Persistence—a Four-Year Longitudinal Study,” Journal of the American Society for Information Science & Technology 53, no. 2 (January 15, 2002): 162–71, http://doi.org/10.1002/asi.10018.
7. Wallace Koehler, “A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence,” Information Research 9, no. 2 (2004), http://www.informationr.net/ir/9-2/paper174.html.
8.
Fatih Oguz and Wallace Koehler, “URL Decay at Year 20: A Research Note,” Journal of the Association for Information Science and Technology 67, no. 2 (February 1, 2016): 477–79, https://doi.org/10.1002/asi.23561.

9. Mary F. Casserly and James Bird, “Web Citation Availability: Analysis and Implications for Scholarship,” College & Research Libraries 64, no. 4 (July 2003): 300–317, http://crl.acrl.org/content/64/4/300.full.pdf.

10. Diomidis Spinellis, “The Decay and Failures of Web References,” Communications of the ACM 46, no. 1 (January 2003): 71–77, https://doi.org/10.1145/602421.602422.

11. Carmine Sellitto, “A Study of Missing Web-Cites in Scholarly Articles: Towards an Evaluation Framework,” Journal of Information Science 30, no. 6 (December 1, 2004): 484–95, https://doi.org/10.1177/0165551504047822.

A CASE STUDY OF ELECTRONIC THESES AND DISSERTATIONS (ETDS) IN AN ACADEMIC LIBRARY | MASSICOTTE AND BOTTER | https://doi.org/10.6017/ital.v36i1.9598

12. Matthew E. Falagas, Efthymia A. Karveli, and Vassiliki I. Tritsaroli, “The Risk of Using the Internet as Reference Resource: A Comparative Study,” International Journal of Medical Informatics 77, no. 4 (April 2008): 280–86, https://doi.org/10.1016/j.ijmedinf.2007.07.001.

13. Cassie Wagner et al., “Disappearing Act: Decay of Uniform Resource Locators in Health Care Management Journals,” Journal of the Medical Library Association 97, no. 2 (April 2009): 122–30, https://doi.org/10.3163/1536-5050.97.2.009.

14. Robert Sanderson, Mark Phillips, and Herbert Van de Sompel, “Analyzing the Persistence of Referenced Web Resources with Memento,” arXiv:1105.3459 [cs], May 17, 2011, http://arxiv.org/abs/1105.3459.

15. Jonathan Zittrain, Kendra Albert, and Lawrence Lessig, “Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations,” Legal Information Management 14, no. 2 (June 2014): 88–99, https://doi.org/10.1017/S1472669614000255.

16.
“Hiberlink - About,” accessed March 31, 2016, http://hiberlink.org/about.html.

17. “Hiberlink - Our Research,” accessed March 31, 2016, http://hiberlink.org/research.html.

18. Martin Klein, Herbert Van de Sompel, Robert Sanderson, Harihar Shankar, Lyudmila Balakireva, Ke Zhou, and Richard Tobin, “Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot,” PLOS ONE 9, no. 12 (December 26, 2014), https://doi.org/10.1371/journal.pone.0115253.

19. Shawn M. Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, Richard Tobin, and Claire Grover, “Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content,” PLOS ONE 11, no. 12 (December 2, 2016): e0167475, https://doi.org/10.1371/journal.pone.0167475.

20. Martin Halbert, Katherine Skinner, and Matt Schultz, “Preserving Electronic Theses and Dissertations: Findings of the Lifecycle Management for ETDs Project” (August 6, 2015), 2, http://educopia.org/presentations/preserving-electronic-theses-and-dissertations-findings-lifecycle-management-etds.

21. For a recent overview, see Sarah Potvin and Santi Thompson, “An Analysis of Evolving Metadata Influences, Standards, and Practices in Electronic Theses and Dissertations,” Library Resources & Technical Services 60, no. 2 (March 31, 2016): 99–114, https://doi.org/10.5860/lrts.60n2.99.

22. Joy M. Perrin, Heidi M. Winkler, and Le Yang, “Digital Preservation Challenges with an ETD Collection — A Case Study at Texas Tech University,” The Journal of Academic Librarianship 41, no. 1 (January 2015): 98–104, https://doi.org/10.1016/j.acalib.2014.11.002.

23. Sanderson, Phillips, and Van de Sompel, “Analyzing the Persistence of Referenced Web Resources with Memento,” http://arxiv.org/abs/1105.3459.

24. Phillips, Alemneh, and Ayala, “Analysis of URL References,” https://doi.org/10.1108/LM-08-2013-0073.

25. Alfred S.
Sife and Ronald Bernard, “Persistence and Decay of Web Citations Used in Theses and Dissertations Available at the Sokoine National Agricultural Library, Tanzania,” International Journal of Education and Development Using Information and Communication Technology 9, no. 2 (2013): 85–94, http://eric.ed.gov/?id=EJ1071354.

26. “ETD2014 — University of Leicester,” University of Leicester, accessed January 27, 2016, http://www2.le.ac.uk/library/downloads/etd2014.

27. EDINA, University of Edinburgh, “Reference Rot: Threat and Remedy,” http://www.slideshare.net/edinadocumentationofficer/reference-rot-and-linked-data-threat-and-remedy.

28. Peter Burnhill, Muriel Mewissen, and Richard Wincewicz, “Reference Rot in Scholarly Statement: Threat and Remedy,” Insights: the UKSG Journal 28, no. 2 (July 7, 2015): 55–61, https://doi.org/10.1629/uksg.237.

29. Concordia University, Graduate Programs, accessed April 7, 2016, http://www.concordia.ca/academics/graduate.html.

30. Klein et al., “Scholarly Context Not Found,” https://doi.org/10.1371/journal.pone.0115253.

31. Ke Zhou, Richard Tobin, and Claire Grover, “Extraction and Analysis of Referenced Web Links in Large-Scale Scholarly Articles,” in Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’14 (Piscataway, NJ: IEEE Press, 2014), 451–52, http://dl.acm.org/citation.cfm?id=2740769.2740863.

32. pdftohtml v0.38 win32, by meshko (Mikhail Kruk), accessed September 20, 2015, http://pdftohtml.sourceforge.net/ (actual download at http://sourceforge.net/projects/pdftohtml/).

33. Give Me Text!, Open Knowledge International, accessed October 26, 2015–March 7, 2016, http://givemetext.okfnlabs.org/.

34. Phillips, Alemneh, and Ayala, “Analysis of URL References,” https://doi.org/10.1108/LM-08-2013-0073.

35. “In Search of the Perfect URL Validation Regex,” accessed December 7, 2015, https://mathiasbynens.be/demo/url-regex. We selected “@gruber v2” for our extraction.

36.
cURL v7.45.0, “command line tool and library for transferring data with URLs,” accessed October 18, 2015, http://curl.haxx.se/.

37. We have used the term “memento” in lowercase to denote a snapshot souvenir page, to distinguish it from an automated service utilizing the Memento protocol.

38. For a good overview of the types of problems, see Michael L. Nelson, Scott G. Ainsworth, Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, and Michele Weigle, “Assessing the Quality of Web Archives,” Computer Science Presentations, Book 8 (Old Dominion University, ODU Digital Commons, 2014), http://digitalcommons.odu.edu/computerscience_presentations/8.

39. Jones et al., “Scholarly Context Adrift,” https://doi.org/10.1371/journal.pone.0167475.

40. OpenDOAR search of institutional repositories with theses at http://www.opendoar.org/find.php, accessed August 26, 2016.

41. Joachim Schöpfel, “Adding Value to Electronic Theses and Dissertations in Institutional Repositories,” D-Lib Magazine 19, no. 3 (2013): 1, https://doi.org/10.1045/march2013-schopfel.

42. Strategic Digital Initiatives Working Group, Implementation of a Modern Digital Library at The Ohio State University (April 2014), https://library.osu.edu/documents/SDIWG/sdiwg_white_paper.pdf.

43. Tim Gollins, “Parsimonious Preservation: Preventing Pointless Processes! (The Small Simple Steps That Take Digital Preservation a Long Way Forward),” in Online Information Proceedings (UK National Archives, 2009), http://www.nationalarchives.gov.uk/documents/information-management/parsimonious-preservation.pdf.

44. Margaret Hedstrom, “Digital Preservation: A Time Bomb for Digital Libraries,” Computers and the Humanities 31, no. 3 (1997): 189–202, https://doi.org/10.1023/A:1000676723815.

45.
Zittrain, Albert, and Lessig, “Perma,” https://doi.org/10.1017/S1472669614000255.

46. Herbert Van de Sompel, Michael L. Nelson, Robert Sanderson, Lyudmila L. Balakireva, Scott Ainsworth, and Harihar Shankar, “Memento: Time Travel for the Web,” arXiv:0911.1112 [cs], November 5, 2009, http://arxiv.org/abs/0911.1112.
Editorial Board Thoughts: Metadata Training in Canadian Library Technician Programs

Sharon Farnel

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

The core metadata team at my institution is small but effective. In addition to myself as Coordinator, we include two librarians and two full-time metadata assistants. Our metadata assistant positions are considered similar, in some ways, to other senior assistant positions within the organization that require, or at least prefer, that individuals have a library technician diploma. However, neither of our metadata assistants has such a diploma. Their credentials, in fact, are quite different. In part, this difference is driven by the nature of the work that our metadata assistants do. They work regularly with different metadata standards such as MODS, DC, and DDI in addition to MARC. They perform operations on large batches of metadata using languages such as XSLT or R. This is quite different in many ways from the work of their colleagues who work with the ILS, many of whom do have a library technician diploma. As we prepare for an upcoming short-term leave of one of our team members, I have been thinking a great deal about the work our metadata assistants do and whether we would find an individual who came through a library technician program with the skills and knowledge we need a replacement to have. And I have also been reminded of conversations I have had with recently graduated library technicians who felt their exposure to metadata standards, practices, and tools beyond RDA and MARC had been lacking in their programs. This got me thinking about the presence or absence of metadata courses in library technician programs in Canada. I reached out to two colleagues from MacEwan University, Norene Erickson and Lisa Shamchuk, who are doing in-depth research into library technician education in Canada.
They kindly provided me with a list of Canadian institutions that offer a library technician program so I could investigate further. Now, I must begin with two caveats. First, this is very much a surface-level scan rather than an in-depth examination, although it is simply the first step in what I hope will be a longer-term investigation. Second, although several Francophone institutions in Canada offer library technician programs, I did not review their programs; I was concerned that my lack of fluency in French could lead to inadvertent misrepresentations.

Sharon Farnel (sharon.farnel@ualberta.ca), a member of the ITAL Editorial Board, is Metadata Coordinator, University of Alberta Libraries, Edmonton, Alberta.

EDITORIAL BOARD THOUGHTS | FARNEL https://doi.org/10.6017/ital.v35i4.9601

Canadian institutions offering a library technician program (by province) are:

Alberta
● MacEwan University (http://www.macewan.ca/wcm/SchoolsFaculties/Business/Programs/LibraryandInformationTechnology/)
● Southern Alberta Institute of Technology (http://www.sait.ca/programs-and-courses/full-time-studies/diplomas/library-information-technology)

British Columbia
● Langara College (http://langara.ca/programs-and-courses/programs/library-information-technology/)
● University of the Fraser Valley (http://www.ufv.ca/programs/libit/)

Manitoba
● Red River College (http://me.rrc.mb.ca/catalogue/ProgramInfo.aspx?ProgCode=LIBIF-DP&RegionCode=WPG)

Nova Scotia
● Nova Scotia Community College (http://www.nscc.ca/learning_programs/programs/plandescr.aspx?prg=LBTN&pln=LIBINFTECH)

Ontario
● Algonquin College (http://www.algonquincollege.com/healthandcommunity/program/library-and-information-technician/)
● Conestoga College (https://www.conestogac.on.ca/parttime/library-and-information-technician)
● Confederation College (http://www.confederationcollege.ca/program/library-and-information-technician)
● Durham College
(http://www.durhamcollege.ca/programs/library-and-information-technician)
● Seneca College (http://www.senecacollege.ca/fulltime/LIT.html)
● Mohawk College (http://www.mohawkcollege.ca/ce/programs/community-services-and-support/library-and-information-technician-diploma-800)

Quebec
● John Abbott College (http://www.johnabbott.qc.ca/academics/career-programs/information-library-technologies/)

Saskatchewan
● Saskatchewan Polytechnic (http://saskpolytech.ca/programs-and-courses/programs/Library-and-Information-Technology.aspx)

My method was quite simple. Using the program websites listed above, I reviewed the course listings, looking for “metadata” either in the title or in the description when one was available. Of the fourteen (14) programs examined, nine (9) had no course with metadata in the title or description. Two (2) programs had courses where metadata was listed as part of the content but not the focus: Langara College as part of “Special Topics: Creating and Managing Digital Collections” and Seneca College as part of “Cataloguing III,” which has a partial focus on metadata for digital collections. Three (3) of the programs had a course with metadata in the title or description; all are a variation on “Introduction to Metadata and Metadata Applications.” (Importantly, the three institutions in question, Conestoga College, Confederation College, and Mohawk College, are all connected and share courses online.) So, what do these very preliminary and impressionistic findings tell us? It seems that there is little opportunity for students enrolled in library technician programs in Canada to be exposed to the metadata standards, practices, and tools that are increasingly necessary for positions involving work with digital collections, research data management, digital preservation, and the like. Admittedly, no program can include courses on all potentially relevant topics.
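The manual scan described above amounts to a case-insensitive keyword check over course titles and descriptions. A minimal sketch of the same check (the catalogue entries shown are hypothetical, not drawn from the programs surveyed):

```python
def courses_mentioning(keyword, courses):
    """Return titles of courses whose title or description mentions keyword.

    `courses` maps course title -> description (description may be None);
    matching is case-insensitive.
    """
    kw = keyword.lower()
    return [
        title
        for title, desc in courses.items()
        if kw in title.lower() or kw in (desc or "").lower()
    ]

# Hypothetical catalogue entries for illustration only.
catalogue = {
    "Cataloguing III": "Advanced RDA practice; includes metadata for digital collections.",
    "Introduction to Metadata and Metadata Applications": "Survey of non-MARC standards.",
    "Reference Services": "Core reference interview skills.",
}
print(courses_mentioning("metadata", catalogue))
# → ['Cataloguing III', 'Introduction to Metadata and Metadata Applications']
```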
In addition, formal course work is only one aspect of the training and education that can prepare graduates for their careers; practica, work placements, and other more informal activities during a program are crucial, as are the skills and knowledge that can only be developed once hired and on the job. Nevertheless, based on the investigation above, one would be justified in asking whether we are disadvantaging students by not working to incorporate additional coursework focused on metadata standards, applications, and tools, as well as on basic skills in manipulating metadata in large batches.

scripting languages or equivalent combination of education and experience. Master’s desirable.” I edited our statement to more clearly allow a combination of factors that would show sufficient preparation: “Bachelor’s degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a Master’s degree is preferred,” followed by a separate description of technical skills needed. This increased the number and quality of our applications, so I’ll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind. Meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. First, they’re wondering how to show clearly enough that they meet requirements like “three years of work experience” or “experience with user testing” so that their application is seriously considered. Second, they ask about possibilities to formalize skills. Recently, I’ve gotten questions about a certificate program in UX and whether there is any formal certification to be a systems librarian.
Surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master’s degrees to learning scripting as a technical services librarian to pre-MLS work experience—doesn’t suggest any standard method for substantiating technical knowledge. Once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don’t necessarily signal that. Some advice from the tech industry about how to be more inviting to candidates applies to libraries too; for example, avoiding “rockstar”/“ninja” descriptions, emphasizing the problem space over years of experience,1 and designing interview processes that encourage discussion rather than “gotcha” technical tasks. At Penn Libraries, for example, we’ve been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. This gives us concrete code to discuss in a far more realistic and relaxed context. While it may be helpful to express requirements better to encourage applicants to see more clearly whether they should respond to a posting, this is a small part of the question of preparing new MLS grads for library technology jobs. The new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. Others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they’ve just completed a whole new degree and may not have the time or resources to pursue additional training. Even if we make efforts to narrow the gap between employers and job-seekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment.
Library school provides a relatively brief window to instill in students the fundamentals and values of the profession, and it can’t be repurposed as a coding academy. There persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies.

REFERENCES

1. Erin Kissane, “Job Listings That Don’t Alienate,” https://storify.com/kissane/job-listings-that-don-t-alienate.
President’s Message: Focus on Information Ethics

Aimee Fifarek

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2016

Just a few weeks ago we held yet another successful LITA Forum,1 this time in Fort Worth, TX. Tight travel budgets and time constraints mean that only a few hundred people get to attend Forum each year, but that is one of the things that make it a great conference. Because of its size, you have a realistic chance of meeting everyone there, whether at Game Night, at one of the many networking dinners, or just during hallway chitchat after a session. And the sessions really do give you something to talk about. This year I couldn’t help but notice a theme. Among all the talk about makerspace technologies, analytics, and specific software platforms, the one bubble that kept rising to the surface was information ethics. Why are you doing what you are doing with the information you have, and should you really be doing it? Have you stopped to think what impact collecting, posting, or sharing that information is going to have on the world around you? In a post-election environment replete with talk of fake news and other forms of deliberate misinformation, LITA Forum presenters seem to have tapped into the zeitgeist. Tara Robertson, in her closing keynote,2 talked about the harm digitizing analog materials can do when what is depicted is sensitive to individuals and communities. Waldo Jaquith of US Open Data talked about how a government decision to limit options on a birth certificate to either “white” or “colored” effectively wiped the native population out of political existence in Virginia. And Sam Kome from Claremont Colleges talked about how well-meaning librarians can facilitate privacy invasion merely by collecting operational statistics.3
There were many other examples brought out by Forum speakers, but these in particular emphasized the serious consequences the use of data, intentional or not, can have on people. I think it is time for librarians4 to get more vocal about information ethics and the role we play in educating the population about humane information use. Our profession has always been forward thinking about information literacy and is traditionally known for helping our communities make judgements about the information they consume. But we have not done enough to declare our expertise in the information economy, to stand up and say “we’re librarians – this is what we do.” Now, more than ever, people need the skills to think critically about the information they are consuming via all kinds of media, understand the consequences of allowing algorithms to shape their information universe, and make quality judgments about trading their personal information for goods and services.

Aimee Fifarek (aimee.fifarek@phoenix.gov) is LITA President 2016-17 and Deputy Director for Customer Support, IT and Digital Initiatives at Phoenix Public Library, Phoenix, AZ.

PRESIDENT’S MESSAGE | FIFAREK https://doi.org/10.6017/ital.v35i4.9602

To quote from UNESCO:

Changes brought about by the rapid development of information and communication technologies (ICT) not only open tremendous opportunities to humankind but also pose unprecedented ethical challenges. Ensuring that information society is based upon principles of mutual respect and the observance of human rights is one of the major ethical challenges of the 21st century.5

I challenge all librarians to make a commitment to propagating information ethics, both personally and professionally. Make an effort to get out of your social media echo chamber6 and engage with uncomfortable ideas. When you see biased information being shared, consider it a “teachable moment” and highlight the spin or present more neutral information.
And if your library is not actively making information literacy and information ethics part of its programming and instruction, then do what you can to change that. Offer to be on a panel, create a curriculum, or host a program that includes key concepts relating to information “ownership, access, privacy, security, and community.”7 The focus of the Libraries Transform campaign this year is all about our expertise: “Because the best search engine in the Library is the Librarian.”8 It’s our time to shine.

REFERENCES

1. http://forum.lita.org/home/
2. http://forum.lita.org/speakers/tara-robertson/
3. http://forum.lita.org/sessions/patron-activity-monitoring-and-privacy-protection/
4. As always, when I use the term “librarian” my intention is to include any person who works in a library and is skilled in information and library science, not to limit the reference to those who hold a library degree.
5. http://en.unesco.org/themes/ethics-information
6. https://www.wnyc.org/story/buzzfeed-echo-chamber-online-news-politics/
7. https://en.wikipedia.org/wiki/Information_ethics
8. http://www.ilovelibraries.org/librariestransform/
The Impact of Information Technology on Library Anxiety: The Role of Computer Attitudes

Qun G. Jiao and Anthony J. Onwuegbuzie

Information Technology and Libraries 23, no. 4 (December 2004): 138.

Over the past two decades, computer-based technologies have become dominant forces that shape and reshape the products and services the academic library has to offer. The application of library technologies has had a profound impact on the way library resources are being used. Although many students continue to experience high levels of library anxiety, it is likely that the new technologies in the library have led to them experiencing other forms of negative affective states that may be, in part, a function of their attitude towards computers. This study investigates whether students’ computer attitudes predict levels of library anxiety.

Computers and information technologies have experienced considerable growth over the past two decades. As such, familiarity with computers is rapidly becoming a basic skill and a prerequisite for many tasks. Although not every college student is equally prepared for the rising demand for computer skills in the information age, computer literacy is increasingly becoming a gatekeeper for students’ academic success.1 Gaps in computer literacy and skills can leave many students behind not only in their academic achievement but also in their future job-market success. The unprecedented pace of technological change in the development of digital information networks and electronic services in recent years has helped to expand the role of the academic library.
Once only a storehouse of printed materials, it is now a technology-laden information network where students can conduct research in a mixed print and digital-resource environment, experience the use of advanced information technologies, and hone their computer skills. Yet many students are struggling to cope with the changes brought on by the rapid advances of information technologies. Academic libraries of various sizes have spent a large percentage of their material budget on electronic commercial content, and the trend will continue.2 These days, college students are faced with the choices of ever-changing modes of electronic accessing tools, interfaces, and protocols along with the traditional print resources in the library. The fact that the same journal article may be available in multiple vendors’ aggregator sites (such as EBSCOhost and Gale Group) makes navigation through these bibliographic databases more complex and challenging. Relevant sources must be identified and navigation protocols must be learned before appropriate information and contents can be found. Furthermore, having located a citation, students still have to search the library online catalog to find out if the journal or book is available in the library and, if not, know how to make an interlibrary loan request either on paper or electronically.3 Anxiety levels can be high and patience levels can be low at varying times of conducting library research.4 That students experience various levels of apprehension when using academic libraries is not a new phenomenon.

Qun G. Jiao (gerryjiao@baruch.cuny.edu) is Reference Librarian and Associate Professor at Newman Library, Baruch College, City University of New York, and Anthony J. Onwuegbuzie (Tony_Onwuegbuzie@aol.com) is Associate Professor at the College of Education, University of South Florida, Tampa.
Indeed, the phenomenon is prevalent among college students in the United States and many other countries, and is widely known as library anxiety. Mellon first coined the term in her study, in which she noted that 75 percent to 85 percent of undergraduate students described their initial library experiences in terms of anxiety.5 According to Mellon, feelings of anxiety stem from either the relative size of the library; a lack of knowledge about the location of materials, equipment, and resources of the library; how to initiate library research; or how to proceed with a library search.6 Library anxiety is an unpleasant feeling or emotional state with physiological and behavioral concomitants that come to the fore in library settings. Typically, library-anxious students experience negative emotions, including ruminations, tension, fear, and mental disorganization, which prevent them from using the library effectively.7 A student who experiences library anxiety usually undergoes either emotional or physical discomfort when faced with any library or library-related task.8 Library anxiety may arise from a lack of self-confidence in conducting research, lack of prior exposure to academic libraries, the inability to see the relevance of libraries to one’s field of interest, and lack of familiarity with library equipment and technologies. Library anxiety is often accorded special attention because of its debilitating effects on students’ academic achievement.9 Although many students continue to experience high levels of library anxiety, it is likely that the new technologies and electronic databases in libraries have led to students experiencing other forms of negative affective states. In particular, it is likely that library anxiety experienced by students is, in part, a function of their attitudes toward computers.
Consistent with this assertion, Mizrachi and Shoham and Mizrachi reported a statistically significant relationship between library anxiety and computer attitudes.10 They noted in their research that home and work usage of computers, computer games, word processors, computer spreadsheets, and the Internet are all related to the dimensions of library anxiety found among Israeli students to varying degrees. Similarly, Jerabek, Meyer, and Kordinak found levels of computer anxiety to be related to levels of library anxiety for both men and women.11 These studies focused exclusively on undergraduate students. However, no study has examined this relationship among graduate students, a population that uses the academic library more than any other student population.

Over the past fifteen years, a large body of research literature on computer attitudes has been generated. In particular, many researchers have studied the relationship between computer attitudes and computer use.12 The importance of beliefs and attitudes towards computers and technologies is widely acknowledged.13 Students’ computer attitudes arguably impact their willingness to engage in computer-related activities in colleges and universities, where effectively using library electronic resources represents an increasingly important part of college education. Negative computer attitudes may inhibit students’ interest in learning to use library resources and thereby weaken their academic performance levels, while at the same time elevating levels of library anxiety. McInerney, McInerney, and Sinclair observed that negative perceptions about computers among student teachers may accompany feelings of anxiety, including worries about being embarrassed, looking foolish, and even damaging the computer equipment.14 Further, there is often a negative relationship between prior experience with computers and computer anxiety experienced by individuals.15

Until recently, library anxiety has only been interpreted in the context of the library setting; that is, a phenomenon that occurs while students are undertaking library tasks. Jiao, Onwuegbuzie, and Lichtenstein defined library anxiety as “an uncomfortable feeling or emotional disposition, experienced in a library setting, which has cognitive, affective, physiological, and behavioral ramifications.”16 At the same time, unprecedented technological advancement has had a profound impact on the products and services offered by academic libraries. Students now are able to conduct sophisticated library searches from the comfort of their homes. It is clear that the construct of library anxiety needs to be expanded in the new library and information environment, incorporating into its definition other variables that are relevant for the changing library and information context. Because many library users spend a significant portion of their time using computer-based technologies to conduct information searches, it is natural to ask, to what extent does library anxiety stem from students’ prior attitudes and experiences with computers and library technologies? However, with the exception of the studies conducted by Mizrachi and Shoham and Mizrachi on Israeli undergraduate students, this link has not been examined.17 Thus, the present study investigated the relationship between computer attitudes and library anxiety in the rapidly changing library and information environment. As such, the current inquiry replicated the works of Mizrachi, Shoham and Mizrachi, and Jerabek, Meyer, and Kordinak by examining the degree to which computer attitudes predict levels of library anxiety among graduate students in the United States.18 It was expected that findings from this study would help to increase understanding of the construct of library anxiety. Indeed, research in this area has become critical in higher education, where educators are responsible for graduating students with the skills necessary to thrive and to lead in a rapidly changing technological environment in the twenty-first century.

Method

Participants

Participants were ninety-four African American graduate students enrolled in the College of Education at a historically Black college and university in the eastern U.S. All participants were solicited in either a statistics or a measurement course at the time that the investigation took place. In order to participate in the study, students were required to sign an informed-consent document that was given during the first class session of the semester. The majority of the participants were female. Ages of the participants ranged from twenty-two to sixty-two years (Mean = 30.40, SD = 8.75).

Instruments and Procedure

All participants were administered two scales, namely, the Computer Attitude Scale (CAS) and the Library Anxiety Scale (LAS). The CAS, developed by Loyd and Gressard, contains forty Likert-type items that assess individuals’ attitudes toward computers and the use of computers.19 This instrument consists of the following four scales, which can be used separately: (1) anxiety or fear of computers; (2) confidence in the ability to use computers; (3) liking or enjoying working with computers; and (4) computer usefulness. Loyd and Gressard reported coefficient alpha reliability coefficients of .86, .91, .91, and .95 for scores pertaining to computer anxiety, computer confidence, computer liking, and total scales, respectively.
For the present study, the score reliabilities were as follows:

• computer anxiety, .84 (95 percent confidence interval [CI] = .79, .88);
• computer confidence, .81 (95 percent CI = .75, .86);
• computer liking, .89 (95 percent CI = .85, .92); and
• computer usefulness, .76 (95 percent CI = .68, .83).

THE IMPACT OF INFORMATION TECHNOLOGY ON LIBRARY ANXIETY | JIAO AND ONWUEGBUZIE 139

The LAS, developed by Bostick, contains forty-three 5-point Likert-format items that assess levels of library anxiety experienced by college students.20 It also contains the following five subscales:

1. barriers with staff;
2. affective barriers;
3. comfort with the library;
4. knowledge of the library; and
5. mechanical barriers.

A high score on any subscale represents high levels of anxiety in that area. Jiao and Onwuegbuzie, in their examination of the score reliability reported on the LAS in the extant literature, found that it has typically been in the adequate to high range for the subscale and total-scale scores.21 Based on their analysis, Onwuegbuzie, Jiao, and Bostick concluded that "not only does the [LAS] produce scores that yield extremely reliable estimates, but also these estimates are remarkably consistent across samples with different cultures, nationalities, ages, years of study, gender composition, educational majors, and so forth."22 For the current investigation, the subscales generated scores for the combined sample that had a classical theory alpha reliability coefficient of .89 (95 percent CI = .85, .92) for barriers with staff, .84 (95 percent CI = .79, .88) for affective barriers, .53 (95 percent CI = .37, .66) for comfort with the library, .62 (95 percent CI = .48, .73) for knowledge of the library, and .70 (95 percent CI = .58, .79) for mechanical barriers.
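The coefficient alpha reliabilities reported above can be computed directly from item-level responses. Below is a minimal sketch of the classical-test-theory alpha calculation; the function name and the toy data are illustrative and not drawn from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha; `items` is a list of respondent rows of equal length."""
    k = len(items[0])                                    # number of items in the scale
    columns = list(zip(*items))                          # item-wise response columns
    item_var = sum(variance(col) for col in columns)     # sum of item variances
    total_var = variance([sum(row) for row in items])    # variance of scale totals
    return (k / (k - 1)) * (1 - item_var / total_var)

# Illustrative check: four perfectly parallel items yield alpha of 1.0.
data = [[1, 1, 1, 1], [2, 2, 2, 2], [4, 4, 4, 4]]
print(round(cronbach_alpha(data), 3))  # → 1.0
```

Real scale data, with item-level noise, produces values like the .84 to .89 figures reported for the CAS subscales.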
Analysis

A canonical correlation analysis was conducted to identify a combination of library-anxiety dimensions (barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers) that might be simultaneously related to a combination of computer-attitude dimensions (computer anxiety, computer liking, computer confidence, and computer usefulness). Canonical correlation analysis is used to examine the relationship between two sets of variables whereby each set contains more than one variable.23 In the present investigation, the five dimensions of library anxiety were treated as the dependent multivariate set of variables, and the four dimensions of computer attitudes formed the independent multivariate profile. The number of canonical functions (factors) that can be produced for a given dataset is equal to the number of variables in the smaller of the two variable sets. Because the library-anxiety set contained five dimensions and the computer-attitude set contained four variables, four canonical functions were generated.

For any significant canonical coefficient, the standardized canonical-function coefficients and structure coefficients were then interpreted. Standardized canonical-function coefficients are computed weights that are applied to each variable in a given set in order to obtain the composite variate used in the canonical correlation analysis. As such, standardized canonical-function coefficients are equivalent to factor-pattern coefficients in factor analysis or to beta coefficients in a regression analysis.24 Conversely, structure coefficients represent the correlations between a given variable and the scores on the canonical composite (latent variable) in the set to which the variable belongs.25 Thus, structure coefficients indicate the degree to which each variable is related to the canonical composite for the variable set.
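The canonical correlation analysis just described can be sketched with a small function. This is one standard computation (orthonormal bases via QR, then a singular value decomposition), not the authors' own software; the simulated data merely mirror the study's shape of ninety-four cases, five anxiety variables, and four attitude variables.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two multivariate sets (rows = cases)."""
    X = X - X.mean(axis=0)                  # center each variable
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)                 # orthonormal basis for each set
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)            # correlations, largest first

# Illustrative data shaped like the study: the number of canonical
# functions equals the size of the smaller variable set.
rng = np.random.default_rng(1)
anxiety = rng.normal(size=(94, 5))
attitude = rng.normal(size=(94, 4))
print(len(canonical_correlations(anxiety, attitude)))  # → 4
```

Because the smaller set has four variables, exactly four canonical correlations come back, matching the four functions reported in the analysis.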
Indeed, structure coefficients are essentially bivariate correlation coefficients that range in value between -1.0 and +1.0, inclusive.26 The square of the structure coefficient yields the proportion of variance that the original variable shares linearly with the canonical variate.

Results

Table 1 presents the intercorrelations among the five dimensions of library anxiety and the four dimensions of computer attitude. Of particular interest were the twenty correlations between the library-anxiety subscale scores and the computer-attitude subscale scores. It can be seen that, after applying the Bonferroni adjustment, four of these relationships were statistically significant. Specifically, computer liking was statistically significantly related to affective barriers, knowledge of the library, and comfort with the library. Using Cohen's criteria of .1, .3, and .5 for small, medium, and large relationships, respectively, the first two relationships (involving affective barriers and knowledge of the library) were medium, and the third relationship (between computer liking and comfort with the library) was large.27 In addition to these three relationships, the association between computer usefulness and knowledge of the library also was statistically significant, with a medium effect size.

The correlation matrix in table 1 was used to examine the multivariate relationship between library anxiety and computer attitudes. This relationship was assessed via a canonical correlation analysis. The canonical analysis revealed that the four canonical correlations combined were statistically significant (p < .0001). Also, when the first canonical root was removed, the remaining three canonical roots were not statistically significant. In fact, removal of subsequent canonical roots did not lead to statistical significance.
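The Bonferroni screening and Cohen benchmarks applied to the table 1 correlations can be expressed compactly. The function names below are illustrative; the thresholds and the twenty-test adjustment are as described above.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests

def cohen_label(r):
    """Cohen's rough benchmarks for the magnitude of a correlation."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    return "small" if size >= 0.1 else "trivial"

# Twenty cross-set correlations were screened, so each is tested at .05 / 20.
print(bonferroni_alpha(0.05, 20))  # → 0.0025
print(cohen_label(-0.55))          # → large
print(cohen_label(-0.37))          # → medium
```

The two example correlations correspond to the large liking-comfort and medium liking-affective-barriers effects described in the results.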
Together, these results suggested that only the first canonical function was statistically significant; the remaining three roots were not. This first canonical root also was practically significant (Rc1 = .63), contributing 40.8 percent (Rc1²) to the shared variance, which represents a large effect size.28

Data pertaining to the first canonical root are presented in table 2, which provides both standardized function coefficients and structure coefficients. Using a cutoff correlation of 0.3, the standardized canonical-function coefficients revealed that affective barriers, comfort with the library, and knowledge of the library made important contributions to the library-anxiety set, with affective barriers and comfort with the library making similarly large contributions.29 With regard to the computer-attitude set, computer anxiety, computer liking, and computer confidence made noteworthy contributions, with the latter two dimensions making the most noteworthy contributions.

The structure coefficients revealed that all five dimensions of library anxiety made important contributions to the first canonical variate. The square of the structure coefficient indicated that barriers with staff, affective barriers, comfort with the library, and knowledge of the library made similarly large contributions, explaining 67.2 percent, 72.3 percent, 72.3 percent, and 60.8 percent of the variance, respectively. With regard to the computer-attitude set, computer liking and computer usefulness made important contributions. These variables explained 64.0 percent and 16.8 percent of the variance, respectively.
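The variance percentages above are simply the squared structure coefficients from table 2. A short sketch, using the library-anxiety coefficients as reported:

```python
# Structure coefficients for the first canonical variate (library-anxiety
# set), as reported in table 2; squaring each gives the percentage of
# variance the variable shares with the canonical composite.
structure = {
    "barriers with staff": 0.82,
    "affective barriers": 0.85,
    "comfort with the library": 0.85,
    "knowledge of the library": 0.78,
    "mechanical barriers": 0.39,
}
for name, coefficient in structure.items():
    print(f"{name}: {100 * coefficient ** 2:.1f}%")
```

Squaring reproduces, to rounding, the 67.2, 72.3, 72.3, and 60.8 percent figures reported above, plus 15.2 percent for mechanical barriers.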
Comparing the standardized and structure coefficients indicated that computer anxiety and computer confidence served as suppressor variables, because the standardized coefficients associated with these variables were large whereas the corresponding structure coefficients were relatively small.30 Suppressor variables are variables that assist in the prediction of dependent variables due to their correlation with other independent variables.31 Thus, the inclusion of computer anxiety and computer confidence in the canonical correlation model strengthened the multivariate relationship between library anxiety and computer attitudes.

Discussion

The purpose of this study was to investigate the relationship between computer attitudes and library anxiety among African American graduate students. Specifically, the multivariate link between these two constructs was examined. A canonical correlation analysis revealed a strong multivariate relationship between library anxiety and computer attitudes. The library-anxiety subscale scores and computer-attitude subscale scores shared 40.82 percent of the common variance. Specifically, computer liking and computer usefulness were related simultaneously to the following five dimensions of library anxiety: barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers. Computer anxiety and computer confidence served as suppressor variables. Thus, computer attitudes predict levels of library anxiety.

As such, the present findings are consistent with those of Mizrachi and Shoham and Mizrachi, who found a statistically significant relationship between computer attitudes and the following seven dimensions of the Hebrew Library-Anxiety Scale, a modified version of the LAS developed by the authors for their Israeli sample:32

1. Staff;
2. Knowledge;
3. Language;
4. Physical Comfort;
5. Library Computer Comfort;
6. Library Policies and Hours; and
7. Resources.

According to its authors, the Staff factor refers to students' attitudes towards librarians and library staff and their perceived accessibility. The Knowledge factor pertains to how students rate their own library expertise. The Language factor relates to the extent to which using English-language searches and materials yields discomfort. Physical Comfort evaluates how much the physical facility negatively affects students' satisfaction and comfort with the library. Library Computer Comfort assesses the perceived trustworthiness of library computer facilities and the quality of directions for using them. Library Policies and Hours concerns students' attitudes toward library rules, regulations, and hours of operation. Finally, Resources refers to the perceived availability of the desired material in the library collection. The correlations between the dimensions of library anxiety and computer attitudes ranged from .11 (physical comfort) to .47 (knowledge). The current results also replicate those of Jerabek, Meyer, and Kordinak, who found levels of computer anxiety to be related to levels of library anxiety for both men and women.33

Nevertheless, caution should be exercised in generalizing the current findings to all graduate students. Though the present study examined the association between library anxiety and computer attitudes among African American graduate students, it should not be assumed that this relationship would hold for other racial groups. Jiao, Onwuegbuzie, and Bostick found that African American students attending a research-intensive institution reported statistically significantly lower levels of library anxiety associated with barriers with staff, affective barriers, and comfort with the library than did Caucasian American graduate students enrolled at a doctoral-granting institution, with effect sizes ranging from moderate to large.34 In a follow-up study, Jiao and Onwuegbuzie compared African American and Caucasian American students with respect to library anxiety, controlling for educational background by selecting both racial groups from the same institution.35 No statistically significant racial differences were found in library anxiety for any of the five dimensions of the LAS. However, across all five library-anxiety measures, the African American sample reported lower scores than did the Caucasian American sample. In fact, using the test of trend by Onwuegbuzie and Levin, they found that the consistency with which the African American graduate students had lower levels of library anxiety than did the Caucasian American students was both statistically and practically significant.36 Thus, Jiao and Onwuegbuzie's results, alongside those of Jiao, Onwuegbuzie, and Bostick, suggest that racial differences in library anxiety prevail.37 Thus, future research should investigate whether the relationship between library anxiety and computer attitudes found in the present study among African American graduate students also exists among Caucasian American graduate students, as well as among other racial groups.

Table 1. Intercorrelations among the Library-Anxiety Subscales and Computer-Attitude Subscales

Subscale                       2     3     4     5     6     7     8     9
1. Barriers with Staff        .64*  .63*  .49*  .46*  -.02   .05  -.27  -.09
2. Affective Barriers               .56*  .52*  .40*  -.05   .02  -.37* -.23
3. Comfort with the Library               .56*  .44*  -.19  -.20  -.55* -.16
4. Knowledge of the Library                     .39*  -.21  -.11  -.37* -.32*
5. Mechanical Barriers                                -.13  -.01  -.18   .04
6. Computer Anxiety                                          .77*  .48*  .46*
7. Computer Confidence                                             .67*  .36*
8. Computer Liking                                                       .43*
9. Computer Usefulness

*Indicates a statistically significant relationship after the Bonferroni adjustment.

Table 2. Canonical Solution for First Function: Relationship between Library-Anxiety Subscales and Computer-Attitude Subscales

Theme                          Standardized Coefficient   Structure Coefficient   Structure² (%)
Library-Anxiety Subscale
  Barriers with Staff                   .17                      .82*                 67.2
  Affective Barriers                    .40*                     .85*                 72.3
  Comfort with the Library              .39*                     .85*                 72.3
  Knowledge of the Library              .31*                     .78*                 60.8
  Mechanical Barriers                  -.12                      .39*                 15.2
Computer-Attitude Subscale
  Computer Anxiety                     -.31*                    -.22                   4.8
  Computer Confidence                   .98*                    -.13                   1.7
  Computer Liking                     -1.25*                    -.80*                 64.0
  Computer Usefulness                  -.13                     -.41*                 16.8

*Loadings with effect sizes larger than .3.

Further, the causal direction of the relationship found in the current study should be investigated. That is, future studies should investigate whether library anxiety places a person more at risk for experiencing poor computer attitudes, or whether the converse is true. More research also is needed to determine how computer attitudes might play a role in the library context. Notwithstanding, it appears that the construct of library anxiety can be expanded to include the construct of computer attitudes. Indeed, one implication of the findings is that Bostick's LAS should be modified to include dimensions of computer attitudes.38 Such a modification likely would facilitate the identification of library-anxious students.
By identifying students with high levels of library anxiety and poor computer attitudes, library educators and others could help them improve their dispositions and provide them with the skills necessary to negotiate the rapidly changing technological environment, thereby putting them in a better position to be lifelong learners.

References

1. Susan M. Piotrowski, Computer Training: Pathway from Extinction (ERIC Document Reproduction Service, ED 348955, 1992).
2. Thomas H. Hogan, "Drexel University Moves Aggressively from Print to Electronic Access for Journals (Interview with Carol Hansen Montgomery, Dean of Libraries)," Computers in Libraries 21, no. 5 (May 2001): 22-27.
3. M. Claire Stewart and H. Frank Cervone, "Building a New Infrastructure for Digital Media: Northwestern University Library," Information Technology and Libraries 22, no. 2 (June 2003): 69-74.
4. Carol C. Kuhlthau, "Longitudinal Case Studies of the Information Search Process of Users in Libraries," Library and Information Science Research 10 (July 1988): 257-304; Carol C. Kuhlthau, "Inside the Search Process: Information Seeking from the User's Perspective," Journal of the American Society for Information Science 42, no. 5 (June 1991): 361-71; Carol C. Kuhlthau, Seeking Meaning: A Process Approach to Library and Information Services (Norwood, N.J.: Ablex, 1993); Carol C. Kuhlthau, "Students and the Information Search Process: Zones of Intervention for Librarians," Advances in Librarianship 18 (1994): 57-72; Carol C. Kuhlthau et al., "Validating a Model of the Search Process: A Comparison of Academic, Public, and School Library Users," Library and Information Science Research 12, no. 1 (Jan.-Mar. 1990): 5-31.
5. Constance A. Mellon, "Library Anxiety: A Grounded Theory and Its Development," College & Research Libraries 47, no. 2 (Mar. 1986): 160-65.
6. Ibid.
7. Qun G. Jiao, Anthony J. Onwuegbuzie, and Art Lichtenstein, "Library Anxiety: Characteristics of 'At-Risk' College Students," Library and Information Science Research 18 (spring 1996): 151-63.
8. Constance A. Mellon, "Attitudes: The Forgotten Dimension in Library Instruction," Library Journal 113 (Sept. 1, 1988): 137-39; Constance A. Mellon, "Library Anxiety and the Non-Traditional Student," in Reaching and Teaching Diverse Library User Groups, ed. Teresa B. Mensching (Ann Arbor, Mich.: Pierian, 1989), 77-81; Anthony J. Onwuegbuzie, "Writing a Research Proposal: The Role of Library Anxiety, Statistics Anxiety, and Composition Anxiety," Library and Information Science Research 19, no. 1 (1997): 5-33.
9. Anthony J. Onwuegbuzie and Qun G. Jiao, "Information Search Performance and Research Achievement: An Empirical Test of the Anxiety-Expectation Model of Library Anxiety," Journal of the American Society for Information Science and Technology (JASIST) 55, no. 1 (2004): 41-54; Anthony J. Onwuegbuzie, Qun G. Jiao, and Sharon L. Bostick, Library Anxiety: Theory, Research, and Applications (Lanham, Md.: Scarecrow, 2004).
10. Diane Mizrachi, "Library Anxiety and Computer Attitudes among Israeli B.Ed. Students" (master's thesis, Bar-Ilan University, Israel, 2000); Snunith Shoham and Diane Mizrachi, "Library Anxiety among Undergraduates: A Study of Israeli B.Ed. Students," Journal of Academic Librarianship 27, no. 4 (July 2001): 305-11.
11. Ann J. Jerabek, Linda S. Meyer, and Thomas S. Kordinak, "'Library Anxiety' and 'Computer Anxiety': Measures, Validity, and Research Implications," Library and Information Science Research 23, no. 3 (2001): 277-89.
12. Muhamad A. Al-Khaldi and Ibrahim M. Al-Jabri, "The Relationship of Attitudes to Computer Utilization: New Evidence from a Developing Nation," Computers in Human Behavior 9, no. 1 (Jan. 1998): 23-42; Margaret Cox, Valeria Rhodes, and Jennifer Hall, "The Use of Computer-Assisted Learning in Primary Schools: Some Factors Affecting Uptake," Computers in Education 12, no. 1 (1988): 173-78; Gayle V. Davidson and Scott D. Ritchie, "Attitudes toward Integrating Computers into the Classroom: What Parents, Teachers, and Students Report," Journal of Computing in Childhood Education 5, no. 1 (1994): 3-27; Donald G. Gardner, Richard L. Dukes, and Richard Discenza, "Computer Use, Self-Confidence, and Attitudes: A Causal Analysis," Computers in Human Behavior 9, no. 4 (winter 1993): 427-40; Robin H. Kay, "Predicting Student Teacher Commitment to the Use of Computers," Journal of Educational Computing Research 6, no. 3 (1990): 299-309.
13. Deborah Bandalos and Jeri Benson, "Testing the Factor Structure Invariance of a Computer Attitude Scale over Two Grouping Conditions," Educational and Psychological Measurement 50, no. 1 (spring 1990): 49-60; Frank M. Bernt and Alan C. Bugbee Jr., "Factors Influencing Student Resistance to Computer Administered Testing," Journal of Research on Computing in Education 22, no. 3 (spring 1990): 265-75; Michel Dupagne and Kathy A. Krendl, "Teachers' Attitudes toward Computers: A Review of the Literature," Journal of Research on Computing in Education 24, no. 3 (spring 1992): 420-29; Elizabeth Mowrer-Popiel, Constance Pollard, and Richard Pollard, "An Analysis of the Perceptions of Preservice Teachers toward Technology and Its Use in the Classroom," Journal of Instructional Psychology 21, no. 2 (June 1994): 131-38; Jennifer D. Shapka and Michel Ferrari, "Computer-Related Attitudes and Actions of Teacher Candidates," Computers in Human Behavior 19, no. 3 (May 2003): 319-34.
14. Valentina McInerney, Dennis M. McInerney, and Kenneth E. Sinclair, "Student Teachers, Computer Anxiety, and Computer Experience," Journal of Educational Computing Research 11, no. 1 (1994): 27-50.
15. Susan E. Jennings and Anthony J. Onwuegbuzie, "Computer Attitudes as a Function of Age, Gender, Math Attitude, and Developmental Status," Journal of Educational Computing Research 25, no. 4 (2001): 367-84.
16. Jiao, Onwuegbuzie, and Lichtenstein, "Library Anxiety," 152.
17. Mizrachi, "Library Anxiety and Computer Attitudes"; Shoham and Mizrachi, "Library Anxiety among Undergraduates."
18. Mizrachi, "Library Anxiety and Computer Attitudes"; Shoham and Mizrachi, "Library Anxiety among Undergraduates"; Jerabek, Meyer, and Kordinak, "'Library Anxiety' and 'Computer Anxiety.'"
19. Brenda H. Loyd and Clarice Gressard, "The Effects of Sex, Age, and Computer Experience on Computer Attitudes," AEDS Journal 18, no. 2 (1984): 67-77.
20. Sharon L. Bostick, "The Development and Validation of the Library Anxiety Scale" (Ph.D. diss., Wayne State University, 1992).
21. Qun G. Jiao and Anthony J. Onwuegbuzie, "Reliability Generalization of the Library Anxiety Scale Scores: Initial Findings" (unpublished manuscript, 2002).
22. Onwuegbuzie, Jiao, and Bostick, Library Anxiety, 22.
23. Norman Cliff and David J. Krus, "Interpretation of Canonical Analyses: Rotated versus Unrotated Solutions," Psychometrika 41, no. 1 (Mar. 1976): 35-42; Richard B. Darlington, Sharon L. Weinberg, and Herbert J. Walberg, "Canonical Variate Analysis and Related Techniques," Review of Educational Research 42, no. 4 (fall 1973): 131-43; Bruce Thompson, "Canonical Correlation: Recent Extensions for Modeling Educational Processes" (paper presented at the annual meeting of the American Educational Research Association, Boston, Mass., Apr. 7-11, 1980) (ERIC, ED 199269); Bruce Thompson, Canonical Correlation Analysis: Uses and Interpretations (Newbury Park, Calif.: Sage, 1984); Bruce Thompson, "Canonical Correlation Analysis: An Explanation with Comments on Correct Practice" (paper presented at the annual meeting of the American Educational Research Association, New Orleans, La., Apr. 5-9, 1988) (ERIC, ED 295957); Bruce Thompson, "Variable Importance in Multiple Regression and Canonical Correlation" (paper presented at the annual meeting of the American Educational Research Association, Boston, Mass., Apr. 16-20, 1990) (ERIC, ED 317615).
24. Margery E. Arnold, "The Relationship of Canonical Correlation Analysis to Other Parametric Methods" (paper presented at the annual meeting of the Southwest Educational Research Association, New Orleans, La., Jan. 1996) (ERIC, ED 395994).
25. Thompson, "Canonical Correlation: Recent Extensions."
26. Ibid.
27. Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences (New York: Wiley, 1988).
28. Ibid.
29. Zarrel V. Lambert and Richard M. Durand, "Some Precautions in Using Canonical Analysis," Journal of Marketing Research 12, no. 4 (Nov. 1975): 468-75.
30. Anthony J. Onwuegbuzie and Larry G. Daniel, "Typology of Analytical and Interpretational Errors in Quantitative and Qualitative Educational Research," Current Issues in Education 6, no. 2 (Feb. 2003). Accessed Nov. 13, 2003, http://cie.ed.asu.edu/volume6/number2/.
31. Barbara G. Tabachnick and Linda S. Fidell, Using Multivariate Statistics, 3rd ed. (New York: Harper, 1996).
32. Mizrachi, "Library Anxiety and Computer Attitudes"; Shoham and Mizrachi, "Library Anxiety among Undergraduates."
33. Jerabek, Meyer, and Kordinak, "'Library Anxiety' and 'Computer Anxiety.'"
34. Qun G. Jiao, Anthony J. Onwuegbuzie, and Sharon L. Bostick, "Racial Differences in Library Anxiety among Graduate Students," Library Review 53, no. 4 (2004): 228-35.
35. Qun G. Jiao and Anthony J. Onwuegbuzie, "Library Anxiety: A Function of Race?" (unpublished manuscript, 2003).
36. Anthony J. Onwuegbuzie and Joel R. Levin, "A Proposed Three-Step Method for Assessing the Statistical and Practical Significance of Multiple Hypothesis Tests" (paper presented at the annual meeting of the American Educational Research Association, San Diego, Calif., Apr. 12-16, 2004).
37. Jiao, Onwuegbuzie, and Bostick, "Racial Differences in Library Anxiety."
38. Bostick, "The Development and Validation of the Library Anxiety Scale."
Beyond Information Architecture: A Systems Integration Approach to Web-site Design

Krisellen Maloney and Paul J. Bracke

Information Technology and Libraries 23, no. 4 (Dec. 2004): 145

Users' needs and expectations regarding access to information have fundamentally changed, creating a disconnect between how users expect to use a library Web site and how the site was designed. At the same time, library technical infrastructures include legacy systems that were not designed for the Web environment. The authors propose a framework that combines elements of information architecture with approaches to incremental system design and implementation. The framework allows for the development of a Web site that is responsive to changing user needs, while recognizing the need for libraries to adopt a cost-effective approach to implementation and maintenance.

The Web has become the primary mode of information seeking and access for users of academic libraries. The rapid acceptance of Web technologies is due, in part, to the ubiquity of the Web browser, which presents a user interface that is recognized and understood by a broad range of users. As libraries increase the amount of content and broaden the range of services available through their Web sites, it is becoming evident that it will take more than a well-designed user interface to completely support users' information-seeking and access needs. The underlying technical infrastructure of the Web site must also be organized to logically support users' tasks. Library technical infrastructures, largely designed to support traditional library processes, are being adapted to provide Web access.
As part of this adaptation process, they are not necessarily being reorganized to meet the changing expectations of Web-savvy users, particularly younger users who are not familiar with traditional library organization methods such as the card catalog, print indexes, or other legacy tools.

Libraries must harness the power of the highly structured information systems that have long been a part of libraries and integrate these systems in new ways to support users' goals and objectives. Part of this challenge will be answered by the development of new systems and technical standards, but these are only a partial solution to the problem. An important part of making library systems and Web sites function as powerful discovery tools is to modernize the systems that provide existing services and content to support the changing needs and expectations of the user. Emerging concepts of information architecture (IA) describe the system requirements from the user perspective but do not provide a mechanism to conceptually integrate existing functions and content, or to inform the requirements necessary to modernize and integrate the current system architecture.

The authors propose a framework for approaching a comprehensive Web-site implementation that combines components of IA and system modernization that have been successful in other industries. Within this framework, those components are tailored for the unique aspects of information provision that characterize a library. The proposed framework expands the concept of IA to include functional and content requirements for the Web site. This expansion identifies points within the conceptual and physical design where user requirements are constrained by the existing infrastructure. Identification of these constraints begins an iterative design process in which some user requirements inform changes to the underlying system architecture.
Conversely, when the required changes to the underlying system architecture cannot be achieved, the constraints inform the conceptual design of the Web site. The iterative nature of this approach acknowledges the usefulness of much of the existing infrastructure but provides an incremental approach to modernizing installed systems. This framework describes aspects of the conceptual and physical-design elements that must be considered together and balanced to produce a Web site that supports the goals and objectives of the user but is cost-effective and practical to implement.

Information Architecture and the Problem of Libraries

IA is both a characteristic of a Web site and an emerging discipline. A number of authors have attempted to develop a formal definition of IA. Wodtke presents a simple task-based definition, stating that an information architect "creates a blueprint for how to organize the Web site so that it will meet all (business, end user) these needs."1 Rosenfeld and Morville present a four-part definition in which two parts focus on the practice and two parts define IA as a characteristic. The first characteristic defines IA as a combination of "organization, labeling, and navigation schemes," while the second describes it as "the structural design of an information space to facilitate task description and intuitive access to content."2

Krisellen Maloney (maloneyk@u.library.arizona.edu) is Director of Technology at the University of Arizona Libraries, Tucson. Paul J. Bracke (paul@ahsl.arizona.edu) is Head of Systems and Networking at the Arizona Health Sciences Library, Tucson.

BEYOND INFORMATION ARCHITECTURE | MALONEY AND BRACKE 145

There is general agreement that IA provides a specification of the Web site from the perspective of the user. The specification usually describes the organization, navigational elements,
and labeling required to completely structure a user's Web-site experience. IA is not synonymous with Web-site design, but rather provides the conceptual foundation upon which a presentation design is based. Web-site design adds presentation and graphical elements to IA to create the user experience. Library Web sites provide a display platform by which library content and services can be accessed through a common user interface. Most of the tools and services have been available for decades and, in response to user demand, are increasingly being made Web-accessible in digital formats (virtual reference, full-text databases). Despite this new access medium and format, the conceptual design of the underlying systems has not changed much. The library technical infrastructure is made up of many loosely coupled systems optimized to perform a single function or to support the work of a library department. Library Web sites do not present a sufficiently unified interface design or level of technical integration to match current users' mental models of information seeking and access.3 The systems have not been integrated to support users' overarching goals or meet the expectation of seamless access that they have developed when using other Web sites (such as Google or Amazon). In many cases, users are still expected to understand aspects of the library that are now obsolete (card catalogs) in order to navigate the library's Web site. For example, the process of finding a journal article using a typical library Web site is based on a print paradigm and has changed little despite the advent of online discovery tools. In a print environment, users first looked at an index to identify an article of interest, then wrote down the citation, went to the card catalog, and there looked up the journal containing the article. If the library owned the journal, the user would then write down the call number and go to the shelves to find the article.
This process has not necessarily changed much for many libraries, even though indexes, card catalogs, and journals are often available online. Even more confusing is that the end result of some search processes within a library Web site is not necessarily content, but a metadata representation of content that must be entered into another search box. Although the first search is representative of the search of a traditional index and the second search is representative of the search of the card catalog, many of our users have no mental model for this multistep search process. Users accustomed to the simple keyword search available through Internet search engines may have great difficulty in understanding the need for the many steps involved in library use. There is an expectation that search systems and online content will be linked, regardless of the economic, legal, and technical factors that make these links difficult. While linking options in vendor databases and OpenURL resolvers have begun to simplify the electronic version of the process by automating some of the steps, the multistep process is still valid in many instances in most libraries. It is clear that library Web sites must undergo a fundamental change in order to be responsive to the needs of the user. Because library Web sites appear to be similar to conventional Web sites, it is tempting to adopt a general approach to IA to address users' needs. There are, however, several areas in which the general approach to IA does not adequately support the design needs for library Web sites. Generalized IA approaches, such as those provided by Rosenfeld and Morville, do not provide adequate guidance regarding the organization and display of content from external sources. There is an unstated assumption that external sources will provide information in the format specified by the Web-site architect.
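The linking step that OpenURL resolvers automate can be sketched as the construction of a resolver URL from citation metadata: the resolver, not the user, performs the "second search." This is a minimal illustration using OpenURL 0.1-style key/value pairs; the resolver base URL and the citation values are hypothetical.

```python
from urllib.parse import urlencode

def build_openurl(base_url: str, citation: dict) -> str:
    """Build an OpenURL 0.1-style link from citation metadata.

    Given this URL, a resolver can check holdings and route the user
    to full text, the catalog record, or an ILL request form, hiding
    the multistep index-then-catalog process described above.
    """
    return base_url + "?" + urlencode(citation)

# Hypothetical resolver and citation values, for illustration only.
link = build_openurl(
    "https://resolver.example.edu/openurl",
    {
        "genre": "article",
        "issn": "1234-5678",
        "volume": "23",
        "issue": "4",
        "spage": "145",
        "date": "2004",
    },
)
print(link)
```

A vendor database would embed such a link next to each citation, so the user clicks once instead of re-keying the journal title into the catalog.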
IA approaches suggest methods to completely describe the user experience, from the time a user first accesses a site to the point at which a user task is complete, regardless of the origin of the content or service accessed. For example, the content from each of Amazon.com's commercial partners is packaged to operate like a part of the Amazon.com site. In contrast, libraries often only have control of a user's experience up to the point at which they leave a library's servers. Libraries guide users not only to local services and digitized collections, but to databases, journals, and more that are licensed from external sources and the appearances of which are controlled by external sources. Even when using a technical standard such as Z39.50 to provide a local look and feel to remote resources, libraries do not necessarily have full control over the data format or elements of the content that is returned. This lack of local control over content is a limitation to libraries adopting common definitions of IA. Another design area that is not well supported by generalized approaches to IA is the integration of previously installed systems, such as library catalogs. These legacy systems provide important services that represent decades of development and collaboration, and are essential to the future of libraries. For example, libraries provide access to unique resources and systems ranging from online catalogs to abstracting and indexing databases to interlibrary loan (ILL) networks. Libraries are using Web technologies to provide new access methods to library content and services. These technologies provide a thin veneer on systems that function in a manner unfamiliar to many users. The challenge then becomes to change what lies beneath the surface, the underlying functionality of the site, to support the needs of the user.
Using a generalized approach to IA, as applied in other settings, libraries would assess the needs of the user and develop a new, complete system that supports those needs. Such an approach ignores the extensive, existing infrastructure of legacy systems in libraries that is still useful and that serves purposes beyond the user's Web interface.

146 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

What is needed is a standard reference model for library services that provides a framework for access to services and content. This is a long-term goal that requires cooperation and agreement among libraries, and that would allow legacy systems to be repackaged in ways that are more flexible, meet changing user needs, and can be integrated into changing technology environments. Because there are currently no such reference models, librarians need to develop other approaches to integrate existing legacy systems into a modernized Web site.

Extending the IA Framework

In this paper, the general definition of IA that has been proposed by several authors has been extended to incorporate the additional constraints that characterize library Web sites.4 Extended Information Architecture (EIA) is the first half of the framework, and provides a complete conceptual design of the Web site from the users' perspective. Figure 1 depicts the elements and relationships within EIA. The coordinating structure provides an overarching framework for the integration of the multiple service elements that provide much of the underlying functionality of the Web site. The relationship between the coordinating structure and the service elements is iterative, with service elements constraining the coordinating structure and the coordinating structure informing the design of the service elements.
The Coordinating Structure

The coordinating structure contains many of the design elements that are found in generalized approaches to IA, including the organization, navigational structure, and labeling. These are the elements of a Web site that, in concert, define the structure of the user interface without specifying the functionality and content underlying that interface. The framework emphasizes aspects of the generalized approaches that are most relevant to libraries and places them in relation to the service elements that specify the content and functionality of the site. The first element of the coordinating structure is the organization of the Web site. Organization refers to the logical groupings of the content and services that are available to the user. These groupings are not necessarily representative of physical-system implementations, but may be task- or subject-based instead. For example, many academic library Web sites have primary groupings that include information resources, services, and user guides. Although the information resources may include information from a range of systems (for instance, the catalog, abstracting and indexing databases, full-text databases, locally developed exhibits), the logical grouping of information resources unifies the concept for the user. A site's organization scheme will often serve as the foundation for the primary navigational choices on a site's main menu or primary navigational bar. Another component of the coordinating structure is the navigational structure of the site. Navigational structures define the relationships between content and service elements of a site, and between groupings in the site's organization. These structures also include search tools and other link-management tools that help users locate needed content and services. There are usually two types of relationships that form a navigational structure.
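The distinction between logical groupings and physical systems can be made concrete with a small sketch: one user-facing grouping spans several back-end systems, and the primary navigation is derived from the groupings, not the systems. The grouping and system names below are illustrative, not taken from any particular site.

```python
# Logical site organization: each user-facing grouping unifies
# content and services that live in several physical systems.
organization = {
    "Information Resources": [
        "online catalog",
        "abstracting and indexing databases",
        "full-text databases",
        "locally developed exhibits",
    ],
    "Services": ["interlibrary loan", "virtual reference", "course reserves"],
    "User Guides": ["subject guides", "tutorials", "FAQs"],
}

# The primary navigation bar comes from the top-level groupings;
# the underlying systems never appear in the menu directly.
primary_navigation = list(organization)
print(primary_navigation)
```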
First is the definition of a global relationship scheme that outlines the primary navigational structure of the site. These often define relationships between sections of a site's organization, but may also provide access to key pieces of functionality from any point within a site. In addition to the overarching global relationship scheme, there are often several locally or functionally defined relationship schemes that are used throughout the site. These local relationship schemes are usually located within a service or content grouping and provide logical connections within their defined grouping. Both sets of relationships are designed to support a task and provide pathways for the user to move among the various elements of the site. Other relationship schemes may be topic oriented, allowing the user to move easily among similar content sources. These logical relationships are later implemented within a user interface as tools such as menus, navigation bars, and navigation tabs when combined with labels and a visual design. Customization and personalization are navigational structures that have gained a fair amount of attention in the library literature. Both strategies allow a Web site to be displayed differently, based on user characteristics. Customization allows the user to create the relationships most suitable for his or her needs. This strategy has been explored by a number of libraries, although there is little convincing evidence that users implement such strategies in an intense or repeated manner.5 Personalization allows a system designer to bring together a set of pages in a relationship that is meaningful for a user or a user group. Labels, the third element of the coordinating structure, provide signposts that communicate an integrated view of a Web site's design to those who use it. It is important to define a labeling system that consistently and clearly communicates the meaning of the site to the user.
Accordingly, the labels should be constructed in the user's language, not the librarian's. For example, a user may not understand that an abstracting and indexing database will provide them with information regarding journal articles that are relevant to a topic of interest. In that case, the label "Find an article" is more useful than "Indexes."

Figure 1. An Extended Information Architecture for Developing a Conceptual Design of Library Web Sites

Coordinating Structure
• Organization: The grouping and specification of the function and content that is necessary to support the site.
• Navigational Structure: The associations among the service and content elements of the site. These relationships provide the conceptual foundation for navigation and include global and local navigational concepts, site index and search, and customizable and personalized structures.
• Labeling: A consistent naming scheme that presents options and choices to users in terms that they will understand.

Service Elements
• Functional Requirements: The description of the functional elements that are necessary to support the user.
• Content Requirements: The description of the content elements that are necessary to support the user.
• Content Specifications: The description of the content elements that are already available to support the user.
• Functional Specifications: The description of the functional elements that are present in a previously installed system.

Labels are used to describe individual service or content units, but may also be used as headings to provide structural elements to augment the navigational scheme. The consistent use of labels as headings within the site not only increases user understanding of the site, but may also be explicitly constructed to support user tasks.
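The principle of labeling in the user's language rather than the librarian's can be sketched as an explicit mapping that is applied consistently wherever a system is surfaced in the interface. The entries below are illustrative, not drawn from any particular site.

```python
# Map internal, librarian-facing system names to task-based,
# user-facing labels, so every page presents the same vocabulary.
labels = {
    "indexes": "Find an article",
    "opac": "Find a book",
    "ill": "Request an item we don't own",
}

def label_for(system_name: str) -> str:
    # Fall back to the raw name so a missing entry is visible in review.
    return labels.get(system_name, system_name)

print(label_for("indexes"))
```

Centralizing the mapping is the point: a label changed once changes everywhere, which is what makes the labeling system consistent.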
An example of labeling to support tasks can be seen on the University Libraries Web site of the University of Louisville where, under the main heading for Articles, the first subheading is Step 1: Search article databases; and the second subheading is Step 2: Search (the catalog) by journal title.6

Service Elements

Service elements are the second major component of Extended Information Architecture, and represent the content and functionality of the Web site. In this framework, the service elements serve a dual purpose. The definition of service elements involves defining both the ideal requirements for functionality and content as well as the specifications of what is currently available. The definition process can then be used to identify points in the Web site where new functions and content need to be added, or where existing functionality must be modernized. These additions and modifications may be achievable immediately, but in many cases an incremental plan for change may need to be developed. The service-element requirements, labeled as Functional Requirements and Content Requirements in figure 1, express the users' needs and expectations for the functional or content elements of the Web site. The purpose of the requirements definitions is to describe the service elements that are necessary to allow a user to meet his or her goals or objectives in using the site. These requirements are a representation of the ideal composition of a Web site, and inform not only the immediate implementation of the site but also the development of future systems and the modernization of existing systems. It is also important to note that the requirements should be developed to express user needs, not a particular implementation option. For example, it might be tempting to specify the implementation of a particular vendor's OpenURL resolver. This does not, however, describe how the system would function ideally from a user perspective.
Instead, an appropriate requirement would be that users should be able to link to full text from all citations in an abstracting and indexing database. More specifically, content requirements describe the content that is necessary to meet the users' goals and objectives. Access to content is often the primary emphasis of a library Web site, and the content requirements describe the intellectual content that should be accessible through a Web site. Examples of content that might be required are article citations, full-text articles, and multimedia objects. Normally, these requirements will be closely connected with library-wide collection-development policies and priorities, and should be driven by subject specialists rather than systems personnel. These requirements inform the development of systems to meet the needs of the users. The content specifications describe the content that is available within the current systems. There are many reasons why content requirements and content specifications do not match, including the inability or choice of a library to acquire a particular piece, the unavailability of specified content, or technical incompatibilities between content and the library's infrastructure. Although content is sometimes viewed as the core component of a library Web site, there is also a great deal of additional functionality that is provided to users. The functional requirements describe the users' needs and expectations of the functionality in the context of completing tasks on the Web site. For example, ILL forms found on many sites are easy for the user to fill out, although the most effective interface to ILL for the user might not involve a form-based user interface at all. It might be a direct system-to-system interface from an OpenURL resolver to the ILL software in which all citation data are transmitted for the user.
This requirement is not necessarily obvious when considering ILL in isolation, but is evident when considering it in the larger context of the users' goals and objectives for the entire Web site. The functional specifications describe the functions as they exist in the installed base of systems and expose the functionality that is available to the user. When the specifications do not match the requirements, the users' expectations regarding the system will not be fully achieved. The economic and technical limitations of system implementation and modernization often reduce the speed at which the large base of previously installed systems can be modified to meet users' changing needs and expectations. It is thus critical to identify gaps between existing systems and desired systems and discover areas where a Web site will have characteristics that are not completely aligned with what the user needs or expects. When the service-element requirements do not match the service-element specifications of existing systems, an iterative design process begins. This process will be intertwined with the evaluation of the system architecture. Gaps that can be addressed immediately should be incorporated into an implementation plan for the new Web site. Longer-term migration or development plans can be developed to fill gaps that cannot be addressed immediately. It is also important to acknowledge that developing and meeting service-element requirements is an iterative process. They will need to be revisited over time as user needs change, and requirements that are met now become the specifications that are evaluated in the future.
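The comparison of service-element requirements against the specifications of installed systems can be sketched as a simple set difference; the resulting gaps then feed either the immediate implementation plan or a longer-term migration plan. The element names below are illustrative.

```python
# Requirements: what users need. Specifications: what installed
# systems currently provide.
requirements = {
    "link citations to full text",
    "renew loans online",
    "search all databases from one box",
}
specifications = {
    "link citations to full text",
    "renew loans online",
}

# Unmet requirements constrain the conceptual design until filled;
# met requirements become the specifications evaluated next cycle.
gaps = requirements - specifications
met = requirements & specifications

print(sorted(gaps))
```

The iterative step in the text corresponds to re-running this comparison each planning cycle with an updated `specifications` set.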
Interrelationships within EIA

When the service-element requirements cannot be used to modify the service-element specifications, the service elements constrain the design of the Web site and influence the design of the coordinating structure. The upward arrow in figure 1 labeled Constrains indicates that the user experience is constrained by the specifications of content or functional elements that are not currently changeable. In such situations, the coordinating structure must be designed to provide additional context for the user to understand the purpose of the existing service elements. This explanatory role can be seen in the implementation of many Web sites as formal parts of the organizational structure designed to explain the idiosyncrasies of the Web site to the user. For example, many academic library Web sites have tutorials, FAQs, or sections labeled "How do I . . . ?" that provide tips on using aspects of the site that are not always evident to users. It is necessary to acknowledge the usefulness of the explanatory role of the coordinating structure in the iterative and incremental processes of Web-site design. Just as bibliographic instruction and adequate signage have allowed the user to navigate aspects of the traditional library that were not intuitive, the coordinating structure provides the conceptual signposts and other guidance required for users to effectively navigate the Web site. At the same time, it is important to realize that the explanatory role would not be necessary if the Web site's architecture and design were intuitive to the user. As the design of the service elements changes to accommodate the larger goals of the user, the explanatory function of the coordinating structure will be diminished. The main goal of library Web site design should be to reduce the explanatory role of the coordinating structure and to develop service elements that seamlessly support the goals and objectives of the user.
Until all service elements have been modernized to meet the needs of the user, the conceptual design of Web sites will represent a compromise between what users require and what it is possible for users to do within the current legacy information infrastructure.

System Architecture

While the conceptual design of the Web site describes the needs of the user apart from the technical details of the implementation, the system architecture is the description of the system as it exists. In the case of library Web sites, the system architecture is not limited to the functionality and data on the library's Web server. Instead, it is also inclusive of all core infrastructure, individual systems, and data access and storage mechanisms that provide the blueprint of the Web site's backend as it has been built. The individual systems in the architecture may include locally controlled ones (for instance, an online catalog), but will also include remote systems such as abstracting and indexing databases mounted by a vendor. A definition of the design of the existing system plays a key role in the evolutionary specification of the system because it provides developers with a greater understanding of the possibilities and constraints of the existing infrastructure. In describing a system architecture, several formal representations can be used that capture various aspects of the system's capabilities at different levels of granularity. These include module views that provide static specifications of individual components; component and connector views that provide dynamic views of processes; and deployment views that incorporate hardware elements.7 The selection of representations is beyond the scope of this paper. Typical elements of a system architecture can be seen in figure 2. For this paper, three classes of components are being considered, although more may be introduced if applicable locally.
The core-infrastructure components are fundamental services and information that support one or more systems or subsystems. In a typical library environment this includes authentication services, Web platforms, and the network. In some library environments, external units may maintain some or all of these components. For example, many college campuses maintain an authentication infrastructure in the campus computing office. Overall, core infrastructure provides the glue for tying together the many applications that libraries attempt to integrate in their Web sites. The system architecture should include details regarding the standards and interfaces that are used within the library technical infrastructure. Many of the applications in the library environment are off-the-shelf components that have been developed by external vendors. These off-the-shelf components may include the catalog, ILL modules, electronic-course reserves, and virtual-reference systems. Although individual libraries may have some control over configuration options in these applications, they are likely to have little influence over the basic functionality or data formats provided by these systems. Core functionality tends to change based on the demands of many libraries looking for similar functionality. Despite the lack of functional control over these systems, components developed by external vendors may provide standards-based system interfaces to their functionality. These usually take the form of industry-supported standards or vendor-supplied application programming interfaces and give libraries some flexibility in working with these components. Explicit descriptions of the available standard and proprietary interfaces should be included within the system architecture.
Other applications may have been developed within the library and so can be changed more easily. Examples of locally developed applications typically include subject pages, information about the library, and digital Web exhibits and collections. Although local development does provide more control over the appearance and functionality of a piece of software, it is not without problems. Local development is often conducted using a bricolage approach, solving specific problems singularly, without giving consideration to the larger networks of systems in which the solutions operate. When such approaches do not take into account larger issues of systems architecture, opportunities to solve a broader range of problems may be missed and subsequent repackaging of these solutions may be limited or impossible. Libraries frequently also have a limited number of programmers, often remedied by pulling librarians or staff from other duties. While this certainly can allow libraries to meet some user needs, the lack of software-engineering skills in libraries may result in local solutions that are inflexible and that do not support standards for data storage or interchange. Because the internal design of these applications is accessible and modifiable, the system architecture should include more extensive descriptions of the internal features and relationships that they contain. Although this will not completely alleviate the problems of software maintenance, it will provide a better foundation for decisions regarding future migration.

Figure 2. Elements of a System Architecture

Applications (off-the-shelf and locally developed)
Specification of the access mechanisms and standards for previously installed systems including:
• Catalog
• Interlibrary Loan
• Electronic Reserves
• Abstracting and Indexing Databases
• Content Management Systems
• Legacy Web Content

Core Infrastructure
• Authentication: The validation of a user's identity based on credentials. Increasingly a part of a campus-wide infrastructure.
• Web Platforms: Operating systems, server software, and application software that provide the general foundation for the Web site.
• Network: The communication infrastructure within the library system and connecting to the Internet.

Information Storage and Access
• Storage: The definition of storage structures including relational or hierarchical schema. Character format specifications.
• Standards: Standards available for access to the data. These include formats like MARC and Dublin Core and mechanisms like Z39.50 and ODBC.

Finally, typical library architectures consist of links to resources that are licensed or organized on behalf of the user. These include abstracting and indexing databases, full-text content provided by publishers outside of the library, and general vetted Internet sites. Linking the user to the system usually provides access to these systems, and libraries have no control over the technical implementations of such resources. Newer federated search technologies are integrating into the library infrastructure the users' access to the site and to results from the sites, and linking tools make the interrelationships between these systems more easily understood. Nevertheless, integrating these resources into a Web site in a manner that makes sense to library users is a challenge. The access mechanisms and information formats required to communicate with the site should be clearly documented within this system architecture.

Interrelationship of the Information and System Architectures

Reacting to the rapid pace of change can result in an ad hoc or haphazard approach to Web-site design. The sections above describe a systematic approach to include and evaluate changes to the Web site. In order to implement the changes and create a Web site that is scalable and made of reusable components, it is necessary to evaluate, plan, and document all changes to the system. Figure 3 graphically depicts the interrelationship between EIA and system architecture. User needs, as described by IA, should inform the development of technical infrastructure. The Informs arrow indicating that EIA informs the design and development of the system architecture depicts this interrelationship. The Constrains arrow designates the reality that some aspects of the existing infrastructure cannot be changed within this planning cycle and will limit the library's ability to immediately change the underlying content and function of the Web site. When mapping the conceptual design to the physical design, there will be gaps that represent functionality that cannot be supported, either fully or in part, by the current system architecture and thus constrain the full implementation of the conceptual design. If IA is then to be implemented as fully as possible, these gaps identify the modifications and additions that must be carefully evaluated, designed, and implemented within the underlying system architecture. Gaps can be addressed in a variety of ways. If there is a total gap in functionality, a system can be developed or implemented to provide the desired functionality as part of the larger system architecture. This may result in a complete development project or in the specification of an off-the-shelf application to meet the newly identified demand. In the case where an existing system has some of the required functionality but is not completely suitable for the users' goals and objectives, an incremental approach of modernization can be adopted.
Modernization surrounds "the legacy system with a software layer that hides the unwanted complexity of the old system and exports a modern interface." This is done to provide integration with a modern operating environment while retaining the data and exposing the functions of the existing system, if desired. Techniques range from screen scraping to the implementation of Web services to export access to functions that are still relevant within the new context. All of these changes become part of the system architecture for future iterations of change. Gaps that cannot be immediately added or changed to meet the specified requirements become constraints in the next iteration of conceptual design. In the absence of a plan, the underlying systems will continue to undergo constant evolutionary changes, ostensibly to meet the changing needs and workflows of both users and staff. Change comes from many sources, including local implementations and modifications, external vendors, and industry-wide changes in standards. This rapid but incremental change can produce a system that is very difficult to maintain and that provides few reusable modules.

Figure 3. The Interrelationship between the Conceptual and Physical Design of the Library Web Site (Extended Information Architecture; System Architecture; Core Infrastructure: Authentication, Web Platforms, Network)

Having a well-documented implementation and integration plan will not guarantee that the library will not experience the negative effects of technological change, but it does allow a library to better manage change in meeting the needs of its users. The more explicitly and clearly the modifiable features are documented within the system architecture, the easier it will be to plan to fill the gaps.
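The modernization technique quoted above, surrounding a legacy system with a software layer that hides its complexity and exports a modern interface, is essentially a wrapper or adapter. A minimal sketch, assuming a legacy catalog that only returns raw screen text; all class names, field tags, and values here are hypothetical.

```python
import re

class LegacyCatalog:
    """Stand-in for a legacy system that returns raw screen text,
    as a screen-scraping wrapper would receive it."""
    def lookup(self, title: str) -> str:
        return f"TI: {title}\nCN: Z674.75 .M35 2004\nST: AVAILABLE"

class CatalogService:
    """Modernization layer: hides the screen format and exports
    structured records that a modern Web site can consume."""
    def __init__(self, legacy: LegacyCatalog):
        self._legacy = legacy

    def find(self, title: str) -> dict:
        screen = self._legacy.lookup(title)
        # Parse "TAG: value" lines out of the legacy screen dump.
        fields = dict(re.findall(r"^(\w+): (.+)$", screen, re.MULTILINE))
        return {
            "title": fields["TI"],
            "call_number": fields["CN"],
            "available": fields["ST"] == "AVAILABLE",
        }

record = CatalogService(LegacyCatalog()).find("Beyond Information Architecture")
print(record)
```

The same structured interface could later be re-exported as a Web service, so the legacy system's data and functions survive while its presentation does not.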
I Conclusion

Library users' mental models of library processes have fundamentally changed, creating a serious disconnect between how users expect to use a library Web site and how the site was designed. In particular, user expectations regarding the number of steps that must be completed have changed. At the same time, library technical infrastructures are composed, in part, of legacy systems that provide great value and facilitate interlibrary resource sharing, but were not designed for the Web environment. It is essential that libraries develop new approaches to the conceptual design of Web sites that support current and future changes to both user behaviors and to library systems architectures. In the long run, these approaches should contribute to the development of a reference model for the description of library services. The authors have proposed a complete framework for conceptual design and physical implementation that is responsive to changing user needs while recognizing the need for libraries to adopt an efficient and cost-effective approach to Web-site design, implementation, and maintenance. Functional and content needs of the user are identified and molded into a conceptual design based on a broadened perspective of the users' objectives. Mapping conceptual requirements to physical architectures is an important part of this framework, using an architectural representation in combination with descriptions of integration elements that have been developed to support the incremental and iterative change.

BEYOND INFORMATION ARCHITECTURE | MALONEY AND BRACKE 151

The ability to respond is essential, necessitated by the rapid change in the technical and user environments in which libraries operate.
The framework is designed to allow logical and informed decisions to be made throughout the process regarding when to create new systems, when to replace or modernize existing systems, and when to improve the conceptual signage of the Web site.

References

1. Christina Wodtke, Information Architecture: Blueprints for the Web (Indianapolis: New Riders, 2003).
2. Louis Rosenfeld and Peter Morville, Information Architecture for the World Wide Web, 2nd ed. (Cambridge, Mass.: O'Reilly, 2002), 4.
3. Bob Gerrity, Theresa Lyman, and Ed Tallent, "Blurring Services and Resources: Boston College's Implementation of MetaLib and SFX," Reference Services Review 30, no. 3 (2002): 229-41; Barbara J. Cockrell and Elaine Anderson Jayne, "How Do I Find an Article? Insights from a Web Usability Study," Journal of Academic Librarianship 28, no. 3 (May 2002): 122-32.
4. Jesse James Garrett, Elements of User Experience (Indianapolis: New Riders, 2002); Rosenfeld and Morville, Information Architecture.
5. James S. Ghaphery and Dan Ream, "VCU's My Library: Librarians Love It ... Users? Well, Maybe," Information Technology and Libraries 19, no. 4 (Dec. 2000): 186-90; James S. Ghaphery, "My Library at Virginia Commonwealth University: Third Year Evaluation," D-Lib Magazine 8, no. 7/8 (July/Aug. 2002). Accessed July 16, 2003, www.dlib.org/dlib/july02/ghaphery/07ghaphery.html.
6. University of Louisville Libraries Web site (2003). Accessed July 16, 2003, http://library.louisville.edu.
7. Craig Larman, Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design (New Jersey: Prentice Hall PTR, 1998); Martin Fowler, Analysis Patterns: Reusable Object Models (Boston: Addison-Wesley, 1997); James Rumbaugh, Ivar Jacobson, and Grady Booch, The Unified Modeling Language Reference Manual (Boston: Addison-Wesley, 1999); Robert C. Seacord, Daniel Plakosh, and Grace A. Lewis, Modernizing Legacy Systems: Software Technologies, Engineering Processes, and Business Practices (Boston: Addison-Wesley, 2003).
8. Seacord, Plakosh, and Lewis, Modernizing Legacy Systems, 9.

152 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004
Policies Governing Use of Computing Technology in Academic Libraries

Vaughan, Jason. Information Technology and Libraries; Dec. 2004; 23, 4; ProQuest pg. 153

The networked computing environment is a vital resource for academic libraries. Ever-increasing use dictates the prudence of having a comprehensive computer-use policy in force. Universities often have an overarching policy or policies governing the general use of computing technology that helps to safeguard the university equipment, software, and network against inappropriate use. Libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. Having computer-use policies at the university and library level helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

For clients of academic libraries, the computing environment and access to online information is an essential part of everyday service, every bit as vital as having a printed collection on the shelf. The computing environment has grown in positive ways: higher-caliber hardware and software, evolving methods of communication, and large quantities of accurate online information content. It has also grown in many negative ways: the propagation of worms and viruses, other methods of hacking and disruption, and inaccurate informational content. As the computing environment has grown, it has become essential to have adequate and regularly reviewed policies governing its use. Often, if not always, overarching policies exist at a broad institutional or even larger systemwide level.
Such policies can govern the use of all university equipment, software, and network access within the library and elsewhere on campus, such as campus computer labs. A single policy may encompass every easily conceivable computing-related topic, or there may be several individual policies. Apart from any document drafted and enforced at the university level, various public laws exist that also govern appropriate computer-use behavior, whether in academia or on the beach. Many institutions have separate policies governing employee use of computer resources; this paper focuses on student use of computing technologies. In some cases, the library and the additional campus student-computer infrastructure (for example, campus labs and dormitory computer access) are governed by the same organizational entity, so the higher-level policy and the library policy are de facto the same. In many instances, libraries have enacted additional computer-use policies. Such policies may emphasize or augment certain points found in the institution-level policy(s), address concerns specific to the library environment, or both. This paper surveys the scope of what are most commonly referred to as "computer-use policies," specifically, those geared toward the student-client population. Common elements found in university-level policies (and often later emphasized in the library policy) are identified. A discussion on additional topics generally more specific to the library environment, and often found in library computer-use policies, follows. The final section takes a look at the computer-use environment at the University of Nevada, Las Vegas (UNLV), the various policies in force, and identifies where certain elements are spelled out: at the university level, the library level, or both.

I Policy Basics

Purpose and Scope

Policies can serve several purposes. A policy is defined as: a plan or course of action ...
intended to influence and determine decisions, actions, and other matters. A course of action, guiding principle, or procedure considered expedient, prudent, or advantageous.1 Any sound university has a comprehensive computer-use policy readily available and visible to all members of the university community: faculty, staff, students, and visitors. Some institutions have drafted a universal policy that seeks to cover all the pertinent bases pertaining to the use of computing technology. In some cases, these broad overarching policies have descriptive content as well as references to other related or subsidiary policies. In this way, they provide content and serve as an index to other policies. In other cases, no illusions are made about having a single, general, overarching policy; the university has multiple policies instead. Policies can define what is permitted (use of computers for academic research) or not permitted (use of computers for nonacademic purposes, such as commercial or political interests). A policy is meant to guide behavior and the use of resources as they are meant to be used. In addition, policies can delve into procedure. For example, most policies contain a section on how to report suspected abuse and how suspected abuse is investigated, and outline potential penalties. Policies buried in legalese may serve some purpose, but they may not do a good job of educating users on what is acceptable and not acceptable. Perhaps the best approach is an appropriate balance between legalese and language most users will understand.

Jason Vaughan (jvaughan@ccmail.nevada.edu) is Head of the Library Systems Department at the University of Nevada, Las Vegas.

POLICIES GOVERNING USE OF COMPUTER TECHNOLOGY IN ACADEMIC LIBRARIES | VAUGHAN 153
In addition, policies can also serve to help educate individuals on important topics, rather than merely stating what is allowed and what will get one in trouble. For example, a general policy statement might read, "You must keep your password confidential." Taken a step further, the policy could include recommendations pertaining to passwords, such as the minimum password length, inclusion of nonalphabetic characters, the recommendation to change the password regularly, and the mandate to never write down the password.

Characteristics of a Policy: Visibility, Prominence, Easily Identifiable

A policy is most useful when it is highly visible and clearly identified as a policy that has been approved by some authoritative individual or body. Students often sign a form or agree online to terms and conditions when their university accounts are established. Web pages may have a disclaimer stating something to the effect of "use of (institution's) resources is governed by ..." and provide a hyperlink to the various policies in place. Or, a simple policies link may appear in the footer of every Web page at the institutional site. Some universities have gone a bit further. At the University of Virginia, for example, students must complete an online quiz after reviewing the computer-use guidelines.2 In addition, they can choose to view an optional video. Such components serve to enhance awareness of the various policies in place. A review of the library literature failed to uncover any articles focusing on computer-use policies in academic libraries. The author then selected several similar-sized (but not necessarily peer) institutions to UNLV, doctoral-granting universities with a student population between twenty thousand and thirty thousand, and thoroughly examined their library Web sites to see what, if any, policy components were explicitly highlighted.
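The educational password recommendations discussed earlier (a minimum length, inclusion of nonalphabetic characters) are straightforward to express as a validation routine. A hypothetical sketch; the specific thresholds are illustrative assumptions, not drawn from any actual university policy:

```python
# Hypothetical password check reflecting the kinds of recommendations a
# policy might make: a minimum length and at least one nonalphabetic
# character. The exact threshold (8) is an illustrative assumption.

def password_problems(password: str, min_length: int = 8) -> list[str]:
    """Return a list of policy violations (an empty list means acceptable)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if password.isalpha():
        problems.append("contains no nonalphabetic characters")
    return problems


print(password_problems("library"))      # fails both checks
print(password_problems("g4rden#2024"))  # passes: []
```

Returning a list of specific violations, rather than a bare pass/fail, matches the educational spirit the passage describes: the user learns which recommendation was not met.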
It quickly became evident that many libraries do not have a centrally visible, specifically titled, inclusive computer-use policy document. Most, but not all, of the library Web sites provided a link to the institutional-level computer-use policy. In some cases, library policies were not consolidated under a central page titled "Policies and Procedures" or "Guidelines," and, where they did appear, the context did not imply or state authoritatively that this was an official policy. There was no statement of who drafted the policy (which can lend some level of authority or credence), as well as no indicated creation or revision date. Granted, many libraries have paper forms one must sign to obtain a library card, or they may state the rules in hardcopy posted within prominent computer-dense locations. Still, with so much emphasis given to licensed database and Internet resources, and with such heavy use of the computing environment, such policies should appear online in a prominent location. Where better to provide a computer-use policy than online? Perhaps all the libraries reviewed did have policies posted somewhere online. If the author could not easily find them, chances are a student would have difficulties as well. In sum, the location of the policy information and how it is labeled can make a tremendous difference.

Revisions

Policies should be reviewed on a regular basis. Often, the initial policy likely goes through university counsel, the president's administrative circles, and, perhaps, a board of regents or the equivalent. Revisions may go through such avenues, or may be more streamlined. A frequent review of policies is mandated by evolving information technology. For example, cell phones with built-in cameras or Internet-browsing capabilities, nonexistent a few years ago, are now becoming mainstream. With such an inconspicuous device, activities such as taking pictures of an exam or finding simple answers online are now possible.
Similarly, regularly installed critical updates are a central concept within Windows' latest version of operating-system software. Such functionality failed to attract much attention until the increase in security exploits and associated media coverage. Some policies, recently updated, now make mention of the need to keep operating systems patched.

I Why Have a Library Policy?

While some libraries link to higher-level institutional policies and perhaps have a few rules stated on various scattered library Web pages, other libraries have quite comprehensive policies that serve as an adjunct to (and certainly comply with) higher institutional policies. There are several reasons to have a library policy. First, it adds visibility to whatever higher-level policy may be in place. A central feature of a library policy is that it often provides links (and thus, additional visibility) to other higher-level policies. A computer-use policy can never appear in too many places. (Some libraries have the link in the footer of every Web page.) A computer-use policy can be thought of as a speed limit sign. Presumably, everyone knows that unless otherwise posted, the speed limit inside the city is thirty-five miles per hour, and outside it is fifty-five miles per hour. Nevertheless, numerous speed-limit signs are in place to remind drivers of this. Higher-level institutional policies often take a broad stroke, in that they pertain to and address computing technology in general, without addressing specific systems in detail. A second reason to have a local library policy is to reflect rules governing local library resources that are housed and managed by the library. Such systems
often include virtual reference, electronic reserves, laptop-checkout privileges, and the mass of electronic databases and full-text resources purchased and managed by libraries. Such library-based systems do not necessarily make the radar of higher-level policies, yet have important considerations, such as copyright issues in the electronic age or privacy as it relates to e-mail and chat reference. In addition, libraries often have two large user groups that other campus entities do not have: university affiliates (faculty, staff, students) and nonuniversity affiliates (community users). While broader university policies generally apply to all users of computing technology, local library policies can work to address all users of the library PCs, and make distinctions as to when, where, and what each group can use.

I Common Computer-Use Policy Elements

The following section outlines broad topics that are usually addressed within high-level, institutional policies. Often, some or many of these same elements are later reemphasized or adapted by libraries, focusing on the library environment. In many cases, the policy is presented in a manner somewhat like breaking the seal on a new piece of software packaging. Essentially, if someone is using the university equipment or network, that person agrees to abide by all policies governing such use. An overarching policy frequently may end with a bulleted summary of the important points in the document. An important first part of the policy is a clear indication of who the policy applies to. This may be as broad as "anyone who sits down in front of university equipment or connects to the network," or as specific as spelling out individual user groups (undergraduates, graduates, alumni, K-12 students). Appendix A summarizes elements found in the various end-user computer policies in force at UNLV and the UNLV University Libraries.
Network and Workstation Security

Network security is a universal topic addressed in computer-use policies. Under this general aegis one often finds prohibitions against various forms of hacking, as well as recommendations for steps individual users should take to help better secure the overall network. There are also such policies as the prohibition of food and drink near computer workstations or on the furniture housing computer workstations. Typical components related to network and workstation security include:

1. Disruption of other computer systems or networks; deliberately altering or reconfiguring system files; use of FTP servers, peer-to-peer file sharing, or operation of other bandwidth-intensive services
2. Creation of a virus; propagation of a virus
3. Attempts at unauthorized access; theft of account IDs or passwords
4. Password information: individual users need to maintain a strong, confidential password
5. Intentionally viewing, copying, modifying, or deleting other users' files
6. A requirement to secure restrictions to files stored on university servers
7. Recommendation or requirement to back up files
8. Statement of ownership regarding equipment and software: the university, not the student, owns the equipment, network, and software
9. Intentional physical damage: tampering, marking, or reconfiguring equipment or infrastructure, such as unplugging network cables
10. Food and drink policies

Personal Hardware and Software

Many universities allow students to attach their own laptops to the campus wired or wireless network(s). In addition to network connections, a growing number of consumer devices such as floppy disks, zip disks, and rewritable CD/DVD media have the potential to connect to university computers for the purpose of data transfer. Today, the list has grown to include portable flash drives, digital cameras and camcorders, and MP3 players, among others.
The attaching of personal equipment to university hardware may or may not be allowed. Similarly, users may often try to install software on university-owned equipment. Typical examples may include a game brought from home or any of the myriad pieces of software easily downloaded from the Internet. Some of the policy elements dealing with the use of personal hardware and software include:

1. Connecting personal laptops to the university wired or wireless network(s)
2. Use of current and up-to-date patched operating systems and antivirus programs running on personal equipment attached to the network
3. Connecting, inserting, or interfacing such personal hardware as floppy disks, CDs, flash drives, and digital cameras with university-owned hardware; liability regarding physical damage or data loss
4. Limit access to and mandate immediate reporting of stolen personal equipment (to deactivate registered MAC addresses, for example)
5. Downloading or installing personal or otherwise additional software onto university equipment
6. Use of personal technology (cell phones, PDAs) in classroom or test-taking environments

E-mail

E-mail privileges figure prominently in computer-use policies. Some topics deal with security and network performance (sending a virus), while many deal with inappropriate use (making threats or sending obscene e-mails). Other topics deal with both (such as sending spam, which is unsolicited, annoying, and consumes a lot of bandwidth). Among the activities covered are prohibitions or statements regarding:

1. Hiding identity, forging an e-mail address
2. Initiating spam
3. Subscribing others to mailing lists
4. Disseminating obscene material or Web links to such material
5. General guidelines on e-mail privileges, such as the size of an e-mail account, how long an account can be used after graduation, and e-mail retention
6. Basic education regarding e-mail etiquette

Printing

With the explosion of full-text resources, libraries and other student-computing facilities have experienced a tremendous growth in the volume of pages printed on library printers. At UNLV Libraries, for example, the printing volume for July 2002 to June 2003 was just shy of two million pages; the following year that had jumped to almost 2.4 million pages. Various policies helping to govern printing may exist, such as honor-system guidelines ("don't print more than ten pages per day"). Some institutions or libraries have implemented cost-recovery systems, where students pay fixed amounts per black-and-white and color pages printed through networked printers. Standard policies regarding printer use cover:

1. Mass printing of flyers or newsletters
2. Tampering with or trying to load anything into paper trays (such as trying to load transparencies in a laser printer)
3. Per-sheet print costs (color and black-and-white; by paper size)
4. Refund policies
5. Additional commonsense guidelines, such as "use print preview in browser"

Personal Web Sites

Many universities allow students to create personal Web sites, hosted and served from university-owned equipment. Customary policy items focusing on this privilege include:

1. General account guidelines: space limitations, backups, secure FTP requirements
2. Use of school logo on personal Web pages
3. Statement of content responsibility or institutional disclaimer information
4. Requirement to provide personal contact information
5. Posting or hosting of obscene, questionable, or inappropriate content

Intellectual Property, Copyright, or Trademark

Abuse of copyright, clearly a violation of federal law, is something that libraries and universities were concerned about long before computers hit the mainstream.
Widespread computing has introduced new avenues to potentially break copyright laws, such as peer-to-peer file sharing and DVD-movie duplication, to mention only two. A computer-use policy covering copyright will generally include:

1. General discussion of copyright and trademark law; links to comprehensive information on these topics
2. Concept of educational "fair use"
3. Copying or modification of licensed software, use of software as intended, use of unlicensed software
4. Specific rules pertaining to electronic theses and dissertations
5. Specific mention of the illegality of downloading copyrighted music and video files

Appropriate- and Priority-Use Guidelines

Appropriate use is often covered in association with topics such as network security or intellectual property. However, appropriate- and priority-use rules can be an entire policy and would include:

1. Mention of federal, state, and local laws
2. Use of resources for theft or plagiarism
3. Abuse, harassment, or making threats to others (via e-mail, instant messaging, or Web page)
4. Viewing material that may offend or harass others
5. Legitimate versus prohibited use; use for nonacademic purposes such as commercial, advertising, or political purposes, or games
6. Academic freedom, Internet filtering

Privacy, Data Security, and Monitoring

Privacy and data security are tremendous issues within the computing environment. Networking protocols and components of many software programs and operating systems by default keep track of many activities (browser history files and cache, Dynamic Host Configuration Protocol logs, and network account login logs, to mention a few). Additional specialized tools can track specific sessions and provide additional information. Just as credit-card companies, banks, and hospitals provide a privacy policy to their clients, so do many academic computer-use policies. Such statements often address what logs are kept, how they are maintained, how they may be used, and who has access. In addition to the legitimate use of maintaining information, there is the general concept of questionable or outright malicious collection of information, through cookies, spybots, or browser hijacks. The following are concepts often addressed under the general heading of privacy:

1. Cookies, spybots, other malicious software
2. What information is collected for evaluative system management and/or statistical purposes; use of cookies for this; how such information is used and reported
3. Statement on routine monitoring or inspection of accounts or use; reasons information may be accessed (routine system maintenance, official university business, investigation of abuse, irregular usage patterns)
4. Security of information stored on or transmitted by various campus resources
5. Statement on general lack of security of public, multiuser workstations (browser cache, search history, recent documents)
6. Disposition of information under certain circumstances (for example, if a student dies while enrolled, any personal university e-mail and stored files can be turned over to the executor of the will or to parents)

Abuse Violations, Investigations, and Penalties

As policies generally are a statement of what is or is not permitted, or what is considered abuse, a clearly defined mechanism for reporting suspected abuse and policy violations can often be found. Obviously, some abuse issues violate not only university policy, but also local, state, or federal law. Investigations of suspected abuse are by their nature tied into the privacy and monitoring category. Policy items detailing suspected abuse usually include:

1. How one can report suspected abuse
2. How requests for content, logging, or other account information are handled; how and by what entities abuse investigations are handled
3. Potential penalties
4. How to appeal potential penalties; rights and responsibilities one may have in such a situation

Other Computer- or Network-based Services Affecting the Broad Student Population

Universities operate any number of other computer- or network-based services for the broad academic community. Such services may include provisioning of ISP accounts, courseware, online registration, and digital institutional repositories. Depending on the broad nature of these services, policy information particular to such systems can be specified at the broad policy level, especially if they have unique avenues of potential exploitation or abuse not covered in the general topics included elsewhere in the policy.

I Additional Library-Specific Computer-Use Policy Elements

Many libraries elect to have their own, additional computer-use policies that serve as an adjunct to the larger university-level policy that generally governs the use of all computing resources on campus. Libraries that have a formalized library computer-use policy often start with a statement of other policies governing the use of the library equipment and network: references to the university policies in place. The library policy may choose to include or paraphrase parts of the university policy deemed especially important or otherwise applicable to the specific library environment. Important concepts governing university policies apply equally to library policies: purpose and comprehensiveness, visibility, and frequent review. Libraries that have formalized computer-use policies often link them under common library Web-site sections such as "information about the libraries" or "about the libraries."
Library policies can help address items unique, special, or otherwise worthy of elaboration, such as specific systems in place or situations that may arise. They can also help provide guidelines and strategies to aid staff in policy enforcement. As an example of a library computer-use policy, appendix B provides the main UNLV Libraries computer-use policy.

Public versus Student Use: Allowances and Priority Use

Many of the other entities on a university campus do not daily deal with the community at large (the nonuniversity affiliates) as do academic libraries. This applies to most if not all public institutions, as well as many private institutions. The degree to which academic libraries embrace community users varies widely; often, a statement on which user groups are the primary clients is stated in a policy. Such policy statements may discuss who may use what computers, what software components they have access to, and when access is allowed. In some cases, levels of access for students and the community are basically the same. Community users may be allowed to use all software installed on the PC. More often, separate PCs with smaller software sets have been configured for community users or for specific access to government documents. In some cases, libraries allow some or all PCs to be used by anyone, student or nonstudent, but have technically configured the PC or network to prevent the community at large from using the full software set. For example, community users may be limited from using the productivity software (such as Microsoft Word) found on these PCs. They may be restricted from using PCs on upper floors, or those reserved for special purposes, such as high-end graphics-development workstations.
In addition, during crunch time (midterms and final exams), community users are often restricted to the few PCs set up and configured to allow access only to the library Web page (not the Web at large) and the online catalog. In addition, only students and staff can plug their personal laptops into the library and campus network. Regardless of whether it is crunch time, nonstudent users can be asked to leave if all PCs are in use and students are waiting. An in-house-authored program identifies accounts and whether particular users are students or nonstudents.

More and more government information is available online. For libraries serving as government document repositories, all users have the right to freely access information distributed by the government. In 2005, the UNLV Libraries will begin limiting full Web access to community users; they will be permitted access only to a limited set of Web-based resources, such as government document Web sites and library-licensed databases. On another note, many libraries have special adaptive workstations with additional software and hardware to facilitate access to library resources by disabled citizens. Disabled individuals, enrolled at the university or not, are allowed to use these adaptive workstations.

Laptop Checkout Privileges

Many libraries today check out laptops for student use. At UNLV Libraries, faculty, staff, and students may check out LCD projectors and library-owned laptops and plug them into the network at any of the hundreds of available locations within the main library. More details on these privileges can be found in the article "Bringing Them In and Checking Them Out: Laptop Use in the Modern Academic Library."3 As the university does not otherwise check out laptops to users or allow students to plug their own laptops into the wired university network, the Libraries had to come up with these additional specific policies.

Licensed Electronic Resources: Terms and Conditions

Academic libraries are generally the gatekeepers to many citation and full-text databases and electronic journals. Each of the myriad subscription vendors has terms of use, violations of which can carry harsh penalties. For example, the UNLV Libraries had an incident where a vendor temporarily cut off access to its resource due to potential abuse detected from a single student. In this case, the user was downloading multiple PDF full-text files in an automated manner. This illustrates the need for some statement in a library policy outlining the existence of such additional terms of use. Vendors generally place a link to these terms at the top page of each of their resources. For greater visibility and potential compliance, libraries should at least point out the existence of such terms of use. In addition, some electronic resources have licensing agreements that simply do not permit community-user access. In these cases, library policy can simply state that some licensed resources may be accessed only by university affiliates.

Electronic Reserves

Many libraries have set up electronic reserves systems to help distribute electronic full-text documents and streaming media content, among other things. Additional policies may govern the use of such systems, such as making the system available only to currently enrolled students and providing some boundaries on what is acceptable for mounting on such a system. In addition, there is the whole area of copyright. E-reserve systems often have built-in methods to help better enforce copyright compliance in the electronic arena.
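The vendor incident described above, automated bulk downloading of full-text PDFs, is the kind of behavior vendors typically detect with a rate threshold over a sliding time window. A rough sketch of that idea follows; the thresholds, log format, and function name are assumptions for illustration, not any vendor's actual mechanism:

```python
from collections import deque

# Illustrative sliding-window detector: flag any user who makes more than
# `limit` download requests within any `window_seconds` span.
def find_abusers(events, limit=50, window_seconds=60):
    """events: iterable of (timestamp_seconds, user_id), time-ordered."""
    windows = {}      # user_id -> deque of that user's recent timestamps
    flagged = set()
    for ts, user in events:
        q = windows.setdefault(user, deque())
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > window_seconds:
            q.popleft()
        if len(q) > limit:
            flagged.add(user)
    return flagged
```

A human researcher rarely exceeds such a threshold, so a single flagged account is a strong signal of scripted downloading, which is what triggered the cutoff in the incident above.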
Additional policy statements can help educate faculty members on particulars related to copyright and e-reserves.

Offsite Access to Licensed Electronic Resources

Many libraries provide offsite access to their licensed resources to legitimate users via proxy servers or other methods. The policy regarding such access may address things such as who is permitted to access resources from offsite (such as students, staff, and faculty) and the requirement that the user be in good standing (such as no outstanding library-book fines). In some instances, universities have implemented broad authentication systems that, once the user has logged on from an offsite location, allow the user into a range of university resources, including, potentially, library-licensed electronic resources. If such is the case, information pertaining to offsite access may be found in a higher-level policy.

158 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

Electronic Reference Transactions

Many libraries have installed (or plan to install) virtual-reference systems or, at a minimum, have a simple e-mail reference service ("Ask a Librarian"). In addition, many collect library feedback or survey information through simple forms. In all cases, a record exists of the transaction. With virtual-reference systems, the record can include chat logs, e-mail reference inquiries, and URLs of Web pages accessed during the transaction. A policy governing the use of electronic-reference systems may address such things as which clientele may use the system, a statement on the confidentiality of the transaction, or a statement on whether the library maintains the electronic-transaction details. Items such as hours of operation and response time to an e-mail question could be considered more procedural or informational than policy issues.
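The offsite-access conditions described above (legitimate affiliation plus good standing) translate directly into an authorization check in the proxy or authentication layer. A minimal sketch, in which the affiliation names and the fines rule are assumptions for illustration:

```python
# Hypothetical offsite-access check combining the two policy conditions
# discussed above: eligible affiliation and good standing (no outstanding fines).

ELIGIBLE_AFFILIATIONS = {"student", "staff", "faculty"}  # illustrative

def may_access_offsite(affiliation, outstanding_fines):
    """Permit offsite access to licensed resources only for affiliated
    users in good standing."""
    if affiliation not in ELIGIBLE_AFFILIATIONS:
        return False  # community users may still have onsite options
    return outstanding_fines <= 0.0
```

Whether this check lives in the library's proxy server or in a campus-wide authentication system is exactly the policy-placement question raised above: if the campus system performs it, the library policy need only point to the higher-level document.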
Statements on Information Literacy

While perhaps not a policy per se, many libraries have a computer-use policy statement to the effect that while the library may provide links to certain information, this does not serve as an endorsement or guarantee that the information is accurate, up to date, or has been verified. (Such a statement posted on the library Web site may provide additional exposure to the maxim that all that glitters is not gold.) Statements that libraries do not regulate, organize, or otherwise verify the general mass of information on the Internet may be included. Obviously, many libraries have separate instruction sessions, awareness programs, and overall mission goals geared toward information literacy.

Principles on Intellectual Freedom and Internet Filtering

Statements by the American Library Association (ALA) on intellectual freedom and Internet filtering may well appear in an institutional policy and often are included in library policies. Filtering is more likely to affect public and school libraries than academic libraries. Still, underage children can and do use academic libraries. In such an environment, they may be intentionally or unintentionally exposed to questionable or obscene material. Thus, a library computer-use policy can express the general concepts behind the following:

1. intellectual freedom (freedom of speech; free, equal, unrestricted access);
2. the fact that academic libraries provide a variety of information expressing a variety of viewpoints;
3. the fact that this information is not filtered; and
4. the responsibility of parents to be aware of what their children may be viewing on library PCs.

Some libraries have provided policy links to various sets of information from the Office of Intellectual Freedom at ALA's Web site, such as:

1. ALA Code of Ethics
2. ALA Bill of Rights
3. Intellectual Freedom Principles for Academic Libraries: An Interpretation of the Library Bill of Rights
4. Access to Electronic Information, Services, and Networks: An Interpretation of the Library Bill of Rights

Some libraries also provide references to ALA information pertaining to the USA Patriot Act and how law-enforcement inquiries are handled.

Summary

Computing is a vitally important tool in the academic environment. University and library computing resources receive constant and growing use for research, communication, and synthesizing information. Just as computer use has grown, so have the dangers in the networked computing environment. Universities often have an overarching policy or policies governing the general use of computing technology that help to safeguard university equipment, software, and networks against inappropriate use. Libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. Having computer-use policies at the university and library levels helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

References

1. The American Heritage College Dictionary, 3rd ed. (Boston: Houghton, 1997), 1058.
2. Board of Visitors of the University of Virginia, "Responsible Computing at U.Va.: A Handbook for Students." Accessed June 2, 2004, www.itc.virginia.edu/pubs/docs/RespComp/rchandbook03.html.
3. Jason Vaughan and Brett Burnes, "Bringing Them In and Checking Them Out: Laptop Use in the Modern Academic Library," Information Technology and Libraries 21 (2002): 52-62.

Appendix A.
Systemwide, Institutional, and Library Computing Policies at UNLV

The appendix is a matrix indicating which of six policies address each item: the SCS NevadaNet Policy,* the UCCSN Computing Resources Policy,** the UNLV Student Computer-Use Policy,*** the UNLV Policy for Posting Information on the Web,† the UNLV Libraries Guidelines for Library Computer Use,†† and the UNLV Libraries Additional Policies.††† The items compared, grouped by category:

General
- Direct, evident link or references to higher-level institutional/system computer-use policy
- Author/authority information included
- Approved/revised date included

Network and Workstation Security
- Disruption of other computer systems/networks; deliberately altering or reconfiguring system files; FTP servers/peer-to-peer file sharing/operation of other bandwidth-intensive services
- Creation of a virus; propagation of a virus
- Attempts at unauthorized access/theft of account IDs or passwords
- Password information: individual user's need to maintain a strong, confidential password
- Intentionally viewing, copying, modifying, or deleting other users' files
- Requirement to secure restrictions on stored files
- Recommendation/requirement to back up files
- Statement of ownership regarding equipment/software
- Intentional physical damage: tampering with or marking, reconfiguring equipment or infrastructure
- Food and drink policies

Personal Hardware and Software
- Connecting personal laptops, etc., to university wired or wireless network(s)
- Use of current and up-to-date patched operating systems and antivirus programs running on personal equipment attached to the network
- Connecting/inserting/interfacing personal hardware with existing university equipment; liability regarding physical damage or data loss
- Limiting access to personal equipment/reporting immediately if stolen
- Downloading or installation of personal or otherwise additional software onto university equipment
- Use of personal technology in classroom/test-taking environments

Printing
- Mass printing of flyers or newsletters
- Tampering with or trying to load anything into paper trays
- Per-sheet print costs
- Refund policies
- Additional common-sense guidelines

E-mail
- Hiding identity/forging an e-mail address
- Initiating spam
- Subscribing others to mailing lists
- Dissemination of obscene material or Web links to such material
- General guidelines on e-mail privileges, such as the size of an e-mail account, how long an account can be used after graduation, etc.

Personal Web Site Specific
- General account guidelines
- Use of school name and logo
- Statement of content responsibility/institutional disclaimer information
- Requirement to provide personal contact information
- Posting and hosting of obscene, questionable, or inappropriate material

Intellectual Property, Copyright, and Trademark
- General discussion of copyright and trademark law; links to comprehensive information on these topics
- The concept of educational fair use
- Copying or modifying licensed software/use of software as intended/use of unlicensed software
- Specific rules pertaining to electronic theses and dissertations

Appropriate- and Priority-Use Guidelines
- Mention of federal, state, and local laws
- Use of resources for theft/plagiarism
- Abuse, harassment, or making threats to others (via e-mail, instant messaging, Web page, etc.)
- Viewing material which may offend others
- Legitimate versus prohibited use; use for nonacademic purposes (commercial, advertising, political purposes, games, etc.)
- Academic freedom; Internet filtering

Privacy
- Cookies, spybots, other malicious software
- What information is collected for evaluative/system-management/statistical purposes; use of cookies for this
- Statement on routine monitoring or inspection of accounts or use; reasons information may be accessed
- Security of information stored on or transmitted by various campus resources
- Statement on general lack of security of public, multiuser workstations
- Disposition of information under certain circumstances

Abuse Violations, Investigations, and Penalties
- How one can report suspected abuse
- How requests for content, logging, or other account information are handled; how and by what entities abuse investigations are handled
- Potential penalties
- How to appeal potential penalties; rights/responsibilities you may have in such a situation

Other Computer/Network-based Services Affecting the Broad Student Population: Library-Specific
- Public versus student use: allowances and priority use
- Right to access government information
- Assistance for persons with disabilities
- Laptop, LCD projector, etc., checkout privileges
- Licensed electronic resources: terms and conditions
- Offsite access to licensed electronic resources: who can access from offsite
- Electronic reference transactions
- Statements on information literacy
- ALA principles on academic freedom/Internet filtering
- Electronic reserves; copyright as it pertains to electronic reserves

Notes

* The Systems Computing Services NevadaNet Policy. Among other responsibilities, SCS provides and maintains the general Internet connectivity for Nevada's higher education institutions, including UNLV. The complete document can be accessed at www.scs.nevada.edu/nevadanet/nvpolicies.html.
** The University and Community College System of Nevada Computing Resources Policy. UCCSN is the system of higher education institutions in the state of Nevada, governed by an elected board of regents. The complete document can be accessed at www.scs.nevada.edu/about/policy061899.html.
*** The complete document can be accessed at www.unlv.edu/infotech/itcc/SCUP.html.
† The complete document can be accessed at www.unlv.edu/infotech/itcc/WWW_Policy.html.
†† The primary UNLV Libraries policy governing student computer use. Provided in Appendix B, the complete document can also be accessed at www.library.unlv.edu/services/policies/computeruse.html.
††† Various other policies are in effect at the UNLV Libraries. Some of these can be accessed at www.library.unlv.edu/services/policies/computeruse.html.

Appendix B. UNLV University Libraries Guidelines for Library Computer Use

In pursuit of its goal to provide effective access to information resources in support of the university's programs of teaching, research, and scholarly and creative production, the university libraries have adopted guidelines governing electronic access and use of licensed software.
All those who use the libraries' public computers must do so in a legal and ethical manner that demonstrates respect for the rights of other users and recognizes the importance of civility and responsibility when using resources in a shared academic environment.

Authorized Users

To gain authenticated access to the libraries' computer network, all users of the university libraries' public computers must be officially registered as a library borrower, a library computer user, or a guest user. A photo ID is required. (Exceptions may be made as needed when access to Federal Depository electronic resources is required.) Priority use is granted to UNLV students, faculty, and staff. As need arises, access restrictions may be imposed on nonuniversity users. In accordance with licensing and legal restrictions, nonuniversity users are restricted from using word-processing, spreadsheet, and other productivity and high-end multimedia software. During high-demand times, all users may have time restrictions placed on their computer use. If requested by library staff, all users must be prepared to show photo ID to confirm their user status.

Authorized and Unauthorized Use

Public computers are to be used for academic research purposes only. Electronic information, services, software, and networks provided directly or indirectly by the university libraries shall be accessible, in accordance with licensing or contractual obligations and in accordance with existing UNLV and University and Community College System of Nevada (UCCSN) computing services policies (UCCSN Computing Resources Policy, www.scs.nevada.edu/about/policy061899.html; UNLV Faculty Computer Use Policy, www.unlv.edu/infotech/itcc/FCUP.html; Student Computer Use Policy, http://ccs.unlv.edu/scr/computeruse.asp).
Users are not permitted to:

1. Copy any copyrighted software provided by UNLV. It is a criminal offense to copy any software that is protected by copyright, and UNLV will treat it as such.
2. Use licensed software in a manner inconsistent with the licensing arrangement. Information on licenses is available through your instructor.
3. Copy, rename, alter, examine, or delete the files or programs of another person or UNLV without permission.
4. Use a computer with the intent to intimidate, harass, or display hostility toward others (sending offensive messages or prominently displaying material that others might find offensive, such as vulgar language, explicit sexual material, or material from hate groups).
5. Create, disseminate, or run a self-replicating program ("virus"), whether destructive in nature or not.
6. Use a computer for business purposes.
7. Tamper with switch settings; move, reconfigure, or do anything that could damage terminals, computers, printers, or other equipment.
8. Collect, read, or destroy output other than your own work without the permission of the owner.
9. Use the computer account of another person with or without their permission unless it is designated for group work.
10. Use software not provided by UNLV.
11. Access or attempt to access a host computer, either at UNLV or through a network, without the owner's permission, or through use of log-in information belonging to another person.

Internet and Web Use

The university libraries cannot control the information available over the Internet and are not responsible for its content. The Internet contains a wide variety of material, expressing many points of view. Not all sources provide information that is accurate, complete, or current, and some may be offensive or disturbing to some viewers. Users should properly evaluate Internet resources according to their academic and research needs.
Links to other Internet sites should not be construed as an endorsement by the libraries of the content or views contained therein. The university libraries respect the First Amendment and support the concept of intellectual freedom. The libraries also endorse ALA's Library Bill of Rights, which supports access to information and opposes censorship, labeling, and restricting access to information. In accordance with this policy, the university libraries do not use filters to restrict access to information on the Internet or Web. As with other library resources, restriction of a minor's access to the Internet or Web is the responsibility of the parent or legal guardian.

Printing

Users are charged for printing no matter who supplies the paper. Mass production of club flyers, newsletters, or posters is strictly prohibited. If multiple copies are desired, users need to go to an appropriate copying facility such as Campus Reprographics. Contact a staff member when using the color laser printer to avoid costly mistakes. The university libraries reserve the right to restrict user printing based on quantity and content (such as materials related to running an outside business).

Copyright Alert

Many of the resources found on the Internet or Web are copyright protected. Although the Internet is a different medium from printed text, ownership and intellectual property rights still exist. Check documents for appropriate statements indicating ownership. Most of the electronic software and journal articles available on library servers and computers are also copyrighted. Users shall not violate the legal protection provided by copyrights and licenses held by the university libraries or others. Users shall not make copies of any licensed or copyrighted computer program found on a library computer.
Use of Personal Laptops and Other Equipment

Students, faculty, and staff of the university are welcome to bring laptops with network cards and use them with our data drops to gain access to our network. The laptop must be registered in our laptop authentication system, and a valid library barcode is also required. Users are responsible for notifying the library promptly if their registered laptop is lost or stolen, since they may be held responsible if their laptop is used to access and damage the network. Users taking advantage of this service are required to abide by all UCCSN and UNLV computer policies.

The libraries allow the use of the universal serial bus (USB) connections located in the front of the workstations. This includes use with portable USB-based devices such as flash-based memory readers (memory sticks, secure digital) and digital camera connections. The patron assumes all responsibility in attaching personal hardware to library workstations. The libraries are not responsible for any damage done to patron-owned items (hardware, software, or personal data) as a result of connecting such devices to library workstations. As with any use of library workstations, patrons must adhere to all UCCSN, UNLV, and university libraries' computing and network-use policies. Patrons are responsible for the security of their personal hardware, software, and data.

Inappropriate Behavior

Behavior that adversely affects the work of others and interferes with the ability of library staff to provide good service is considered inappropriate. It is expected that users of the libraries' public computers will be sensitive to the perspective of others and responsive to library staff's reasonable requests for changes in behavior and compliance with library and university policies.
The university libraries and their staff reserve the right to remove any user(s) from a computer if they are in violation of any part of this policy and may deny further access to library computers and other library resources for repeat offenders. The libraries will pursue infractions or misconduct through campus disciplinary channels and law enforcement as appropriate.

Revised: March 3, 2004
Updated: Thursday, May 13, 2004
Content Provider: Wendy Starkweather, Director of Public Services
The Impact of Web Search Engines on Subject Searching in OPAC
Yu, Holly; Young, Margo
Information Technology and Libraries; Dec 2004; 23, 4; ProQuest pg. 168

The Impact of Web Search Engines on Subject Searching in OPAC

Holly Yu and Margo Young

Holly Yu (hyu3@calstatela.edu) is Library Web Administrator and Reference Librarian at the University Library, California State University, Los Angeles. Margo Young (margo.e.young@jpl.nasa.gov) is Manager of the Library, Archives and Records Section at the Jet Propulsion Laboratory, California Institute of Technology, Pasadena.

This paper analyzes the results of transaction logs at California State University, Los Angeles (CSULA) and studies the effects of implementing a Web-based OPAC along with interface changes. The authors find that user success in subject searching remains problematic. A major increase in the frequency of searches that would have been more successful in resources other than the library catalog is noted over the time period 2000-2002. The authors attribute this increase to the prevalence of Web search engines and suggest that metasearching, relevance-ranked results, and relevance feedback ("more like this") are now expected in user searching and should be integrated into online catalogs as search options.

In spite of many studies and articles on Online Public Access Catalogs (OPACs) over the last twenty-five years, many of the original ideas about improving user success in searching the library catalog have yet to be implemented. Ironically, many of these techniques are now found in Web search engines. The popularity of the Web appears to have influenced users' mental models and thus their expectations and behavior when using a Web-based OPAC interface. This study examines current search behavior using transaction-log analysis (TLA) of subject searches when zero hits are retrieved. It considers some of the features of Web search engines and online bookstores and suggests future enhancements for OPACs.

Literature Review

Many studies have been published since the 1980s centering on the OPAC. Seymour and Large and Beheshti provide in-depth overviews of OPAC research from the mid-1980s through the mid-1990s.1 Much of this research has addressed system design and user behavior, including:

• user demographics,
• search behavior,
• knowledge of system,
• knowledge of subject matter,
Much of this research has addressed system design and user behavior including: • user demographic s, • search behavior, • knowledge of system, • knowledge of subject matter, Holly Yu (hyu3@calstatela.edu) is Library Web Administrator and Reference Librarian at the University Library, California State University, Los Angeles. Margo Young (Margo.e.young@jpl. nasa.gov) is Manager of the Library, Archives and Records Sec- tion at the Jet Propulsion Laboratory, California Institute of Technology, Pasadena. • library settings, • search strategies, and • OPAC systems 2 OPAC research has employed a number of data-col- lection methodologies: experiment, interviews, question- naires, observation, think aloud, and transaction logs. ' Transaction logs have been used extensively to study the use of OPACs, and library literature reflects this. While the exact details of TLA vary greatly, Peters et al. define it simply as "the study of electronically recorded interac- tions between online information retrieval systems and the persons who search for the information found in those systems."' This section reviews the TLA literature relevant to the study. I Number of Hits TLA cannot portray user intention or actual satisfaction since relevance, success, or failure are subjectively deter- mined and require the user to decide. Peters recommends combining TLA with another technique such as observa- tion, questionnaire or survey, interview, or focus group. 5 In spite of the limit ations of TLA, many studies (including this one) rely on it alone. Typically, these studies define failure as zero hits in response to a search. Generalizing from several studies, approximately 30 percent of all searches result in zero hits.6 The failure rate is even higher for subject searches: Peters reported that about 40 percent of subject searches failed by retrieving zero hits. 7 Some researchers also define an upper number of results for a successful sea rch. 
Buckland found that the average retrieval set was 98 items.8 Blecic reported that Cochrane and Markey found that OPAC users retrieve too much (15 percent of the time).9 Wiberly, Daugherty, and Danowski (as reported in Peters) found that the median number of postings considered to be too many was fifteen, although when fifteen to thirty postings were retrieved, more users displayed them all than abandoned the search.10

Subject Searching

Some studies have specifically looked at subject searching. Hildreth differentiated among various types of searches and defined one hundred items as the upper limit for keyword searches and ninety as the upper limit for subject searches.11 Larson defined reasonable subject retrieval as between one and twenty items and found that only 12 percent of subject searches retrieved the appropriate number.12

Larson is not the only researcher to have reported poor results in subject searching. For more than twenty years, research has demonstrated that subject or topical searches are both popular and problematic. Tolle and Han found that subject searching is the most frequently used and the least successful.13 Moore reported that 30 percent of searches were for subject, and Matthews et al. found that 59 percent of all searches were for subject information.14 Hunter found that 52 percent of all searches were subject searches and that 63 percent of these had zero hits.15 Van Pulis and Ludy referred to Alzofon and Van Pulis's earlier work in 1984, where they reported that 42 percent of all searches were subject searches.16 Hildreth found that 62.1 percent of subject searches and 35.4 percent of keyword searches failed.
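Criteria like these (zero hits as failure, Larson's one-to-twenty range as reasonable retrieval) are straightforward to operationalize against a transaction log. A small sketch of the computation follows; the (search_type, hit_count) log format is an assumption for illustration, since real transaction-log formats vary by system:

```python
# Classify logged searches by result-set size, using the thresholds discussed
# above: zero hits = failure, 1-20 = reasonable retrieval, >20 = too many.

def summarize(log, reasonable_max=20):
    """log: iterable of (search_type, hit_count) pairs. Returns per-type counts."""
    stats = {}
    for search_type, hits in log:
        s = stats.setdefault(search_type, {"zero": 0, "reasonable": 0, "too_many": 0})
        if hits == 0:
            s["zero"] += 1
        elif hits <= reasonable_max:
            s["reasonable"] += 1
        else:
            s["too_many"] += 1
    return stats

def zero_hit_rate(log, search_type):
    """Fraction of searches of a given type that retrieved nothing."""
    counts = summarize(log).get(search_type)
    if not counts:
        return 0.0
    return counts["zero"] / sum(counts.values())
```

Tabulations of this kind are how the failure percentages quoted throughout this section were derived, though, as noted above, hit counts alone cannot capture whether the retrieved records actually satisfied the user.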
Larson categorized the major problems with online catalogs as follows:

• users' lack of knowledge of Library of Congress Subject Headings (LCSH),
• users' problems with mechanical and conceptual aspects of query formulation,
• searches that retrieve nothing,
• searches that retrieve too much, and
• searches that retrieve records that do not match what the user had in mind.18

During an eleven-year longitudinal study, Larson found that subject searching was being replaced by keyword searching.19 No consistent pattern in the number of search terms has emerged in the literature. Van Pulis and Ludy reported that user searches were typically single words.20 Markey contended that users' search terms frequently matched standardized vocabulary in large catalogs.21 None of Markey's searchers consulted LCSH, and only 11 percent of Van Pulis and Ludy's did so, notably in spite of their library's user-education programs. Peters reported that Lester found that the average search was less than two words and fewer than thirteen characters.22 Hildreth found that more than two-thirds of keyword searches included two or more words and that 42 percent of these multiple-word searches resulted in zero hits.23 The proportion of zero-hit keyword searches rose with the increasing number of words in the search. Subject headings have been a matter of considerable study. Gerhan examined catalog records and surmised their accessibility in an online catalog. He contended that when a keyword from the title only is accessed, only 50 percent of all relevant books would be found, and that title keywords would lead a user to subject-relevant records in 55 percent of cases while LCSH would lead a user successfully in 85 percent of the cases.
24 In contrast, Cherry found that 42 percent of zero-hit subject searches would have been more fruitful as keyword or title searches than by fol- lowing cross references retrieved from the subject field.25 She recommended converting zero-hit subject queries to other types of subject searches (keyword). Thorne and Whitlatch recommended that subject searchers should select keyword rather than subject headings as their first access strategy. 26 Types of Problems in Subject Searches Numerous studies have categorized reasons for search failure (typically in zero-hit situations), but Peters reports that a standard categorization has not yet been estab- lished .27 Tn cases where more than one error is made in a search (and Hunter reported this to be frequent), there is no consistency in how that is assigned. Nonetheless, some major categories of problems stand out: • misspelling and typographical errors-Peters found that these errors accounted for 20.8 percent of all unsuccessful keyword searches, while Henty (reported by Peters) concluded that 33 percent of such searches could be attributed to this.28 Hunter found that 9.3 per- cent of subject searches had typographical and spelling errors. 29 • keyword search-Hunter found 52.6 percent of zero- hit searches used uncontrolled vocabulary terms. 30 • wrong source or field-Hunter concluded that 4.5 percent of searches should have been done in a source other than the catalog, while 1.3 percent of searches were of the wrong type (an author search in the subject-search option). 31 • items not in the database-Peters found that searches for items not held in the database accounted for 39.1 percent of unsuccessful searches, while Hunter found that problem in only 2.5 percent of the problem cases. 32 In addition to these problems, Hunter also found that index display and rules relating to the systems accounted for 27 percent of errors. 
33 I Resulting Recommendations for Change While Hildreth stated, "There has been little research on most components of the OPAC interface" in 1997, he pro- posed two options to improve user success: increased user training or improved design based on information- seeking behavior. 34 Wallace pointed out that there is a very short window of opportunity when searchers are amenable to instruction and that successful screen designs should therefore focus on presenting the quick- searching options employed by the majority of users first. 35 Large and Beheshti observed "that too many options simply caused confusion, at least for less experi- enced OPAC users," and they summarized that OPAC- THE IMPACT OF WEB SEARCH ENGINES ON SUBJECT SEARCHING IN OPAC I YU AND YOUNG 169 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. interface research focuses on menu sequence, browsing, and querying .3'; Menu Sequence In terms of menu sequence, Hancock-Beaulieu indicated that "the menu sequence in which search options are offered will influence user selection." 37 Ballard found that the amount of keyword searching was affected by its posi - tion on the menu. 38 Scott reported that both keyword- and subject-search success improved when the keyword was plac ed at the top of the menus .39 Thorne and Whitlach used a combination of methods in their study and concluded that several interface changes should be implemented : • strongly encourage novi ce users to start with key- word (list keyword above subject heading), • relabel "keyword" to "subject or title words," and • relabel II subject heading" to "Library of Congress Subject Heading."' 0 Blecic et al. studied tran sactio n logs over six months to track th e impact of "simplifying and clarifying" OPAC introductory screens. After moving the keyword option to th e top, keyword searching incr ease d from 13.30 per- cent to 15.83 percent of all sea rch statements. Blecic et al. 
found her original tally of 35.05 p ercent of correct searches having zero hits decre ased to 31.35 percent after screen changes. 41 Querying OPAC-interface design has been based on an assumption that us ers come to the catalog knowing what the y need to know . In either text-bas ed OPAC or Web-based OPAC, query-based searches are still mainstream. Searchers are required to have knowledge of title, author, or subject. Ortiz-Repiso and Moscoso observed that Web-based cata- logs, like all library catalogs, basi cally fulfill two functions: locating works based on known details and identifying which documents in the databas e cover a given subject. 42 Natural-language input has long been considered a desi r- able way to overcome this shortcoming. Browsing Relevance-ranked output and hypertext were considered by Hildr eth to be promising in 1997.43 OPACs have not been conceived within a true h ypertext environment, but rather they maintain the structure of their original for- mats, principally machine-readable cataloging (MARC), and therefore impede the generation of a structure of nodes and links. 44 In addition to continuing to employ MARC format as its underlying structure, the concept of main entry and added entr y, field label, and displa y logic all reflect cataloging rules . Amazon.com and Barnes and Noble have completel y mo ve d away from this century- old structure to pro vi d e easy access to book information . In the Web environment , th e concept of main ent ry loses its meaning to multiple-acces s points and linking capabil- ities of author, subject, and call number. Another prominent drawback of Web-based OP A Cs is that they have not taken advantage of thesaurus structure and utilized the thesaurus for sea rching feedback. The hierarc hical relationship in LCSH is underutilized in terms of the relationship betw een terms and associations through related terms. Web-based OPACs have failed to make use of this important access. 
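The thesaurus feedback described above can be sketched as a small query-expansion step: when a subject search runs, broader, narrower, and related terms from the controlled vocabulary are offered back to the user or folded into the search. This is a minimal illustration, not an actual OPAC feature; the tiny vocabulary below is an invented stand-in for LCSH's term relationships.

```python
# Sketch of thesaurus-driven search feedback: expand a subject term
# using broader/narrower/related relationships. The vocabulary here is
# a hypothetical stand-in for LCSH, purely for illustration.

THESAURUS = {
    "automobiles": {
        "broader": ["motor vehicles"],
        "narrower": ["sports cars", "trucks"],
        "related": ["automobile industry"],
    },
}

def expand_query(term):
    """Return the user's term plus any thesaurus relatives, deduplicated."""
    entry = THESAURUS.get(term.lower(), {})
    expanded = [term.lower()]
    for relation in ("broader", "narrower", "related"):
        for t in entry.get(relation, []):
            if t not in expanded:
                expanded.append(t)
    return expanded

print(expand_query("Automobiles"))
```

A real system would draw these relationships from the authority file rather than a hard-coded table, but the feedback loop — show the searcher where their term sits in the vocabulary — is the same.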
The persistence of these drawbacks in OPAC-interface design is rooted deeply in cataloging rules that were derived from the manual environment more than a century ago. It reflects the gap between "concepts typically held by nonprofessional users and those used in library practices."45 In her article "Why Are Online Catalogs Still Hard to Use?" Borgman concludes:

Despite numerous improvements to the user interface of online catalogs in recent years, searchers still find them hard to use. Most of the improvements are in surface features rather than in the core functionality. We see little evidence that our research on searching behavior studies has influenced online catalog design.46

Catalog Content

Users misunderstand the scope of the catalog. In questionnaire responses, 80 percent of Van Pulis and Ludy's participants indicated they had considered looking elsewhere than the library catalog, as in periodical indexes.47 Blazek and Bilal reported a request for inclusion of journal-article titles in one response to their questionnaire.48 Libraries responded to these requests by acquiring databases on CD-ROM, loading them locally (sometimes using the catalog system to mount a separate database), and, most recently, providing access to databases over the Internet. However, seldom have libraries responded to these requests by integrating search access through a single front end as the default search.

Impact of Web Search Engines

Blecic et al. found that keyword searching increased from 13.3 percent to 28.3 percent over their four-year series of logs. At the same time, zero hits in keyword searching increased from 8.71 percent to 20.78 percent, while subject zero hits dropped from 23 percent to 13.69 percent. They surmised that the influence of Web interfaces might have affected the regression: fluctuation in search syntax, initial articles, and author order.49

. . . automatically scouts the Web for pages that are related to its results, so it can find a large number of resources very quickly without requiring the user to select the right keywords. Teoma structures the appropriate communities of interest on the fly and ranks the results on a range of factors including authorities and hubs (good resources pointing to related resources). Google offers an option of "similar pages." While the subject-redirect function in a Web-based OPAC emulates this, it succeeds only if the user's initial search term yielded the right result. OPAC users have the option of clicking on hyperlinked headings (author, title, subject headings) but cannot ask the system to perform a more sophisticated search on their behalf.

User-Popularity Tracking

The Amazon and Barnes and Noble Web sites present enhanced information about items by user-popularity tracking. Circulation statistics or user comments could serve as a form of "recommender system" to help novices narrow their selections. Messages such as "other students who checked this book out also read these books" could be dynamically inserted in bibliographic records.
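The circulation-based "recommender system" idea sketched above amounts to co-occurrence counting over loan records: items frequently borrowed by the same patrons are suggested alongside one another. The loan data and titles below are invented for illustration; a production system would of course read from the circulation database and respect patron privacy by aggregating counts.

```python
# Sketch of "other students who checked this book out also read these
# books": count how often other items co-occur with a given item in
# patrons' loan histories, then rank by frequency. Data is hypothetical.
from collections import Counter

loans = {  # patron -> set of items checked out (invented)
    "p1": {"catalog history", "opac design", "web searching"},
    "p2": {"opac design", "web searching"},
    "p3": {"opac design", "metadata basics"},
}

def also_read(item, top=2):
    """Rank the items most often borrowed alongside the given item."""
    co = Counter()
    for items in loans.values():
        if item in items:
            for other in items - {item}:
                co[other] += 1
    return [title for title, _ in co.most_common(top)]

print(also_read("opac design"))
```

The ranked list could then be inserted dynamically into the bibliographic record display, exactly as the passage above envisions.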
Users could also be allowed to provide comments on materials in the catalog, thus providing an interactive experience for OPAC users.

Summary of Web Features

There are positive and negative impacts of Web search engines and online bookstores on Web-based OPAC users. Users who find Web pages to be comfortable, easy, and familiar may make greater use of Web-based OPACs. While they bring with them their knowledge of search engines, they also bring their misperceptions. The possibility of using tools similar to those found on Web search engines can greatly "reinforce the usefulness of the catalog as well as the positive perception that the end user has of it."61 Given the diversity of the errors that users experience, a combination of approaches is necessary to improve their search success. Automatic mapping of free text to thesaurus terms, translation of common spelling mistakes, and links to related pages are tools already in use in the Web search engines. "See similar pages," extensive use of relevance feedback, and popularity tracking along with natural language are less common.

Recommendations for Web-based OPACs

The authors' TLA revealed a continuing problem with subject-heading searches and showed a trend toward searching topics that are not typically answered in a book catalog. The former problem has a well-documented history, while the authors believe the latter problem stems from the influence of the Web and Web search engines. Several changes to typical OPACs are recommended to address the trends observed in the course of this study.

Metasearching

The recent trend of incorporating databases and OPACs into a single search reflects the necessity of expanding information resources and simplifying access to resources. This study's empirical results clearly indicate a need to expand this integration into one search.
While some argue that this metasearching will further augment the syntax digression and prevent users from becoming information literate, others believe that metasearching, along with the option of searching each individual database, is an ultimate goal for online search. Like it or not, the metasearch technology, also known as federated or broadcast search, "creates a portal that could allow the library to become the one-stop shop their users and potential users find so attractive."65 One-search-for-all cannot solve all problems; however, guiding users to where they are most likely to find results quickly (the quick search) should satisfy the needs of the majority of users.

Menu Sequence

Effective screen design has a positive effect on user success. The menu sequence for search options plays a significant role in user selection. This research and others have demonstrated that users choose an option higher rather than lower in a list. Too many options "simply cause confusion, at least for less experienced OPAC users."66

Browsing Feature

Browsing is a natural and effective approach to many information-seeking problems and requires less effort and knowledge on the part of the user. The literature suggests that a great deal of the use of the Web relies on known Web sites, recommended sites, or return visits to sites recently visited, thus relying on browsing rather than on searching. Jenkins, Corritore, and Widenbeck found that domain novices seldom clicked very deep (out and back), while Web experts explored more deeply.67 Holscher and Strube note that Hurtienne and Wandtke claim that only minimal training is necessary for browsing an individual Web site, while Pollock and Hockley claim that considerably more experience is required for querying and navigating among sites.68 Hancock-Beaulieu found that between 30 percent and 45 percent of all online searches, regardless of the type of search, are concluded with browsing the library shelves.69

. . . to implement user help through tips or tactics selected and accumulated from a collection of common user-search mistakes. In such a case, the system would play a more active role by generating relevant search tips on the fly and using zero-hit search results as a basis for generating a spell check or suggesting alternate wording.

An ideal scenario is that the OPAC allows the user to pursue multiple avenues of an inquiry by entering fragments of the question, exploring vocabulary choices, and reformulating the search with the assistance of various specialized intelligent assistants. Borgman suggests that an OPAC should be judged by whether the catalog answers questions rather than merely matches queries. She suggests the need to design systems that are based on behavioral models of how people ask questions, arguing that users still need to translate their question into what a system will accept.73

User Instruction

On-site training and online documentation can help make the OPAC easier to use. With the advent of information literacy, the shift in library instruction from procedure-based query formulation to the question being answered has taken place. At CSULA, instruction for entry-level classes focuses on formulating a research statement and then identifying keywords and alternate terms.
The instruction sessions that follow the initial-concept formulation are short and focus on how to enter keyword or subject, author, and title, and the use of Boolean operators. This approach may improve success until the systems provide the tools to improve search strategies or accept an untrained user's input.

As an increasing number of users access online library catalogs remotely, assistance needs to be embedded into intuitive systems. "Time invested in elaborate help systems often is better spent in redesigning the user interface so that help is no longer needed."74 Users are not willing to devote much of their time to learning to use these systems. They just want to get their search results quickly, and they expect the catalog to be easy to use with little or no time invested in learning the system.

Conclusion

The empirical study reported in this paper indicates that progress has been made in terms of increasing search success by improving the OPAC search interface. The goal is to design Web-based OPAC systems for today's users, who are likely to bring a mental model of Web search engines to the library catalog. Web-based OPACs and Web search engines differ in terms of their systems and interface design. However, in most cases, these differences do not result in different search characteristics by users. Research findings on the impact of Web search engines and on user searching expectations and behavior should be adequately utilized to guide interface design. Web users typically do not know how a search engine works. Therefore, fundamental features in the design of the next generation of the OPAC interface should include changing the search to allow natural-language searching with keyword search first, and a focus on meeting the quick-search need.
Such a concept-based search will allow users to enter the natural language of their chosen topic in the search box while the system maps the query to the structure and content of the database. Relevance feedback to allow the system to bring back related pages, spelling correction, and relevance-ranked output remain key goals for future OPACs.

References and Notes

1. Sharon Seymour, "Online Public-Access Catalog User Studies: A Review of Research Methodologies, March 1986-November 1989," Library and Information Science Research 13 (1991): 89-102; Andrew Large and Jamshid Beheshti, "OPACs: A Research Review," Library and Information Science Research 19 (1997): 2, 111-33.
2. Ibid., 113-16.
3. Ibid., 116-20.
4. Thomas A. Peters et al., "An Introduction to the Special Section on Transaction-Log Analysis," Library Hi Tech 11 (1993): 2, 37.
5. Thomas A. Peters, "The History and Development of Transaction-Log Analysis," Library Hi Tech 11 (1993): 2, 56.
6. Pauline A. Cochrane and Karen Markey, "Catalog Use Studies since the Introduction of Online Interactive Catalogs: Impact on Design for Subject Access," in Redesign of Catalogs and Indexes for Improved Subject Access: Selected Papers of Pauline A. Cochrane (Phoenix: Oryx, 1985), 159-84; Steven A. Zink, "Monitoring User Success through Transaction-Log Analysis: The WolfPAC Example," Reference Services Review 19 (Spring 1991): 449-56; Michael K. Buckland et al., "OASIS: A Front End for Prototyping Catalog Enhancements," Library Hi Tech 10 (1992): 7-22.
7. Thomas A. Peters, "When Smart People Fail: An Analysis of the Transaction Log of an Online Public-Access Catalog," Journal of Academic Librarianship 15 (1989): 5, 267.
8. Michael K. Buckland et al., "OASIS," 7-22.
9. Deborah D. Blecic et al., "Using Transaction-Log Analysis to Improve OPAC Retrieval Results," College and Research Libraries (Jan. 1998): 48.
10. Peters, "The History and Development of Transaction-Log Analysis," 2, 52.
11. Charles R. Hildreth, "The Use and Understanding of Keyword Searching in a University Online Catalog," Information Technology and Libraries 16 (1997): 6.
12. Ray R. Larson, "The Decline of Subject Searching: Long-Term Trends and Patterns of Index Use in an Online Catalog," Journal of the American Society for Information Science and Technology 42 (1991): 3, 210.
13. John E. Tolle and Sehchang Hah, "Online Search Patterns: NLM CATLINE Database," Journal of the American Society for Information Science 36 (Mar. 1985): 82-93.
14. Carol Weiss Moore, "User Reaction to Online Catalogs: An Exploratory Study," College and Research Libraries 42 (1981): 295-302; Joseph R. Matthews et al., Using Online Catalogs: A Nationwide Survey: A Report of a Study Sponsored by the Council on Library Resources (New York: Neal-Schuman, 1983), 144.
15. Rhonda N. Hunter, "Success and Failures of Patrons Searching the Online Catalog at a Large Academic Library: A Transaction-Log Analysis," RQ 30 (Spring 1991): 399.
16. Noelle Van Pulis and Lorne E. Ludy, "Subject Searching in an Online Catalog with Authority Control," College and Research Libraries 49 (1988): 526.
17. Hildreth, "The Use and Understanding of Keyword Searching," 6.
18. Ray R. Larson, "The Decline of Subject Searching," 3, 60.
19. Ibid.
20. Van Pulis and Ludy, "Subject Searching in an Online Catalog," 527.
21. Karen Markey, Research Report on the Process of Subject Searching in the Library Catalog: Final Report of the Subject Access Research Project (report no. OCLC/OPR/RR-83-1) (Dublin, Ohio: OCLC Online Computer Library Center, 1983), 529.
22. Peters, "The History and Development of Transaction-Log Analysis," 2, 43.
23. Hildreth, "The Use and Understanding of Keyword Searching," 8-9.
24. David R. Gerhan, "LCSH in vivo: Subject Searching Performance and Strategy in the OPAC Era," Journal of Academic Librarianship 15 (1989): 86-87.
25. Joan M. Cherry, "Improving Subject Access in OPACs: An Exploratory Study of Conversion of Users' Queries," Journal of Academic Librarianship 18 (1992): 2, 98.
26. Rosemary Thorne and Jo Bell Whitlatch, "Patron Online Catalog Success," College and Research Libraries 55 (1994): 496.
27. Peters, "The History and Development of Transaction-Log Analysis," 2, 48.
28. Ibid.
29. Hunter, "Success and Failures," 400.
30. Ibid., 399.
31. Ibid., 400.
32. Peters, "The History and Development of Transaction-Log Analysis," 2, 56.
33. Hunter, "Success and Failures," 400.
34. Hildreth, "The Use and Understanding of Keyword Searching," 6.
35. Patricia M. Wallace, "How Do Patrons Search the Online Catalog When No One's Looking? Transaction-Log Analysis and Implications for Bibliographic Instruction and System Design," RQ 33 (Winter 1993): 3, 249.
36. Large and Beheshti, "OPACs: A Research Review," 125.
37. M. M. Hancock-Beaulieu, "Online Catalogue: A Case for the User," in The Online Catalogue: Developments and Directions, C. Hildreth, ed. (London: Library Association, 1989), 25-46.
38. Terry Ballard, "Comparative Searching Styles of Patrons and Staff," Library Resources and Technical Services 38 (1994): 293-305.
39. Jane Scott et al., "@*@ This Computer and the Horse It Rode in On: Patron Frustration and Failure at the OPAC," in "Continuity and Transformation: The Promise of Confluence": Proceedings of the ACRL 7th National Conference (Chicago: ACRL, 1995), 247-56.
40. Thorne and Whitlatch, "Patron Online Catalog Success," 496.
41. Blecic et al., "Using Transaction-Log Analysis," 46.
42. Virginia Ortiz-Repiso and Purificacion Moscoso, "Web-Based OPACs: Between Tradition and Innovation," Information Technology and Libraries 18, no. 2 (June 1999): 68-69.
43. Hildreth, "The Use and Understanding of Keyword Searching," 6.
44. Ortiz-Repiso and Moscoso, "Web-Based OPACs," 71.
45. Ibid., 75.
46. Christine Borgman, "Why Are Online Catalogs Still Hard to Use?" Journal of the American Society for Information Science 47 (1996): 7, 501.
47. Van Pulis and Ludy, "Subject Searching in an Online Catalog," 53.
48. Blazek and Bilal, "Problems with OPAC: A Case Study of an Academic Research Library," RQ 28 (Winter 1988): 175.
49. Deborah D. Blecic et al., "A Longitudinal Study of the Effects of OPAC Screen Changes on Searching Behavior and User Success," College and Research Libraries 60, no. 6 (Nov. 1999): 524, 527.
50. Bernard J. Jansen and Udo Pooch, "A Review of Web Searching Studies and a Framework for Future Research," Journal of the American Society for Information Science and Technology 52 (2001): 3, 249-50.
51. Ibid., 250.
52. Blazek and Bilal, "Problems with OPAC: A Case Study," 175; Moore, "User Reaction to Online Catalogs," 295-302.
53. M. J. Bates, "The Design of Browsing and Berry-Picking Techniques for the Online Search Interface," Online Review 13 (1989): 5, 407-24.
54. Jansen and Pooch, "A Review of Web Searching Studies," 238.
55. Judy Luther, "Trumping Google? Metasearching's Promise," Library Journal 128 (2003): 16, 36.
56. Jack Muramatsu and Wanda Pratt, "Transparent Queries: Investigating Users' Mental Models of Search Engines," Research and Development in Information Retrieval (Sept. 2001). Accessed Mar. 10, 2003, http://citeseer.nj.nec.com/muramatsu01transparent.html.
57. Jansen and Pooch, "A Review of Web Searching Studies," 235.
58. Luther, "Trumping Google," 36.
59. Blecic et al., "A Longitudinal Study of the Effects of OPAC Screen Changes," 527.
60. Susan M. Colaric, "Instruction for Web Searching: An Empirical Study," College and Research Libraries News 64 (2003): 2.
61. A. G. Sutcliff, M. Ennis, and S. J. Watkinson, "Empirical Studies of End-User Information Searching," Journal of the American Society for Information Science and Technology 51 (2000): 13, 1213.
62. "All About Google," Google. Accessed Dec. 10, 2003, www.google.com.
63. G. Salton, Introduction to Modern Information Retrieval (New York: McGraw-Hill, 1983), 18.
64. Ortiz-Repiso and Moscoso, "Web-Based OPACs," 71.
65. Luther, "Trumping Google," 37.
66. Maaike D. Kiestra et al., "End-Users Searching the Online Catalogue: The Influence of Domain and System Knowledge on Search Patterns. Experiment at Tilburg University," The Electronic Library 12 (Dec. 1994): 335-43.
67. C. Jenkins et al., "Patterns of Information Seeking on the Web: A Qualitative Study of Domain Expertise and Web Expertise," IT and Society 1 (Winter 2003): 3, 74, 77. Accessed May 10, 2003, www.ItandSociety.org/.
68. C. Holscher and G. Strube, "Web Search Behavior of Internet Experts and Newbies," 9th International World Wide Web Conference (Amsterdam, 2000). Accessed Mar. 28, 2003, www9.org/w9cdrom/81/81.html; A. Pollock and A. Hockley, "What's Wrong with Internet Searching," D-Lib Magazine (Mar. 1997). Accessed May 10, 2003, www.dlib.org/dlib/march97/bt/03pollock.html.
69. M. M. Hancock-Beaulieu, "Online Catalogue: A Case for the User," 25-46.
70. Wilbert O. Galitz, The Essential Guide to User Interface Design: An Introduction to GUI Design Principles and Techniques (Chichester, England: Wiley, 1996).
71. Juliana Chan, "An Evaluation of Displays of Bibliographic Records in OPACs in Canadian Academic and Public Libraries," MIS Report, Univ. of Toronto, 1995. [025.3132 C454E]
72. Giorgio Brajnik et al., "Strategic Help in User Interfaces for Information Retrieval," Journal of the American Society for Information Science and Technology (JASIST) 53 (2002): 5, 344.
73. Borgman, "Why Are Online Catalogs Still Hard to Use?" 500.
74. Ibid.
Using a Native XML Database for Encoded Archival Description Search and Retrieval

Alan Cornish

Information Technology and Libraries; Dec. 2004; 23, 4; pg. 181

Communications

The Northwest Digital Archives (NWDA) is a National Endowment for the Humanities-funded effort by fifteen institutions in the Pacific Northwest to create a finding-aids repository. Approximately 2,300 finding aids that follow the Encoded Archival Description (EAD) standard are being contributed to a union catalog by academic and archival institutions in Idaho, Montana, Oregon, and Washington. This paper provides some information on the EAD standard and on search and retrieval issues for EAD XML documents. It describes native XML technology and the issues that were considered in the selection of a native XML database, Ixiasoft's TextML, to support the NWDA project.

Pitti, one of the founders of the EAD standard, noted the primary motivation behind the creation of EAD: "To provide a tool to help mitigate the fact that the geographic distribution of collections severely limits the ability of researchers, educators, and others to locate and use primary sources."1 Pitti expanded on this need for EAD in a 1999 D-Lib article:

The logical components of archival description and their relations to one another need to be accurately identified in a machine-readable form to support sophisticated indexing, navigation, and display that provide thorough and accurate access to, and description and control of, archival materials.2
In a more recent publication, Pitti and Duff noted a key advantage offered by EAD that relates to the focus of this article, the development of an EAD union catalog:

EAD makes it possible to provide union access to detailed archival descriptions and resources in repositories distributed throughout the world. . . . Libraries and archives will be able to easily share information about complementary records and collections, and to "virtually" integrate collections related by provenance, but dispersed geographically or administratively.3

In a 2001 American Archivist article, Roth examined EAD history and deployment methods used up to the 2001 time period. Importantly, two of the most prominent delivery systems described by Roth, DynaText (a server-side solution) and Panorama (a client-side solution), were, by 2003, obsolete products for EAD delivery. This is indicative of the rapid pace of change in EAD deployment, in part due to the migration from SGML to XML technologies. Roth described survey results obtained on EAD deployment that underscore the recognized need at that time for a "cost-effective server-side XML delivery system." The lack of such a solution motivated institutions to choose HTML as a delivery method for EAD finding aids.4

Articles like Roth's that describe specific EAD search-and-retrieval implementation options are in short supply. One such option, the University of Michigan DLXS XPAT software, is employed for the search and retrieval of EAD and other metadata in the University of Illinois at Urbana-Champaign (UIUC) Cultural Heritage Repository.5 Another option, harvesting EAD records into machine-readable cataloging (MARC) to establish search and retrieval access in an integrated library system, was described by Fleck and Seadle in a 2002 Coalition for Networked Information Task Force briefing.
Using an XML Harvester product created by Innova- tive Interfaces, MARC records are generated based upon MARC encod- ing analogs included in the EAD markup and loaded into an Innova- tive Interfaces INNOPAC system. 6 This product has been used to create access to EAD finding aids in the cat- alog for Michigan State University's Vincent Voice Library. In a 2001 article, Gilliland- Swetland recommended several desirable features for an EAD search- and-retrieval system. She emphasized the challenge of EAD search and retrieval by noting the nature of find- ing aids themselves: Archivists have historically been materials-centric rather than user-centric in their descriptive practices, resulting in the find- ing aid assuming a form quite unlike the concise bibliographic description with name and subject access most users are accustomed to using in other information systems such as library catalogs, abstracts, and indexes.' Without describing specific soft- ware tools, Gilliland-Swetland argued for a user-centric approach to the search and retrieval of finding aids by examining the needs of specific user communities such as genealogists, K-12 teachers, and historians. 8 Several initiatives similar to the NWDA effort are described in the professional literature. The Online Archive of California (OAC), which was founded in the mid-1990s, is a consortium of California special- collections repositories. A number of key consortium functions are central- ized, including "monitoring to ensure consistency of EAD encoding across all OAC finding aids" according to agreed-upon best practices, a critical need in the creation of a union cata- log.9 Brown and Schottlaender also describe the integration of the OAC into the California Digital Library, which enables linkages between EAD Alan Cornish (cornish@wsu.edu) is Sys- tems Librarian, Washington State Univer- sity Libraries, Pullman. 
USING A NATIVE XML DATABASE FOR ENCODED ARCHIVAL DESCRIPTION SEARCH AND RETRIEVAL | CORNISH 181

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

finding aids and digitized copies of original materials.10 Finally, one important development area is the possibility of integrating EAD documents into Open Archives Initiative (OAI) services in order to enhance resource discovery. A 2002 paper written by Prom and Habing, both of whom work with the UIUC Cultural Heritage Repository, explored the possibility of mapping EAD to OAI, the latter of which is based upon the fifteen-element Dublin Core Metadata Set (unqualified). While noting, "we do not propose that the full capabilities of EAD finding aids could be subsumed by OAI," Prom and Habing suggested that it is possible to map the top-level and component portions of EAD into OAI, resulting in multiple OAI records from a single EAD finding aid. In this scenario, a single OAI record is created from the collection-level information and multiple records from component-level information in an EAD document.11

Evaluation of EAD Search and Retrieval Products

In order to identify a software solution for supporting a union catalog of EAD finding aids, the consortium conducted a product evaluation. The strengths and weaknesses of the native XML technology employed by the consortium can be best understood by looking at alternative XML products and product categories. Table 1 shows the products considered during an evaluation period that consisted of both product research and actual trials. In approaching the evaluation, the consortium and its union-catalog host institution, the Washington State University Libraries, had several specific needs in mind. First, the licensing and support costs for the product needed to fit within the consortium's budget.
Second, the search-and-retrieval software had to support several basic functions: keyword searching across all union-catalog finding aids; specific field searching based upon elements or attributes in the EAD document; an ability to customize the look and feel of the interface and search-results screens; and the ability to display search term(s) in the context of the finding aid.

As noted in the table, three of the evaluated products are native XML databases. Cyrenne provides a definition of native XML as a database with these features:

• The XML document is stored intact: "the XML document is preserved as a separate, unique entity in its entirety."
• "Schema independence," that is, "any well-formed XML document can be stored and queried."
• The query language is XML-based: "native XML database vendors typically use a query language designed specifically for XML" as opposed to SQL.12

Of the three native XML products, only the licensing costs of Ixiasoft's TextML and the open-source XIndice software fell within the available project funding. Both packages were extensively tested, with TextML proving superior at handling the large (sometimes in the MB-size range) and structurally complex EAD documents created by consortium members. One key strength of TextML that met an NWDA consortium need involved field searching. In TextML, it is possible to map a search field to one or more XPath statements, enabling the creation of search fields based upon the precise use of an element or attribute in EAD documents. The importance of this capability is shown with one EAD element that can appear both at the collection level and at the subordinate component level in a document. With TextML, using its limited XPath support, it is possible to reference a specific, contextual use of that element.
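TextML's index definitions are proprietary, but the underlying idea of mapping a named search field to an XPath expression can be sketched with Python's standard library. The field names and the sample EAD document below are invented for illustration; the point is that the same element name indexes differently depending on its context in the document:

```python
import xml.etree.ElementTree as ET

# Simplified, invented EAD fragment: one collection-level title and
# two component-level titles, all using the same element name.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Papers of Jane Doe</unittitle></did>
    <dsc>
      <c01><did><unittitle>Correspondence, 1900-1910</unittitle></did></c01>
      <c01><did><unittitle>Photographs</unittitle></did></c01>
    </dsc>
  </archdesc>
</ead>
"""

# Hypothetical field definitions: each search field maps to an
# XPath-style path, so context decides which occurrences are indexed.
FIELD_MAP = {
    "collection_title": ".//archdesc/did/unittitle",
    "component_title": ".//c01/did/unittitle",
}

def field_search(ead_xml: str, field: str, term: str) -> list:
    """Return the text of every element reachable through the field's
    path expression whose text contains the search term."""
    root = ET.fromstring(ead_xml)
    hits = [el.text for el in root.findall(FIELD_MAP[field])]
    return [t for t in hits if term.lower() in t.lower()]
```

A search on "component_title" never touches the collection-level title, which is exactly the contextual distinction the article credits to TextML's XPath-based field mapping.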
182 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

In addition to the native XML solutions, several other product types were considered. An XML query engine, Verity Ultraseek, was tested and produced good results when used for the search and retrieval of consortium documents.13 Ultraseek can be used to search discrete XML files, supports the creation of custom interfaces for the search-and-retrieval system, and has strong documentation. Probably the most obvious limitation in this XML query-engine product concerned the creation of search fields. To contrast Ultraseek with a native XML solution: Ultraseek 5.0 (used during the product trial) lacked XPath support. Instead, it required a unique element-attribute combination for the creation of a database search field. Returning to the example, contextual uses of the element could not be indexed without recoding consortium documents to create a unique element-attribute combination on which to index.

An XML-enabled database, DLXS XPAT, has been successfully used in several EAD projects, including OAC. One disadvantage of this product is that it requires a UNIX operating system for the server. Additionally, XPAT, as a supporting toolset for digital-library collection building, provides functionality that duplicates other media tools at the host institution (specifically, OCLC/DiMeMa CONTENTdm).

The use of a Relational Database Management System (RDBMS) to establish search and retrieval for EAD XML documents was considered as well. The advantage to this approach is that it would enable the use of coding techniques built up through other Web-based media delivery projects at the host institution.
The most obvious negative issue is the need to map XML elements or attributes to tables and fields in an RDBMS, which, as Cyrenne notes, "is often expensive and will most likely result in the loss of some data such as processing instructions, and comments as well as the notion of element and attribute ordering."14

Table 1. NWDA project-evaluated search and retrieval products

Product             Vendor                 Product category                        License
MySQL/PHP           N/A                    Relational database management system   Open source
Tamino XML Server   Software AG            Native XML database                     Commercial
TextML              Ixiasoft               Native XML database                     Commercial
Ultraseek           Verity                 XML query engine                        Commercial
Xindice             N/A                    Native XML database                     Open source
XML Harvester       Innovative Interfaces  Integrated library system               Commercial
XPAT                DLXS                   XML-enabled database                    Commercial

The use of native XML avoids the task of exploding XML data into the table and field structures of an RDBMS.

Finally, another approach considered was the use of an integrated library system product. This was a realistic option for NWDA because consortium member institutions had decided to include MARC encoding analogs for selected elements in union-catalog finding aids. Innovative Interfaces produces an XML Harvester that can be used to generate MARC records from EAD finding aids that include MARC encoding analogs. For this project, a local (or self-contained) catalog could have been created and populated with MARC records containing metadata for the EAD documents, including a URL for online access. This approach offers important strengths and weaknesses. On the positive side, it is a relatively easy method for enabling search-and-retrieval access to EAD finding aids.
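The analog-based record generation this approach depends on can be sketched in miniature. The EAD fragment below is simplified and invented (the real XML Harvester is a commercial Innovative Interfaces product, and real analog mappings are far richer); the core idea is simply to collect every element that declares a MARC encoding analog:

```python
import xml.etree.ElementTree as ET

# Invented, simplified EAD fragment; the "encodinganalog" attributes
# name the MARC field each descriptive element corresponds to.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle encodinganalog="245">Vincent Voice Library recordings</unittitle>
      <origination encodinganalog="100">Vincent, G. Robert</origination>
      <physdesc encodinganalog="300">40,000 sound recordings</physdesc>
    </did>
  </archdesc>
</ead>
"""

def harvest_marc_analogs(ead_xml: str) -> dict:
    """Build a minimal {MARC tag: text} record from every element
    that carries an encodinganalog attribute."""
    root = ET.fromstring(ead_xml)
    record = {}
    for elem in root.iter():
        tag = elem.get("encodinganalog")
        if tag and elem.text:
            record[tag] = elem.text.strip()
    return record

record = harvest_marc_analogs(EAD_SAMPLE)
```

The resulting record carries only the analog-tagged metadata, which is why, as noted below, search and retrieval in this approach cannot display terms in the context of the full finding aid.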
In contrast to the interface coding requirements for TextML, the XML Harvester provided an almost turnkey approach to XML search and retrieval. On the negative side, two factors stood out during the evaluation. First, it would be difficult to fully customize search-and-retrieval interfaces as needed for the project. Second, using the XML Harvester, there is no ability to display search terms in the context of the finding aid. Search and retrieval is based upon the metadata extracted from the finding aid using the MARC analogs. In Michigan State's Voice Library implementation of this solution, the finding aid is an external resource with no highlighting of search terms.

Strengths and Weaknesses of the TextML Approach

Each project has its own specific needs; thus, there is no correct approach to establishing search and retrieval for EAD XML documents. In taking the needs and resources of the NWDA consortium into account, Ixiasoft's TextML, a native XML product, provided the best fit and was licensed for use. The use of TextML enables the creation of customized interfaces for an XML database (or Document Base, using the TextML terminology) and provides support for keyword and field searching of consortium documents. The qualified XPath support in TextML enables search fields to be built upon precise element or attribute combinations within EAD documents.

The existence of a major finding-aids Internet site employing TextML was a factor in the project's selection of the software. The Access to Archives (A2A) site, accessible from URL www.a2a.pro.gov.uk/, provides an excellent model for a publicly searchable finding-aid site.
The A2A site supports keyword searching and searching by archival facility; provides multiple views of search results (a summary records screen, search terms in context, and the full record); highlights search term(s) in the displayed finding aid; and supports the presentation of large finding-aid documents. While A2A uses General International Standard Archival Description, or ISAD(G), as opposed to EAD for its description standard, the similarities between the two standards make the A2A site a valuable example for development.15

One weakness of TextML is the implementation model supported by Ixiasoft, which assumes significant local development of the application or Web interface. The relationship between software capabilities and local development was considered with each of the products listed in table 1. As noted, the Innovative Interfaces solution was the most straightforward approach, assuming the existence of the MARC analogs in EAD markup, but provided the least flexibility in terms of customization and establishing a true linkage between the search system and the actual document. In contrast, while Ixiasoft makes available a base set of active server pages using visual basic script (ASP/VBScript) code for TextML application development and provides very good training and support services, the responsibility for that development rests with the local site. For the NWDA consortium, this development, using the code base, has been manageable. The current state of interface development for the NWDA project can be reviewed at http://nwda.wsulibs.wsu.edu/project_info/.
Conclusion

In selecting an EAD search-and-retrieval system, one important question for the consortium was, Which software solution had the best prospects for migration in the future? Because of the inherent strengths of native XML technology in comparison to the other product categories listed in table 1, a native XML database appeared to be the best approach, and TextML provided the best combination of licensing costs, software capabilities, and support. It is important to note that the distinctions between native XML databases and databases that support XML through extensions (XML-enabled databases) may become more difficult to discern over time, in part due to the existing expertise and investments in RDBMS technologies.16 Nevertheless, capabilities central to native XML, such as the use of an XML-based query language, are integral to the success of such hybrid systems.

References and Notes

1. Daniel Pitti, "Encoded Archival Description: The Development of an Encoding Standard for Archival Finding Aids," The American Archivist 60, no. 3 (Summer 1997): 269.
2. Daniel Pitti, "Encoded Archival Description: An Introduction and Overview," D-Lib Magazine 5, no. 11 (Nov. 1999). Accessed Nov. 2, 2004, www.dlib.org/dlib/november99/11pitti.html.
3. Daniel V. Pitti and Wendy M. Duff (eds.), "Introduction," in Encoded Archival Description on the Internet (Binghamton, N.Y.: Haworth, 2001), 3.
4. James M. Roth, "Serving Up EAD: An Exploratory Study on the Deployment and Utilization of Encoded Archival Description Finding Aids," The American Archivist 64, no. 2 (Fall/Winter 2001): 226.
5. Sarah L. Shreeves et al., "Harvesting Cultural Heritage Metadata Using the OAI Protocol," Library Hi Tech 21, no. 2 (2003): 161.
6.
Nancy Fleck and Michael Seadle, "EAD Harvesting for the National Gallery of the Spoken Word" (paper presented at the Coalition for Networked Information fall 2002 Task Force meeting, San Antonio, Tex., Dec. 2002). Accessed Nov. 2, 2004, www.cni.org/tfms/2002b.fall/handouts/H-EAD-FleckSeadle.doc.
7. Anne J. Gilliland-Swetland, "Popularizing the Finding Aid: Exploiting EAD to Enhance Online Discovery and Retrieval," in Encoded Archival Description on the Internet (Binghamton, N.Y.: Haworth, 2001), 207.
8. Ibid., 210-14.
9. Charlotte B. Brown and Brian E. C. Schottlaender, "The Online Archive of California: A Consortial Approach to Encoded Archival Description," in Encoded Archival Description on the Internet (Binghamton, N.Y.: Haworth, 2001), 99.
10. Ibid., 103-5. OAC available at: www.oac.cdlib.org/. Accessed Nov. 2, 2004.
11. Christopher J. Prom and Thomas Habing, "Using the Open Archives Initiative Protocols with EAD," in Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries (Portland, Ore., July 2002). Accessed Nov. 2, 2004, http://dli.grainger.uiuc.edu/publications/jcdl2002/p14prom.pdf.
12. Marc Cyrenne, "Going Native: When Should You Use a Native XML Database?" AIIM E-DOC Magazine 16, no. 6 (Nov./Dec. 2002), 16. Accessed Nov. 2, 2004, www.edocmagazine.com/article_new.asp?ID=25421.
13. Product category decisions based upon definitions and classifications available from: Ronald Bourret, "XML Database Products." Accessed Nov. 2, 2004, www.rpbourret.com/xml/XMLDatabaseProds.htm.
14. Cyrenne, "Going Native," 18.
15. Bill Stockting, "EAD in A2A," Microsoft PowerPoint presentation. Accessed Nov. 2, 2004, www.agad.archiwa.gov.pl/ead/stocking.ppt.
16. Uwe Hohenstein, "Supporting XML in Oracle9i," in Akmal B.
Chaudhri, Awais Rashid, and Roberto Zicari (eds.), XML Data Management: Native XML and XML-Enabled Database Systems (Boston: Addison-Wesley, 2003), 123-4.

Using GIS to Measure In-Library Book-Use Behavior
Jingfeng Xia

This article is an attempt to develop Geographic Information Systems (GIS) technology into an analytical tool for examining the relationships between the height of the bookshelves and the behavior of library readers in utilizing books within a library. The tool would contain a database to store book-use information and some GIS maps to represent bookshelves. Upon analyzing the data stored in the database, different frequencies of book use across bookshelf layers are displayed on the maps. The tool would provide a wonderful means of visualization through which analysts can quickly realize the spatial distribution of books used in a library. This article reveals that readers tend to pull books out of the bookshelf layers that are easily reachable by human eyes and hands, and thus opens some issues for librarians to reconsider the management of library collections.

Several years ago, when working as a library assistant reshelving books in a university library, the author noted that the majority of books used inside the library were from the mid-range layers of bookshelves. That is, by proportion, few books pulled out by library readers were from the top or bottom layers. Books on the layers that were easily reachable by readers were frequently utilized. Such a book-use distribution pattern made the job of reshelving books easy, but created some inquiries: how could book locations influence the choices of readers in selecting books? If this was not an isolated observation, it must have exposed an interesting
phenomenon that librarians needed to pay attention to. Then, by finding out the reasons, librarians might become capable of guiding, to some extent, users' selectiveness on library books by deliberately arranging collections at designated heights on bookshelves.

A research study was designed to develop Geographical Information Systems (GIS) into an analytical tool to examine the author's earlier casual observations. The study was conducted in the MacKimmie Library at the University of Calgary. This paper highlights the results of the study that aimed at assessing the behavior of library readers in pulling out books from bookshelves. These books, when not checked out, are categorized as "pickup books" because they are usually discarded inside a library after use and then picked up by library assistants for reshelving. Like many other libraries, the MacKimmie Library does not encourage readers to reshelve books themselves.

ArcView, a GIS software, was selected to develop the tool for this study because GIS has the functions of dynamically analyzing and displaying spatial data. The research on library readers pulling out books involves the measurements of bookshelf heights, and thus deals with spatial coordinates. With the capability of presenting bookshelves in different views on maps, GIS is able to provide readers with an easy understanding of the analytical results in visual forms that would otherwise require wordy textual descriptions. At the same time, some GIS products are available now in most academic libraries, thus giving developers convenient access to use.

Hypothesis

When library users decide to check books out of a library, these books are what they think of as useful.
People are usually hesitant to carry home books that are of little or uncertain use, not only because of the limit on the number of check-out books, but also because of the physical work required for carrying them. Moreover, some items, such as periodicals and multimedia materials, are either designated as "reference only" or have a very short loan period. It is reasonable to believe that users carefully select what they want from library collections and keep these books for handy use outside the library.

By contrast, in-library book use represents a different category of library readers' behavior. There are two general categories of in-library book use: readers bringing their own books into a library for use, and readers pulling out books from bookshelves inside a library. The former is commonly seen when students study textbooks for examinations (not the topic of this study), while the latter is a little more complex.1

As library users approach bookshelves to extract books, they may or may not have a definite target. When coming with call numbers, people will deliberately draw the books they want for reading, photocopying, or referencing. However, there are times when users only wander in bookshelf aisles of desired collections, uncertain about singling out specific books. They may simply shelf-shop to randomly select whatever is interesting to them, or they may locate a subject of need and go to the storage position(s) to look for whatever books are there. No matter what these readers' intentions are, they roam among collections, pick books for quick use, and leave them inside the library after use, although some materials may also be checked out.

Because of such arbitrary selections from library collections, physical convenience sometimes influences library users in taking books from bookshelves; they may look around for books on bookshelf layers that are at a reachable height.
The standard library bookshelf is higher than the average person's height and is structured to have five to eight layers. In academic libraries, "wood shelving is available in three heights: 82 in. (2050 mm), with a bottom shelf and six adjustable shelves; 60 in. (1500 mm), with a bottom shelf and four adjustable shelves; and 42 in. (1050 mm), with a bottom shelf and two adjustable shelves."2 For regular collections in most academic libraries, bookshelves are usually about eighty-two inches high and have seven layers. Books on the top layer are out of reach for many readers, requiring them to use a ladder to draw a book from it. Many users are hesitant to use ladders. Even worse, a reader will have to bend over or squat down to view the contents of books on the bottom layer of a bookshelf.

Hence, the hypothesis is that books used inside a library are primarily distributed among the mid-range layers of bookshelves. Specifically, if a bookshelf has seven layers, books placed on layers two through six are most frequently consulted. This is the subject of this research paper.

Background

A considerable number of studies have investigated the utilization of books that are checked out of a library. An estimate made in 1967 pointed out that over seven hundred research results pertained to this topic.3 However, the situation of books used inside a library has not been given enough attention. One of the reasons for this seeming neglect comes from the belief that the records of library books in circulation provide similar information as those of books used within libraries.4 This misunderstanding was later criticized by other researchers who discovered the differences in use behavior between

Jingfeng Xia (jxia@email.arizona.edu) is a student at the School of Information Resources and Library Science at the University of Arizona, Tucson.
USING GIS TO MEASURE IN-LIBRARY BOOK-USE BEHAVIOR | XIA 185

library readers taking books home and those using books inside libraries.5 Researchers have now recognized that correlations between the two sets of data are not as strong as they seemed to be.

Such recognition, unfortunately, has not resulted in more consequent work to explore the issue of in-library book use. This is probably due to the difficulties of collecting data or the lack of appropriate research methods.6 Also, the majority of relevant surveys were conducted several decades ago and focused primarily on exploring a good method of sampling in-library book use.7 Among these studies, Fussler and Simon preferred to carry out research by distributing questionnaires among library readers; Drott used random-sampling methods to statistically examine the importance of library-book use; and Jain, as well as Salverson, emphasized dividing the survey times into different investigation units when conducting research. Similarly, Morse pointed out the complexity of measuring library-book use at work, advocating an involvement of computerized operations in library-book management.

The sampling strategies and analytical methods implemented in past studies are still applicable to current research. Nonetheless, because many new technologies have come into view since then, it is quite likely that some new ways of obtaining and analyzing the data of in-library book use can now be developed. The new approaches must have the capability of providing not only accurate measurement of the data but also the means for easy manipulation. Their results must be able to enhance the understanding of user behavior in exploring the resources of existing collection inventories.
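As a sketch of the kind of computerized analysis these authors call for, the hypothesis stated earlier reduces to a simple tally: count pickups per shelf layer and compare the mid-range layers (two through six on a seven-layer shelf) against the top and bottom. The observation data below are invented for illustration; the study's real data came from observation at MacKimmie Library:

```python
from collections import Counter

# Invented pickup observations: each value is the layer (1 = bottom,
# 7 = top) from which a discarded "pickup book" was reshelved.
observations = [4, 3, 5, 2, 4, 6, 3, 4, 7, 5, 2, 4, 1, 3, 5, 4, 6, 3]

by_layer = Counter(observations)
mid = sum(by_layer[layer] for layer in range(2, 7))  # layers 2 through 6
extremes = by_layer[1] + by_layer[7]                 # bottom and top layers

print(f"mid-range layers: {mid} pickups, extremes: {extremes} pickups")
```

With real records accumulating continuously, the same counts could be recomputed on the full data set rather than on isolated samples, which is the advantage the article attributes to an analytical tool.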
One of the solutions is an analytical tool. An analytical tool can control data collection and analysis by computerization. If the system is able to accumulate constantly updated records over time, it will remedy the problem of poor sampling that many researchers have encountered, because analysis will then be done on all the data rather than with certain isolated samples. The development of modern technologies makes such data collection and storage possible and easier than ever before. One example of the technologies is the radio frequency identification (RFID) tag system that has been adopted by some public and academic libraries recently.8 This system stores a tag in each library item with the item's bibliographic information, and uses an antenna to keep track of the tag. By automatically communicating with data stored in the tags, the system can collect data on all library collections in a timely manner and export them into predesigned databases for easy management.

Data analysis and presentation comprise another part of the analytical mechanism. Researchers have to carefully evaluate existing technologies in order to select proper products or develop particular programs to integrate with RFID (if used) and the databases. It is fortunate that GIS technology is available with numerous functions for analyzing and demonstrating data, especially spatial data. Data visualization through GIS products has been very good, which gives them advantages over other analytical, statistical, or reporting products. Combining RFID and GIS into one system would seem to be the perfect solution: the former can effectively carry out data collection and the latter can efficiently perform data analysis and presentation.
However, while GIS products have been used in libraries in the United States for more than a decade, most academic libraries are hesitant to invest in RFID because of its high costs. GIS technology alone, however, can still provide sufficient functions to be developed into such an analytical tool.

Up to now, those libraries that have provided GIS services only use the software that assists in the utilization of geospatial data and mapping technologies for users.9 GIS is not exploited enough to aid the management of libraries themselves and the research of library collections. Some commercial GIS software, such as "Library Decision" by CivicTechnologies, has been recently marketed to support the analysis of library-user data for public libraries.10 However, it only works well on data of a conventional geographical nature, that is, the distribution and location of libraries and their users with the mapping of city blocks and streets. It does not apply to a library and its books, and especially not to the distribution of books used inside the library. Such products are also not applicable to academic libraries that do not always concentrate on the analysis of geographical areas of their users.

186 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004

Even so, GIS has all the functions that such a proposed analytical tool demands. It is suitable for assisting in the research of in-library book use, where library floor layouts or other facilities can be drawn into maps in multiple-dimensional views. At the same time, bookshelves with individual layers can be treated as an innovative form of map by GIS technology (see figure 1), making visible the relationship of book use to the height of the bookshelf. As soon as the presentation mechanism is linked to databases, any updates on book use will be mirrored visually.
Method

This project is one of a series of projects for developing GIS into a tool to manage and analyze the usage characteristics of library books. The other projects include using GIS to measure book usability for the development of collection inventories; to assist in the management of library physical space and facilities; and to locate library items.11 In order to make GIS workable for the subject of this paper, the focus was placed only on the exploration of correlations between bookshelf heights and book-use frequencies in an academic library environment.

Figure 1. The front view of one bookshelf rack on the fifth floor of the University of Calgary MacKimmie Library. Eight bookshelves make up the range. Here, different shades of color represent the numbers of books used on each individual layer. The display is only for demonstration and not to actual scale.

There are two major steps to conducting this research: collecting data and developing a GIS analytical tool. Since MacKimmie Library did not invest in RFID at the time this research was undertaken, personal observations were made to record book-use data.12 The development of the GIS tool involves creating a small database to store data and facilitate data analysis. It also requires creating several bookshelf and shelf-range maps to present analytical results in visualized forms. ArcView, the most popular GIS product in the world, was utilized for the development. This paper presents only a portion of collection areas at MacKimmie Library. Part of the fifth floor, where some collections of humanities and social sciences are stored, was selected because this floor is among the busiest of the floors used by readers.
It is filled with sixty-eight ranges of bookshelves containing books from call numbers B to DU. The terms used in this paper include bookshelf, referring to one unit of furniture fitted with horizontal shelves to hold books; rack, which includes more than one bookshelf standing together in a line; and range, composed of two racks standing back-to-back. Bookshelves on the fifth floor are arranged to surround a group of facility rooms in the central area. Study corridors are set between bookshelves and the wall. Each bookshelf range consists of two bookshelf racks, each of which in turn has eight individual bookshelves. All of the bookshelves are about eighty-two inches high and have seven layers. The layers, except for the top ones that are open, are equal in height, width, and length.

Data Collection

Personal surveys were taken by the author to note down each call number of books that were not in their original positions on the shelves, but instead were found discarded on the floor, tables, chairs, sofas, or on top or in front of other stocked books. Books on the shelving carts were also accounted for. The surveys were separately conducted three times a day, in the morning, afternoon, and evening, in order to catch as many books used in a day as possible. To avoid recording the same book more than once, no duplicate call numbers were accepted for any single day even though the same book was found in different locations on that day. On the other hand, the same call number could be entered into the records on the second day although it was recorded the day before and remained in the same place without being picked up by library assistants. (This duplicate recording was very rare because of the routine work of book pickup by library assistants.)
A period of two weeks was designated for the survey in the first half of December 2002. The final examination week was planned because it represents a week of heavy book use, although previous research found that readers in this week tended to use library collections less than their own study materials.13 A supplementary survey that also lasted two weeks, including a final examination week, was conducted in the library in late spring 2004.

To simplify the research, some exceptions were established for data collection. Periodicals were excluded because they have a very short loan period (generally one day). Library users may prefer to read articles in journals within the library and thus will have a clear idea as to what materials to read.14 Books belonging to other floors of the library, or books belonging to the fifth floor but found outside the area, were not included in the analysis. Furthermore, due to the nature and time limit of these observations, books pulled out of targeted bookshelves were not distinguished from books taken from bookshelves at random. This information can only become available through interviews with library users, which can be another research project.

Each bookshelf layer was recorded with and signified by two call numbers: the start and end numbers of books. For example, the call numbers "BF1999 .K54" to "BH21 .B35 1965," representing books stored on a particular layer, were recorded to identify that layer. Because book shifting can happen from time to time, such recording of start and end call numbers for individual bookshelf layers only reflects the conditions when this research was undertaken and may need updates whenever changes occur.
Data Manipulation and Visualization

Using a bookshelf layer as the recording unit is essential for the analysis of the relationship between book use and bookshelf height. Each book used can be classified to fit in one unit according to the call number of the book. Therefore, building a database with a table for layers will be an important part in the development of such an analytical tool. The LAYERS table includes a data field as an identifier to stand for the sequence of each layer (1 for the top layer, 2 for the next layer down, and so on) in addition to storing the start and end call numbers of books for each layer. If more than one bookshelf in the library has seven layers, layer identifiers will iterate from bookshelf to bookshelf. Therefore, this table will also need an identifier for each individual bookshelf with which layers are associated.

The database will also contain such information as bookshelf ranges, bookshelf racks, and books, all of which are individual database tables that are joined with each other by relational keys. Among them, the RANGES table is simply characterized by its identifier, and is designed to represent two racks of bookshelves that stand back to back. The BOOKSHELVES table is identified by the call numbers of the start and end books stored across individual bookshelves rather than on individual layers. Furthermore, the BOOKS table is primarily filled with the data of individual book call numbers as well as book pickup times and book discard locations.

GIS has limited ability for organizing database structure. If necessary, other database management systems, such as Microsoft Access, can be incorporated. Query codes are built to get summarized information for specific purposes, and the aggregated data are exported into GIS databases for further spatial analysis or convenient visual presentation.
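The relational layout described above can be sketched as a small set of tables. The following is a minimal illustration only: the table names follow the article, but the exact columns, key names, and DDL are assumptions, and SQLite stands in for the Microsoft Access database the article mentions.

```python
import sqlite3

# Sketch of the database described in the article (column names are
# assumptions based on the text; SQLite stands in for Microsoft Access).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RANGES      (range_id INTEGER PRIMARY KEY);
CREATE TABLE RACKS       (rack_id  INTEGER PRIMARY KEY,
                          range_id INTEGER REFERENCES RANGES(range_id));
CREATE TABLE BOOKSHELVES (shelf_id INTEGER PRIMARY KEY,
                          rack_id  INTEGER REFERENCES RACKS(rack_id),
                          start_no TEXT,                -- first call number across the shelf
                          end_no   TEXT);               -- last call number across the shelf
CREATE TABLE LAYERS      (layer_id INTEGER,             -- 1 = top layer, 2 = next down, ...
                          shelf_id INTEGER REFERENCES BOOKSHELVES(shelf_id),
                          start_no TEXT,                -- first call number on the layer
                          end_no   TEXT,                -- last call number on the layer
                          PRIMARY KEY (layer_id, shelf_id));
CREATE TABLE BOOKS       (call_no TEXT,                 -- individual book call number
                          pickup_time TEXT,             -- when the used book was found
                          discard_loc TEXT);            -- floor, table, chair, sofa, ...
""")
conn.execute("INSERT INTO LAYERS VALUES (1, 1, 'BF1999 .K54', 'BH21 .B35 1965')")
conn.execute("INSERT INTO BOOKS  VALUES ('BG100 .A1', '2002-12-09 09:30', 'table')")

# A used book is assigned to a layer when its call number falls inside the
# layer's start/end range (plain string comparison approximates LC order).
row = conn.execute("""
    SELECT l.layer_id, l.shelf_id
    FROM BOOKS b JOIN LAYERS l
      ON b.call_no > l.start_no AND b.call_no < l.end_no
""").fetchone()
print(row)  # -> (1, 1)
```

Note that comparing Library of Congress call numbers as plain strings is only an approximation of true call-number order; a production version would normalize the call numbers before comparison.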
Data visualization can be shown at different levels: by layer, bookshelf, rack, and range. The first attempt at making a visual demonstration of this research is for the area of individual bookshelves at layer level (see figure 1). The following query will return the necessary summarized information:

SELECT count(b.call_no) AS total_num, l.layer_id, l.shelf_id
FROM (BOOKS b INNER JOIN LAYERS l ON b.some_id = l.some_id)
WHERE b.call_no > l.start_no AND b.call_no < l.end_no
GROUP BY l.layer_id, l.shelf_id
ORDER BY l.shelf_id, l.layer_id;

At the same time, another attempt is made to demonstrate book numbers per layer, at bookshelf level, across multiple bookshelf ranges. This demonstration provides a better visualization in the GIS display so that an overall view of the height distributions of book usage over certain collection areas can be presented (see figures 2 and 3). To achieve such visualization, data must be compared in order to get information about which layer of a bookshelf contains the most frequently used books and which holds those that are rarely visited. This demonstration indicates that any alternative selection of analytical-display units can be easily performed by making modifications on the query that works on aggregating data.

Technically, data visualization can be presented by using any GIS software, although ArcView is used here because it has been available in the systems of many academic libraries. Bookshelf ranges on MacKimmie Library's fifth floor were drawn into map features. In order to show them with a three-dimensional view, each of the seven layers was given a sequential number as its height value, and all bookshelves were treated as having the same height. These height values are treated as the z values in any three-dimensional analysis.
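The layer-level summary described above can be exercised on sample data. This is an illustrative sketch only: the sample rows and simplified two-table schema are assumptions, SQLite stands in for the article's database, and COUNT(b.call_no) is used to yield the number of used books per layer. Changing the GROUP BY columns re-aggregates the same data at shelf, rack, or range level, which is the "alternative selection of analytical-display units" the article describes.

```python
import sqlite3

# Assumed sample data: two layers on one bookshelf, three used books.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LAYERS (layer_id INT, shelf_id INT, start_no TEXT, end_no TEXT)")
conn.execute("CREATE TABLE BOOKS  (call_no TEXT)")
conn.executemany("INSERT INTO LAYERS VALUES (?,?,?,?)",
                 [(1, 1, 'BF1000', 'BH21'), (2, 1, 'BH22', 'BJ99')])
conn.executemany("INSERT INTO BOOKS VALUES (?)",
                 [('BF2000',), ('BG100',), ('BH50',)])

# Books are matched to layers by call-number range; COUNT gives the number
# of used books per layer. Grouping by shelf_id alone (or rack/range ids)
# would re-aggregate the same data at a coarser display unit.
rows = conn.execute("""
    SELECT COUNT(b.call_no) AS total_num, l.layer_id, l.shelf_id
    FROM BOOKS b JOIN LAYERS l
      ON b.call_no > l.start_no AND b.call_no < l.end_no
    GROUP BY l.layer_id, l.shelf_id
    ORDER BY l.shelf_id, l.layer_id
""").fetchall()
print(rows)  # -> [(2, 1, 1), (1, 2, 1)]
```

The aggregated rows are what would be exported to the GIS side and joined to the layer map features for display.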
Then, by associating the numbers of books from the database with the heights of layers on the map, ArcView is able to sketch the height distributions of in-library book use in new perspectives, dramatically improving the understanding of book use.

In order to implement the visualization of all layers across a bookshelf range, layers were drawn as map features (see figure 1). Layer heights and widths are in appropriate proportion. (Individual books on each layer are for demonstration only, and thus are not in the exact shape and number.) Figure 1 shows how a bookshelf rack has been presented as a GIS map, which is a totally new idea in the applications of GIS visualization.

The database and visualization mechanism constitute what is referred to in this paper as the analytical tool. One will find that the development is relatively easy and the tool is incredibly simple. However, it is a dynamic device. If expanded into other parts of the library collections, this tool will become an integrated system that is able to assist in the management of library book use and

Figure 2. A three-dimensional view of bookshelf ranges on the fifth floor at the MacKimmie Library. The height of each bookshelf represents the corresponding height of the layer from which most books were removed. This display is not to actual scale.

Once the appropriate pieces of HTML code had been replaced with corresponding include statements, the changeover was complete. From this point forward, changes in such things as database names, coverage periods, and descriptive material will be made to one .txt file.
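The include mechanism this workflow relies on can be illustrated with a toy processor: before a page is sent, the server replaces each SSI include directive with the contents of the named file. This is a simplified sketch, not how a real server is implemented (Apache's mod_include handles the full SSI directive set), and the file names and HTML here are illustrative, not SUL's actual files.

```python
import re
import tempfile
from pathlib import Path

def process_ssi(html: str, root: Path) -> str:
    """Replace each <!--#include file="..."--> directive with the target
    file's text, the way a parsing-enabled server assembles a page."""
    def expand(match: re.Match) -> str:
        return (root / match.group(1)).read_text()
    return re.sub(r'<!--#include file="([^"]+)"\s*-->', expand, html)

# One shared .txt fragment holds a database link reused by many subject
# pages; correcting this one file updates every page that includes it.
root = Path(tempfile.mkdtemp())
(root / "eb-ase.txt").write_text(
    '<a href="http://purl.example/eb-ase">Academic Search Elite (EBSCO)</a>')

page = '<td><!--#include file="eb-ase.txt" --></td>'
assembled = process_ssi(page, root)
print(assembled)
# -> <td><a href="http://purl.example/eb-ase">Academic Search Elite (EBSCO)</a></td>
```

Because the expansion happens only on the server, an editor previewing the raw file sees just the unexpanded directive, which is why local previews of SSI pages appear incomplete.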
The change will immediately be reflected across all subject pages with no additional work involved for the librarians responsible for those pages. Note that the SUL server has been configured so that it parses all Web pages. This is necessary because most of the library's Web pages have some SSI. This configuration means that the Web page extensions remain .html. If the server is not configured in this manner, then all pages containing SSI must end in a .shtml extension. This is a subject that requires discussion with automation librarians or the department responsible for the library's server.

Advantages

Obviously, the biggest advantage to this method is the time saved for individual librarians. There is now no need for librarians to do any maintenance work for links to information housed in the alphabetical list. Static HTML pages referencing Gale's InfoTrac OneFile database, for instance, would have required updates to approximately forty subject pages; now, one librarian can correct one .txt file and simultaneously update all forty subject pages. Time saved can be used in collecting and editing the list of Web sites that are a part of each subject page; this is a task that has been pushed back in the past, in favor of making more urgent database information changes.

Academic Search Elite (EBSCO) <img src="/images/fulltext.gif" alt="some full text" border="0"> multi-disciplinary database; includes some scholarly articles
Fig. 1. HTML Code for Academic Search Elite Using a PURL Called eb-ase

Fig. 2. Database Names, .txt File Names, and Resultant Include Commands (e.g., Accessible Archives, accessible.txt)

In addition, librarians who are using this simple technique do not need extensive training. The creation of the Excel database of include commands allows for quick additions to an existing page, or the creation of new subject pages. Librarians using the include commands can simply copy and paste them; there is no need for them to understand the syntax or to be able to repeat it. This makes using SSI particularly attractive to staff who do not want the added burden of further training in HTML. The librarian responsible for creating the .txt files and the Excel database of statements demonstrated the copying and pasting of the include statements to all the other librarians who edit HTML pages in a one-time ten-minute training session.

The only additional training issue has involved page structure. Since the library uses a table structure for the subject pages, all table tags are included in the database .txt files. Making sure that librarians understand that they do not need to recreate the table tags has been the only additional training issue for the department.

As librarians begin to use these commands, links to resources across subject pages will look the same and will provide the user with the same information. This increased uniformity results in a more professional appearance for the Web site as a whole.

Disadvantages

This revolution in the maintenance of subject pages has not been without its disadvantages. The primary complaint by librarians using SSI include commands is that they cannot preview their changes in their HTML editors.
SUL's department uses the CoffeeCup HTML Editor, which allows previews, but the previews are not visible for items that are retrieved using SSIs. This is because the page is not fully assembled until the server assembles it. When the librarian views the page in the editor, prior to uploading it to the server, the include commands are without targets. The target .txt files are on the server. When a user requests a page, include commands pull in the missing pieces (the .txt files, or other files); then, the completed page is seamlessly presented to the user via his or her browser. As Mach notes, "Previewing a Web page without crucial elements . . . can be disconcerting, especially to visually oriented designers."20 In SUL's experience with this particular issue, librarians who are uncomfortable loading pages with locally invisible elements can load them into temporary folders on the server, check them for errors there, and then move them to their appropriate directories.

Conclusion

Situational factors have allowed SUL to implement this change with surprising ease and speed. Because the library has its own server, and because there is an automation librarian on staff, communication and change have been easy and efficient. Library staff deduce that it is because the include command of SSI is being used more than other possible commands that the library is not experiencing an increase in loading time on its pages. Of course, the size of SUL's resource list makes this kind of solution feasible; certainly, if the library were working with hundreds of resources, it would be more likely that a database-driven strategy would be adopted.
The simplicity and elegance of the SSI include command process has encouraged adoption, and SUL has seen no ill effects from the user side of operations. Librarian Web authors quickly overcame any slight discomfort with the new process and are now able to devote a portion of editing time to other, less monotonous tasks.

References and Notes

1. Carla Dunsmore, "A Qualitative Study of Web-Mounted Pathfinders Created by Academic Business Libraries," Libri 52, no. 3 (Sept. 2002): 140-41.
2. Charles W. Dean, "The Public Electronic Library: Web-based Subject Guides," Library Hi Tech 16, no. 3-4 (1998): 80-88; Gary Roberts, "Designing a Database-Driven Web Site, or, The Evolution of the Infoiguana," Computers in Libraries 20, no. 9 (Oct. 2000): 26-32; Bryan H. Davidson, "Database-Driven, Dynamic Content Delivery: Providing and Managing Access to Online Resources Using Microsoft Access and Active Server Pages," OCLC Systems and Services 17, no. 1 (2001): 34-42; Marybeth Grimes and Sara E. Morris, "A Comparison of Academic Libraries' Webliographies," Internet Reference Services Quarterly 5, no. 4 (2001): 69-77; Laura Galvan-Estrada, "Moving towards a User-Centered, Database-Driven Web Site at the UCSD Libraries," Internet Reference Services Quarterly 7, no. 1-2 (2002): 49-61.
3. Roberts, "Infoiguana"; Davidson, "Database Driven"; Galvan-Estrada, "User-Centered, Database-Driven Web Site."
4. Davidson, "Database Driven," under "Introduction."
5. Ibid., under "Development Considerations."
6. Roberts, "Infoiguana," 32.
7. Galvan-Estrada, "User-Centered, Database-Driven Web Site," 55-56.
8. Jody Condit Fagan, "Server-Side Includes Made Simple," The Electronic Library 20, no. 5 (2002): 382-83.
9. Michelle Mach, "The Service of Server-Side Includes," Information Technology and Libraries 20, no. 4 (2001): 213.
10. Greg R.
Notess, "Server Side Includes for Site Management," Online 24, no. 4 (July 2000): 78, 80.
11. Ibid.
12. Mach, "Service of Server-Side Includes," 216.
13. Ibid., 214.
14. Fagan, "Server-Side Includes Made Simple," 387.
15. Ibid., 383.
16. Ibid.
17. Ibid.
18. Apache HTTPD Server Project, "Apache HTTP Server Version 1.3: Security Tips for Server Configuration," The Apache Software Foundation. Accessed Oct. 29, 2003, http://httpd.apache.org/docs/misc/security_tips.html.
19. Anthony Baratta, e-mail to theList mailing list, May 16, 2003. Accessed Nov. 4, 2003, http://lists.evolt.org/archive/Week-of-Mon-20030512/140824.html.
20. Mach, "Service of Server-Side Includes," 217.

USING SERVER-SIDE INCLUDE COMMANDS | NORTHRUP, CHERRY, AND DARBY 197
Free Culture: How Big Media Uses Technology and the Law to Lock Down Culture and Control Creativity
Coyle, Karen. Information Technology and Libraries; Dec 2004; 23, 4; ProQuest pg. 198

Book Review

Free Culture: How Big Media Uses Technology and the Law to Lock Down Culture and Control Creativity. By Lawrence Lessig. New York: Penguin, 2004. 240p. $24.95 (ISBN 1-594-20006-8).

This is the third book by Stanford law professor Larry Lessig, and the third in which he furthers his basic theme: that the ancien régime of intellectual property owners is locked in a battle with the capabilities of new technology. Lessig used his first book, Code and Other Laws of Cyberspace (Basic Books, 1999), to explain that the notion of cyberspace as free, open, and anarchic is simply a myth, and a dangerous one at that: the very architecture of our computers and how they communicate determines what one can and cannot do within that environment. If you can get control of that architecture, say by mandating filters on content, you can get substantial control over the culture of that communication space. In his second book, The Future of Ideas: The Fate of the Commons in a Connected World (Random, 2001), Lessig describes how the change from real property to virtual property actually means more opportunity for control, not less. The theme that he takes up in Free Culture is his concern that certain powerful interests in our society (read: Hollywood) are using copyright law to lock down the very stuff of creativity: mainly, past creativity.

Lessig himself admits in his preface that his is not a new or unique argument. He cites Richard Stallman's writings in the mid-1980s that became the basis for the Free Software movement as containing many of the same concepts that Lessig argues in his book.
In this case, it serves as a kind of proof of concept (that new ideas build on past ideas) rather than a criticism of lack of originality. Stallman's work is not, however, a substitute for Lessig's; not only does Lessig address popular culture where Stallman addresses only computer code, but Lessig has one key thing in his favor: he is a master storyteller and a darned good writer, not something one usually expects in an academic and an expert in constitutional law. His book opens with the first flight of the Wright brothers and the death of a farmer's chickens, followed by Buster Keaton's film Steamboat Bill and Disney's famous mouse. The next chapter traces the history of photography and how the law once considered that snapping a picture could require prior permission from the owners of any property caught in the viewfinder. Later he tells how an improvement to a search engine led one college student to owe the Recording Industry Association of America $15 million. Throughout the book Lessig illustrates copyright through the lives of real people and uses history, science, and the arts to make this law come to life for the reader.

Lessig explains that intellectual property differs from real property in the eye of the law. Unlike real property, where the property owner has near total control over its uses, the only control offered to authors originally was the control over who could make copies of the work and distribute them. In addition, that right, the "copy right," lasted only a short time. The original length of copyright in the United States was fourteen years, with the right to renew for another fourteen years. So a total of twenty-eight years stood between an author's rights and the public domain, and those rights were limited to publishing copies. Others could quote from a work, even derive other works from it (such as turning a novel into a play), all within a law that was designed to promote science and the arts.
Fast forward to the present day and we have a very different situation. Not only has there been a change in the length of time that copyright applies to a work; a major change in copyright law in 1976 extended copyright to works that had not previously been covered.

198 INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2004    Tom Zillner, Editor

In the earliest U.S. copyright regimes of the late 18th century, only works that were registered with the copyright office were afforded the protection of copyright law, and only about five percent of works produced were so registered. The rest were in the public domain. Later, actual registration with the copyright office was unnecessary but the author was required to place a copyright notice on a work (e.g., "© 2004, Karen Coyle") in order to claim copyright in it. Copyright holders had to renew works in order to make use of the full term of protection, and renewal rates were actually quite low. In 1976, all such requirements were removed, and the law was amended to state that any work in a fixed medium automatically receives copyright protection, and for the full term. That is true even if the author does not want that protection. So although many saw the great exchange of ideas and information on the Internet as being a huge commons of knowledge, to be shared and shared alike, all of it has, in fact, always been covered by copyright law: every word out there belongs to someone.

That change, combined with a much earlier change that gave a copyright holder control over derivative works, puts creators into a deadlock. They cannot safely build on the work of others without permission (thus Lessig's argument that we are becoming a "permission culture"). Yet, we have no mechanism (such as registration of works that would result in a database of creators) that would facilitate getting that permission.
If you find a work on the Internet and it has no named author or no contact information for the author, the law forbids you to reuse the work without permission, but there is nothing that would make getting that permission a manageable task. Of course, even if you do know who the rights holder is, permission is not a given. For example, you hear a great song on the radio and want to use parts of that tune in your next rap performance. You would need to approach the major record label that holds the rights and ask permission, which might not be granted. You could go ahead and use the sample and, if challenged, claim "fair use." But being challenged means going to court in a world where a court case could cost you in the six digits, an amount of money that most creators do not have.

Lessig, of course, spends quite a bit of time in his book on the length of copyright, now life of the author plus seventy years. It was exactly this issue that he and Eric Eldred took to the Supreme Court in 2003. Lessig argued before the court that if Congress can seemingly arbitrarily increase the length of copyright, as it has eleven times since 1962, then there is effectively no limit to the copyright term. Yet "for a limited time" was clearly mandated in the U.S. Constitution. Lessig lost his case. You might expect him to spend his efforts explaining how the Supreme Court was wrong and he was right, but that is not what he does. Right or wrong, they are the Supreme Court, and his job was to convince them to decide in favor of his client. Instead, Lessig revises his estimation of what can be accomplished with constitutional arguments and spends a chapter outlining compromises that might, just might, be possible in the future. To the extent that Eldred v.
Ashcroft had an effect on Lessig's thinking, and there is evidence that the effect was profound, it will have an effect on all of us because Lessig is one of the key actors in this arena.

Throughout the book, Lessig points out the difference between copyright law and the actual market for works. There is a great irony in the fact that copyright law now protects works for a century or more while most books are in print for one year or less. It is this vast storehouse of out-of-print and unexploited works that makes a strong argument for some modification of our copyright law. He also recognizes that there are different creative cultures in our society, with different views of the purpose of creation. Here he cites academic movements like the Public Library of Science as solutions for the sector of society that has a low or nonexistent commercial interest but a need to get its works as widely distributed as possible. For these creators, and for "sharers" everywhere, Lessig promotes the Creative Commons solution (at www.creativecommons.org), a simple licensing scheme that allows creators to attach a license to their work that lets others know how they can make use of it. In a sense, Creative Commons is a way to opt out of the default copyright that is applied to all works.

When I first received my copy of Free Culture, I did two things: I looked up libraries in the index, and I looked up the book online to see what other reviewers had said. Online, I found a Web site for the book (http://free-culture.org) that pointed to two very interesting sites: one that lists free, downloadable full-text copies of the book in over a dozen different formats; and one that allows you to listen to the chapters being read aloud by volunteers and admirers. (I did listen to a few chapters and generally they are as listenable as most nonfiction audio books. In the end, though, I read the hard copy of the book.)
Lessig is making a point by offering his work outside the usual confines of copyright law, but in fact the meaning of his gesture is more economic than legal. Although he, and Cory Doctorow before him (Down and Out in the Magic Kingdom, Tor Books, 2003), brokered agreements with their publishers to publish simultaneously in print with free digital copies, few authors and publishers today will choose that option for fear of loss of revenue, not because of their belief in the sanctity of intellectual property. If there were sufficient proof that free online copies of works increased sales of hard copies, this would quickly become the norm, regardless of the state of copyright law.

As for libraries: unfortunately, they do not fare well. He dedicates a short chapter to Brewster Kahle and his Wayback Machine as his example of the need to archive our culture for future access. I admit that I winced when Lessig stated:

But Kahle is not the only librarian. The Internet Archive is not the only archive. But Kahle and the Internet Archive suggest what the future of libraries or archives could be. (114)

Lessig also mentions libraries in his arguments about out-of-print and inaccessible works, but in this case he actually gets it wrong:

After it [a book] is out of print, it can be sold in used book stores without the copyright owner getting anything and stored in libraries, where many get to read the book, also for free. (113)

Since we know that Lessig is very aware that books are sold and lent even while they are still in print, we have to assume that the elegance of the argument was preferred over precision. But he makes this error more than once in the book, leaving libraries to appear to be a home for leftovers and remaindered works. That is too bad. We know that Lessig is aware of libraries; anyone active in the legal profession depends on them. He has spoken at library-related conferences and events.
Yet he does not see libraries as key players in the battle against overly powerful copyright interests. More to the point, libraries have not captured his imagination, or given him a good story to tell. So here is a challenge for myself and my fellow librarians: whether it means chatting up Lessig after one of his many public performances, becoming active in CreativeCommons, or stopping by Palo Alto to take a busy law professor to lunch, we need to make sure that we get on, and stay on, Lessig's radar. We need him; he needs us.

Karen Coyle, Digital Libraries Consultant, http://kcoyle.net
An Evidence-Based Review of Academic Web Search Engines, 2014-2016: Implications for Librarians’ Practice and Research Agenda

Jody Condit Fagan

AN EVIDENCE-BASED REVIEW OF ACADEMIC WEB SEARCH ENGINES, 2014-2016 | FAGAN | https://doi.org/10.6017/ital.v36i2.9718

ABSTRACT

Academic web search engines have become central to scholarly research. While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention. Recent studies have much to tell us about Google Scholar’s coverage of the sciences and its utility for evaluating researcher impact. But other aspects have been understudied, such as coverage of the arts and humanities, books, and non-Western, non-English publications. User research has also tapered off. A small number of articles hint at the opportunity for librarians to become expert advisors concerning scholarly communication made possible or enhanced by these platforms. This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years with a mind to informing practice and setting a research agenda. Selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas.

INTRODUCTION

Recent Pew Internet surveys indicate an overwhelming majority of American adults see themselves as lifelong learners who like to “gather as much information as [they] can” when they encounter something unfamiliar (Horrigan 2016). Although significant barriers to access remain, the open access movement and search engine giants have made full text more available than ever.1 The general public may not begin with an academic search engine, but Google may direct them to Google Scholar or Google Books.
Within academia, students and faculty rely heavily on academic web search engines (especially Google Scholar) for research; among academic researchers in high-income areas, academic search engines recently surpassed abstracts & indexes as a starting place for research (Inger and Gardner 2016, 85, Fig. 4). Given these trends, academic librarians have a professional obligation to understand the role of academic web search engines as part of the research process.

Jody Condit Fagan (faganjc@jmu.edu) is Professor and Director of Technology, James Madison University, Harrisonburg, VA.

1 Khabsa and Giles estimate “almost 1 in 4 of web accessible scholarly documents are freely and publicly available” (2014, 5).

Two recent events also point to the need for a review of research. Legal decisions in 2016 confirmed Google’s right to make copies of books for its index without paying or even obtaining permission from copyright holders, solidifying the company’s opportunity to shape the online experience with respect to books. Meanwhile, Microsoft rebooted their academic web search engine, now called Microsoft Academic. At the same time, information scientists, librarians, and other academics conducted research into the performance and utility of academic web search engines. This article seeks to review the last three years of research concerning academic web search engines, make recommendations related to the practice of librarianship, and propose a research agenda.

METHODOLOGY

A literature review was conducted to find articles, conference presentations, and books about the use or utility of Google Books, Google Scholar, and Microsoft Academic for scholarly use, including comparisons with other search tools. Because of the pace of technological change, the focus was on recent studies (2014 through 2016, inclusive).
A search was conducted on “Google Books” in EBSCO’s Library and Information Science and Technology Abstracts (LISTA) on December 19, 2016, limited to 2014-2016. Of the 46 results found, most were related to legal activity. Only four items related to the tool’s use for research. These four titles were entered into Google Scholar to look for citing references, but no additional relevant citations were found. In the relevant articles found, the literature reviews testified to the general lack of studies of Google Books as a research tool (Abrizah and Thelwall 2014; Weiss 2016), with a few exceptions concerning early reviews of metadata, scanning, and coverage problems (Weiss 2016). A search on “Google Books” in combination with “evaluation OR review OR comparison” was also submitted to JMU’s discovery service,2 limited to 2014-2016. Forty-nine items were found, and from these three relevant citations were added; these were also entered into Google Scholar to look for citing references. However, no additional relevant citations were found. Thus, a total of seven citations from 2014-2016 were found with relevant information concerning Google Books. Earlier citations from the articles’ bibliographies were also reviewed when research was based on previous work, and to inform the development of a fuller research agenda. A search on “Microsoft Academic” in LISTA on February 3, 2017 netted fourteen citations from 2014-2016. Only seven seemed to focus on evaluation of the tool for research purposes. A search on “Microsoft Academic” in combination with terms “evaluation OR review OR comparison” was also submitted to JMU’s discovery service, limited to 2014-2016. Eighteen items were found but no additional citations were added, either because they had already been found or were not relevant.
The seven titles found in LISTA were searched in Google Scholar for citing references; four additional relevant citations were found, plus a paper relevant to Google Scholar not previously discovered (Weideman 2015). Thus, a total of eleven citations were found with relevant information for this review concerning Microsoft Academic. Because of this small number, several articles prior to 2014 were included in this review for historical context.

2 JMU’s version of EBSCO Discovery Service contained 453,754,281 items at the time of writing and is carefully vetted to contain items of curricular relevance to the JMU community (Fagan and Gaines 2016).

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017

An initial search was performed on “Google Scholar” in LISTA on November 19, 2016, limited to 2014-2016. This netted 159 results, of which 24 items were relevant. A search on “Google Scholar” in combination with terms “evaluation OR review OR comparison” was also submitted to JMU’s discovery tool limited to 2014-2016, and eleven relevant citations were added. Items older than 2014 that were repeatedly cited or that formed the basis of recent research were retrieved for historical context. Finally, relevant articles were submitted to Google Scholar, which netted an additional 41 relevant citations. Altogether, 70 citations were found to articles with relevant information for this review concerning Google Scholar in 2014-2016. Readers interested in literature reviews covering Google Scholar studies prior to 2014 are directed to Gray et al. (2012), Erb and Sica (2015), and Harzing and Alakangas (2016b).

FINDINGS

Google Books

Google Books (https://books.google.com) contains about 30 million books, approaching the Library of Congress’s 37 million, but far shy of Google’s estimate of 130 million books in existence (Wu 2015), which Google intends to continue indexing (Jackson 2010).
Content in Google Books includes publisher-supplied, self-published, and author-supplied content (Harper 2016) as well as the results of the famous Google Books Library Project. Started in December 2004 as the “Google Print” project,3 the project involved over 40 libraries digitizing works from their collections, with Google indexing and performing OCR to make them available in Google Books (Weiss 2016; Mays 2015).

3 https://www.google.com/googlebooks/about/history.html

Scholars have noted many errors with Google Books metadata, including misspellings, inaccurate dates, and inaccurate subject classifications (Harper 2016; Weiss 2016). Google does not release information about the database’s coverage, including which books are indexed or which libraries’ collections are included (Abrizah and Thelwall 2014). Researchers have suggested the database covers mostly U.S. and English-language books (Abrizah and Thelwall 2014; Weiss 2016). The conveniences of Google Books include limits by the type of book availability (e.g., free e-books vs. Google e-books), document type, and date. The detail view of a book allows magnification, hyperlinked tables of contents, buying and “Find in a Library” options, “My Library,” and user history (Whitmer 2015). Google Books also offers textbook rental (Harper 2016) and limited print-on-demand services for out-of-print books (Mays 2015; Boumenot 2015). In April 2016, the Supreme Court affirmed Google’s right to make copies for its index without paying or even obtaining permission from copyright holders (Authors Guild 2016; Los Angeles Times 2016). Scanning of library books and “snippet view” was deemed fair use: “The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals” (U.S.
Court of Appeals for the Second Circuit 2015). Literature concerning high-level implications of Google Books suggests the tool is having a profound effect on research and scholarship. The tool has been credited for serving as “a huge laboratory” for indexing, interpretation, working with document image repositories, and other activities (Jones 2010). At the same time, the academic community has expressed concerns about Google Books’s effects on social justice and how its full-text search capability may change the very nature of discovery (Hoffmann 2014; Hoffmann 2016; Szpiech 2014). One study found that books are far more prevalently cited in Wikipedia than are research articles (Kousha and Thelwall 2017). Yet investigations of Google Books’ coverage and utility as a research tool seem to be sorely lacking. As Weiss noted, “no critical studies seem to exist on the effect that Google Books might have on the contemporary reference experience” (Weiss 2016, 293). Furthermore, no information was found concerning how many users are taking advantage of Google Books; the tool was noticeably absent from surveys such as Inger and Gardner’s (2016) and from research centers such as the Pew Internet Research Project. In a largely descriptive review, Harper (2016) bemoaned Google Books’ lack of integration with link resolvers and discovery tools, and judged it lacking in relevant material for the health sciences, because so much of the content is older. She also noted the majority of books scanned are in English, which could skew scholarship. The language skew of Google Books was also lamented by Weiss, who noted an “underrepresentation of Spanish and overestimation of French and German (or even Japanese for that matter),” especially as compared to the number of Spanish speakers in the United States (Weiss 2016, 286-306). Whitmer (2015) and Mays (2015) provided practical information about how Google Books can be used as a reference tool.
Whitmer presented major Google Books features and challenged librarians to teach Google Books during library instruction. Mays conducted a cursory search on the 1871 Chicago Fire and described the primary documents she retrieved as “pure gold,” including records of city council meetings, notes from insurance companies, reports from relief societies, church sermons on the fire, and personal memoirs (Mays 2015, 22). Mays also described Google Books as a godsend to genealogists for finding local records (e.g., police departments, labor unions, public schools). In her experience, the geographic regions surrounding the forty participating Google Books Library Project libraries are “better represented than other areas” (Mays 2015, 25). Mays concludes, “Its poor indexing and search capabilities are overshadowed by the ease of its fulltext search capabilities and the wonderful ephemera that enriches its holdings far beyond mere ‘books’” (Mays 2015, 26). Abrizah and Thelwall (2014) investigated whether Google Books and Google Scholar provided “good impact data for books published in non-Western countries.” They used a comprehensive list of arts, humanities, and social sciences books (n=1,357) from the five main university presses in Malaysia, 1961-2013. They found only 23% of the books were cited in Google Books4 and 37% in Google Scholar (p. 2502). The overlap was small: only 15% were cited in both Google Scholar and Google Books. English-language books were more likely to be cited in Google Books; 40% of English-language books were cited versus 16% of Malay-language books. Examining the top 20 books cited in Google Books, researchers found them to be mostly written in English (95% in Google Books vs 29% in the sample), and published by University of Malaysia Press (60% in Google Books vs 26% in the sample) (2505).
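Abrizah and Thelwall’s overlap figures are simple set arithmetic. The sketch below reproduces them with a toy sample of 100 invented book identifiers, scaled from the reported percentages; it is an illustration, not the authors’ data or method:

```python
# Toy reconstruction of the overlap arithmetic reported by Abrizah and
# Thelwall (2014). Book identifiers are invented; only the percentages
# mirror the reported findings (23% in Google Books, 37% in Google
# Scholar, 15% in both).
sample = set(range(100))        # pretend sample of 100 books
cited_gb = set(range(0, 23))    # 23 books cited in Google Books
cited_gs = set(range(8, 45))    # 37 books cited in Google Scholar

def pct(cited, whole):
    """Share of `whole` appearing in `cited`, as a percentage."""
    return 100 * len(cited & whole) / len(whole)

print(pct(cited_gb, sample))             # 23.0
print(pct(cited_gs, sample))             # 37.0
print(pct(cited_gb & cited_gs, sample))  # cited in both: 15.0
print(pct(cited_gb | cited_gs, sample))  # cited in either: 45.0
```

The union (45%) exceeding either engine alone is the arithmetic behind the authors’ advice to search both tools when tracing citations to academic books.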
The authors concluded that due to the low overlap between Google Scholar and Google Books, searching both engines was required to find the most citations to academic books. Kousha and Thelwall (2015; 2011) compared Google Books with Thomson Reuters Book Citation Index (BKCI) to examine its suitability for scholarly impact assessment and found Google Books to have a clear advantage over BKCI in the total number of citations found within the arts and humanities, but not for the social sciences or sciences. They advised combining results from BKCI with Google Books when performing research impact assessment for the arts and humanities and social sciences, but not using Google Books for the sciences, “because of the lower regard for books among scientists and the lower proportion of Google Books citations compared to BKCI citations for science and medicine” (Kousha and Thelwall 2015, 317).

Microsoft Academic

Microsoft Academic (https://academic.microsoft.com) is an entirely new software product as of 2016. Therefore, the studies cited prior to 2016 refer to entirely different search engines than the one currently available. However, a historical account of the tool and reviewers’ opinions was deemed helpful for informing a fuller picture of academic web search engines and pointing to a research agenda. Microsoft Academic was born as Windows Live Academic in 2006 (Carlson 2006), was renamed Live Search Academic after a first year of struggle (Jacsó 2008), and was scrapped two years later after the company recognized it did not have sufficient development support in the United States (Jacsó 2011). Microsoft Asia Research Group launched a beta tool called Libra in 2009, which redirected to the “Microsoft Academic Search” service by 2011. Early reviews of the 2011 edition of Microsoft Academic Search were promising, although the tool clearly lacked the quantity of data searched by Google Scholar (Jacsó 2011; Hands 2012).
There were a few studies involving Microsoft Academic Search in 2014. Ortega and Aguillo (2014) compared Microsoft Academic Search and Google Scholar Citations for research evaluation and concluded “Microsoft Academic Search is better for disciplinary studies than for analyses at institutional and individual levels. On the other hand, Google Scholar Citations is a good tool for individual assessment because it draws on a wider variety of documents and citations” (1155).

4 Google Books does not support citation searching; the researchers searched for the book title to manually find citations to a book.

As part of a comparative investigation of an automatic method for citation snowballing using Microsoft Academic Search, Choong et al. (2014) manually searched for a sample of 949 citations to journal or conference articles cited from 20 systematic reviews. They found Microsoft Academic Search contained 78% of the cited articles and noted its utility for testing automated methods due to its free API and lack of blocks on automated access. The researchers also tested their method against Google Scholar, but noted “computer-access restrictions prevented a robust comparison” (n.p.). Also in 2014, Orduna-Malea et al. (2014) attempted a longitudinal study of disciplines, journals, and organizations in Microsoft Academic Search only to find the database had not been updated since 2013. Furthermore, they found the indexing to be incomplete and still in process, meaning Microsoft Academic Search’s presentation of information about any particular publication, organization, or author was distorted. Despite this finding, MAS was included in two studies of scholar profiles. Ortega (2015) compared scholar profiles across Google Scholar, Microsoft Academic Search, ResearchGate, Academia.edu, and Mendeley, and found little overlap across the sites.
They also found social and usage indicators did not consistently correlate with bibliometric indicators, except on the ResearchGate platform. Social and usage indicators were “influenced by their own social sites,” while bibliometric indicators seemed more stable across all services (13). Ward et al. (2015) still included Microsoft Academic Search in their discussion of scholarly profiles as part of the social media network, noting Microsoft Academic Search was painfully time-consuming to work with in terms of consolidating data, correcting items, and adding missing items. In September 2016, Hug et al. demonstrated the utility of the new Microsoft Academic API by conducting a comparative evaluation of normalized data from Microsoft Academic and Scopus (Hug, Ochsner, and Braendle 2016). They noted Microsoft Academic has “grown massively from 83 million publication records in 2015 to 140 million in 2016” (10). The Microsoft Academic API offers rich, structured metadata with the exception of document type. They found all attributes containing text were normalized and that identifiers were available for all entities, including references, supporting bibliometricians’ needs for data retrieval, handling, and processing. In addition to the lack of document type, the researchers also found the “fields of study” to be too granular and dynamic, and their hierarchies incoherent. They also desired the ability to use the DOI to build API requests. Nevertheless, the advantages of Microsoft Academic’s metadata and API retrieval suggested to Hug et al. that Microsoft Academic was superior to Google Scholar for calculating research impact indicators and bibliometrics in general. 
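Hug et al. evaluate Microsoft Academic for calculating research impact indicators. As a concrete illustration of what such an indicator involves, here is a minimal sketch of the h-index, one common impact measure, computed from per-paper citation counts; the counts are invented and the code is not drawn from the study:

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers each have >= rank citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations
print(h_index([25, 8, 5, 3, 3]))  # 3
```

Because an indicator like this is computed purely from retrieved citation counts, the coverage and data-quality differences between platforms discussed in this section directly change the resulting scores.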
In October 2016, Harzing and Alakangas compared publication and citation coverage of the new Microsoft Academic with Google Scholar, Scopus, and Web of Science using a sample of 145 academics at the University of Melbourne (Harzing and Alakangas 2016a), including observations from 20-40 faculty each in the humanities, social sciences, engineering, sciences, and life sciences. They discovered Microsoft Academic had improved substantially since their previous study (Harzing 2016b), increasing 9.6% for a comparison sample, versus 1.4%, 2%, and 1.7% growth in Google Scholar, Scopus, and Web of Science (n.p.). The researchers noted a few problems with data quality, “although the Microsoft Academic team have indicated they are working on a resolution” (n.p.). On average, the researchers found that Microsoft Academic found 59% as many citations as Google Scholar, 97% as many citations as Scopus, and 108% as many citations as Web of Science. Google Scholar had the top counts for each disciplinary area, followed by Scopus except in the social sciences and humanities, where Microsoft Academic ranked second. The researchers explained that Microsoft Academic “only includes citation records if it can validate both citing and cited papers as credible,” as established through a machine-learning-based system, and discussed an emerging metric of “estimated citation count” also provided by Microsoft Academic. The researchers concluded that Microsoft Academic promises to be “an excellent alternative for citation analysis” and suggested Microsoft should work to improve coverage of books and grey literature.

Google Scholar

Google Scholar was released in beta form in November 2004, and was expanded to include judicial case law in 2009.
While Google Scholar has received much attention in academia, it seems to be regarded by Google as a niche product: in 2011 Google removed Scholar from the list of top services and the list of “more” services, relegating it to the “even more” list. In 2014, the Scholar team consisted of just nine people (Levy 2014). Describing Google Scholar in an introductory manner is not helped by Google’s vague documentation, which simply says it “includes scholarly articles from a wide variety of sources in all fields of research, all languages, all countries, and over all time periods.”5 The “wide variety of sources” includes “journal papers, conference papers, technical reports, or their drafts, dissertations, pre-prints, post-prints, or abstracts,” as well as court opinions and patents, but not “news or magazine articles, book reviews, and editorials.” Books and dissertations uploaded to Google Book Search are “automatically” included in Scholar. Google says abstracts are key, noting “Sites that show login pages, error pages, or bare bibliographic data without abstracts will not be considered for inclusion and may be removed from Google Scholar.” Studies of Google Scholar can be divided into three major categories of focus: investigating the coverage of Google Scholar; the use and utility of Google Scholar as part of the research process; and Google Scholar’s utility for bibliographic measurement, including evaluating the productivity of individual researchers and the impact of journals. There is some overlap across these categories, because studies of Google Scholar seem to involve three questions: 1) What is being searched? 2) How does the search function? and 3) To what extent can the user usefully accomplish her task?
The Coverage of Google Scholar

Scholars want to know what “scholarship” is covered by Google Scholar, but the documentation merely states that it indexes “papers, not journals”6 and challenges researchers to investigate Google Scholar’s coverage empirically despite Google Scholar’s notoriously challenging technical limitations.

5 https://scholar.google.com/intl/en/scholar/inclusion.html
6 https://www.google.com/intl/en/scholar/help.html#coverage

While some limitations of Google Scholar have been corrected over the years, longstanding logistical hurdles involved with studying Google Scholar’s coverage have been well-documented for over a decade (Shultz 2007; Bonato 2016; Haddaway et al. 2015; Levay et al. 2016), and include:

• Search queries are limited to 256 characters
• Not being able to retrieve more than 1,000 results
• Not being able to display more than 20 results per page
• Not being able to download batches of results (e.g., to load into citation management software)
• Duplicate citations (beyond the multiple article “versions”), requiring manual screening
• Retrieving different results with Advanced and Basic searches
• No designation of the format of items (e.g., conference papers)
• Minimal sort options for results
• Basic Boolean operators only7
• Illogical interpretation of Boolean operators: esophagus OR oesophagus and oesophagus OR esophagus return different numbers of results (Boeker, Vach, and Motschall 2013)
• Non-disclosure of the algorithm by which search results are sorted

Additionally, one study reported experiencing an automated block to the researcher’s IP address after the export of approximately 180 citations or 180 individual searches (Haddaway et al. 2015, 14).
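The duplicate-citation hurdle above typically forces manual screening of exported results. Below is a minimal sketch of the kind of title normalization a researcher might script to flag likely duplicates before manual review; the citation strings are invented for illustration:

```python
import re

# Invented example citations: the first two differ only in case and
# punctuation and should collapse to a single record.
citations = [
    "Searching the grey literature: a review.",
    "Searching the Grey Literature - A Review",
    "Systematic reviews and web search engines",
]

def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())

seen, unique = set(), []
for citation in citations:
    key = normalize(citation)
    if key not in seen:       # keep only the first variant of each title
        seen.add(key)
        unique.append(citation)

print(len(unique))  # 2 distinct records remain
```

Normalization like this catches only trivial variants; the “versions” clustering and near-duplicate records described above still require human judgment.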
Furthermore, the Research Excellence Framework was unable to use Google Scholar to assess the quality of research in UK higher education institutions, because of researchers’ inability to agree with Google on a “suitable process for bulk access to their citation information, due to arrangements that Google Scholar have in place with publishers” (Research Excellence Framework 2013, 1562). Such barriers can limit what can be studied and also cost researchers significant time in terms of downloading (Prins et al. 2016) and cleaning citations (Levay et al. 2016). Despite these hurdles, research activity analyzing the coverage of Google Scholar has continued in the past two years, often building off previous studies. This section will first discuss Google Scholar’s size and ranking, followed by its coverage of articles and citations, then its coverage of books, grey literature, and open access and institutional repositories.

Google Scholar Size and Ranking

In a 2014 study, Khabsa and Giles estimated there were at least 114 million English-language scholarly documents on the Web, of which Google Scholar had “nearly 100 million.” Another study by Orduna-Malea, Ayllón, Martín-Martín, and López-Cózar (2015) estimated that the total number of documents indexed by Google Scholar, without any language restriction, was between 160 and 165 million. By comparison, in 2016 the author’s discovery tool contained about 168 million items in academic journals, conference materials, dissertations, and reviews.8 Google Scholar’s presence in the information marketplace has influenced vendors to increase the discoverability of their content, including pushing for the display of abstracts and/or the first page of articles (Levy 2014). ProQuest and Gale indexes were added to Google Scholar in 2015 (Quint 2016).

7 E.g., no nesting of logical subexpressions deeper than one level (Boeker, Vach, and Motschall 2013) and no truncation operators.
Martín-Martín et al. (2016b) noted that Google Scholar’s agreements with big publishers come at a price: “the impossibility of offering an API,” which would support bibliometricians’ research (54). Google Scholar’s results ranking “aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.”9 Martín-Martín and his colleagues (2017, 159) conducted a large, longitudinal study of null query results in Google Scholar and found a strong correlation between result list ranking and times cited. The influence of citations is so strong that when the researchers performed the same search process four months later, 14.7% of documents were missing in the second sample, causing them to conclude even a change of one or two citations could lead to a document being excluded from or included in the top 1,000 results (157). Using citation counts as a major part of the ranking algorithm has been hypothesized to produce the “Matthew Effect,” where “work that is already influential becomes even more widely known by virtue of being the first hit from a Google Scholar search, whereas possibly meritorious but obscure academic work is buried at the bottom” (Antell et al. 2013, 281). Google Scholar has been shown to heavily bias its ranking toward English-language publications even when there are highly cited non-English publications in the result set, although selection of interface language may influence the ranking. Martín-Martín and his colleagues noted that Google Scholar seems to use the domain of the document’s hosting web site as a proxy for language, meaning that “some documents written in English but with their primary version hosted in non-Anglophone countries’ web domains do appear in lower positions in spite of receiving a large number of citations” (Martín-Martín et al. 2017, 161).
This effect is shown dramatically in Figure 3 of their paper.

Google Scholar Coverage: Articles and Citations

The coverage of articles, journals, and citations by Google Scholar has been commonly examined by using brute-force methods to retrieve a sample of items from Google Scholar and possibly one or more of its competitors. (Studies discussed in this section are listed in Table 1.) The goal is usually to determine how well Google Scholar’s database compares to traditional research databases, usually in a specific field. Core methodology involves importing citations into software such as Publish or Perish (Harzing 2016a), cleaning the data, then performing statistical tests, expert review, or both.

8 The discovery tool does not contain all available metadata but has been carefully vetted (Fagan and Gaines 2016).
9 https://www.google.com/intl/en/scholar/about.html

Haddaway (2015) and Moed et al. (2016) have written articles specifically discussing methodological aspects. Recent studies repeatedly find that Google Scholar’s coverage meets or exceeds that of other search tools, no matter what is identified by target samples, including journals, articles, and citations (Karlsson 2014; Harzing 2014; Harzing 2016b; Harzing and Alakangas 2016b; Moed, Bar-Ilan, and Halevi 2016; Prins et al. 2016; Wildgaard 2015; Ciccone and Vickery 2015). In only three studies did Google Scholar find fewer items, and the meaningful difference was minimal.10 Science disciplines were the most studied in Google Scholar, including agriculture, astronomy, chemistry, computer science, ecology, environmental science, fisheries, geosciences, mathematics, medicine, molecular biology, oceanography, physics, and public health. Social sciences studied include education (Prins et al. 2016), economics (Harzing 2014), geography (Ştirbu et al.
2015, 322-329), information science (Winter, Zadpoor, and Dodou 2014; Harzing 2016b), and psychology (Pitol and De Groote 2014). Studies related to the arts or humanities 2014-2016 included an analysis of open access journals in music (Testa 2016) and a comparison between Google Scholar and Web of Science for research evaluation within education, pedagogical sciences, and anthropology11 (Prins et al. 2016). Wildgaard (2015) and Bornmann et al. (2016) included samples of humanities scholars as part of bibliometric studies, but did not discuss disciplinary aspects related to coverage. Prior to 2014, the only study found related to the arts and humanities compared Google Scholar with Historical Abstracts (Kirkwood Jr. and Kirkwood 2011). Google Scholar’s coverage has been growing over time (Meier and Conkling 2008; Harzing 2014; Winter, Zadpoor, and Dodou 2014; Bartol and Mackiewicz-Talarczyk 2015, 531; Orduña-Malea and Delgado López-Cózar 2014) with recent increases in older articles (Winter, Zadpoor, and Dodou 2014; Harzing and Alakangas 2016b), leading some to question whether this supports the documented trend of increased citation of older literature (Martín-Martín et al. 2016c; Varshney 2012). Winter et al. noted that in 2005 Web of Science yielded more citations than Google Scholar for about two-thirds of their sample, but for the same sample in 2013, Google Scholar found more citations than Web of Science, with only 6.8% of citations not retrieved by Google Scholar (Winter, Zadpoor, and Dodou 2014, 1560). The unique citations of Web of Science were “typically documents before the digital age and conference proceedings not available online” (Winter, Zadpoor, and Dodou 2014, 1560). Harzing and Alakangas’s (2016b) large-scale longitudinal comparison of Google Scholar, Scopus, and Web of Science suggested that Google Scholar’s retroactive expansion has stabilized and now all three databases are growing at similar rates. 
10 For example, Bramer, Giustini, and Kramer (2016a) found slightly more of their 4,795 references from systematic reviews in Embase (97.5%) than in Google Scholar (97.2%). In Testa (2016), the music database RILM indexed two more of the 84 OA journals than Google Scholar (which indexed at least one article from 93% of the journals). Finally, in a study using citations to the most-cited article of all time as a sample, Web of Science found more citations than did Google Scholar (Winter, Zadpoor, and Dodou 2014).
11 Prins et al. classified anthropology as part of the humanities.

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017

Google Scholar also seems to cover both the oldest and the most recent publications. Unlike traditional abstracts and indexes, Google Scholar is not limited by starting year, so as publishers post the tables of contents of their earliest journals online, Google Scholar discovers those sources (Antell et al. 2013, 281). Trapp (2016) reported the number of citations to a highly cited physics paper after the first 11 days of publication to be 67 in Web of Science, 72 in Scopus, and 462 in Google Scholar (Trapp 2016, 4). In a study of 800 citations to Nobelists in multiple fields, Harzing found that “Google Scholar could effectively be 9–12 months ahead of Web of Science in terms of publication and citation coverage” (2013, 1073). An increasing proportion of journal articles in Google Scholar are freely available in full text. A large-scale, longitudinal study of highly cited articles from 1950-2013 found that 40% of article citations in the sample were freely available in full text (Martín-Martín et al. 2014). Another large-sample study found that 61% of articles in their sample from 2004-2014 could be freely accessed (Jamali and Nabavi 2015). In both studies, nih.gov and ResearchGate were the top two full-text providers.
Google Scholar’s coverage of major publisher content varies; having some coverage of a publisher does not imply that all articles or journals from that publisher are covered. In a sample of 222 citations compared across Google Scholar, Scopus, and Web of Science, Google Scholar contained all of the Springer titles, as many Elsevier titles as Scopus, and the most articles by Wolters Kluwer and John Wiley. However, among the three databases, Google Scholar contained the fewest articles by BMJ and Nature (Rothfus et al. 2016).

Study: (Bartol and Mackiewicz-Talarczyk 2015)
Sample: Documents retrieved in response to searches on crops and fibers in article titles, 1994-2013 (samples varied by crop)
Results: Google Scholar returned more documents for each crop. For example, “hemp” retrieved 644 results in Google Scholar, 493 in Scopus, and 318 in Web of Science; Google Scholar demonstrated higher yearly growth of records over time.

Study: (Bramer, Giustini, and Kramer 2016b)
Sample: References from a pool of systematic reviewer searches in medicine (n=4,795)
Results: Google Scholar found 97.2%, Embase 97.5%, and MEDLINE 92.3% of all references. When using search strategies, Embase retrieved 81.6%, MEDLINE 72.6%, and Google Scholar 72.8%.

Study: (Ciccone and Vickery 2015)
Sample: 183 user searches randomly selected from NCSU Libraries’ 2013 Summon search logs (n=137)
Results: No significant difference between the performance of Google Scholar, Summon, and EDS for known-item searches; “Google Scholar outperformed both discovery services for topical searches.”

Study: (Harzing 2014)
Sample: Publications and citation metrics for 20 Nobelists in chemistry, economics, medicine, and physics, 2012-2013 (samples varied)
Results: Google Scholar coverage is now “increasing at a stable rate” and provides “comprehensive coverage across a wide set of disciplines for articles published in the last four decades” (575).
Study: (Harzing 2016b)
Sample: Citations from one researcher (n=126)
Results: Microsoft Academic found all books and journal articles covered by Google Scholar; Google Scholar found 35 additional publications, including book chapters, white papers, and conference papers.

Study: (Harzing and Alakangas 2016a)
Sample: Samples from (Harzing and Alakangas 2016b, 802) (samples varied by faculty)
Results: Google Scholar provided higher “true” citation counts than Microsoft Academic, but Microsoft Academic’s “estimated” citation counts were 12% higher than Google Scholar’s for the life sciences and equivalent for the sciences.

Study: (Harzing and Alakangas 2016b)
Sample: Citations of the works of 145 faculty among 37 scholarly disciplines at the University of Melbourne (samples varied by faculty)
Results: For the top faculty member, Google Scholar had 519 total papers (compared with 309 in both Web of Science and Scopus); Google Scholar had 16,507 citations (compared with 11,287 in Web of Science and 11,740 in Scopus).

Study: (Hilbert et al. 2015)
Sample: Documents published by 76 information scientists in German-speaking countries (n=1,017)
Results: Google Scholar covered 63%; Scopus, 31%; BibSonomy, 24%; Mendeley, 19%; Web of Science, 15%; CiteULike, 8%.

Study: (Jamali and Nabavi 2015)
Sample: Items published between 2004 and 2014 (n=8,310)
Results: 61% of articles were freely available; of these, 81% were publisher versions and 14% were preprints. ResearchGate was the top full-text source, netting 10.5% of full-text sources, followed by ncbi.nlm.nih.gov (6.5%).

Study: (Karlsson 2014)
Sample: Journals from ten different fields (n=30)
Results: Google Scholar retrieved documents from all of the selected journals; Summon retrieved documents from only 14 of the 30 journals.

Study: (Lee et al. 2015)
Sample: Journal articles housed in Florida State University’s institutional repository (n=170)
Results: Metadata were found in Google for 46% of items and in Google Scholar for 75% of items; Google Scholar found 78% of available full text, and found full text for six items with no full text in the IR.
Study: (Martín-Martín et al. 2014)
Sample: Items highly cited by Google Scholar (n=64,000)
Results: 40% could be freely accessed using Google Scholar; nih.gov and ResearchGate were the top two full-text providers.

Study: (Moed, Bar-Ilan, and Halevi 2016)
Sample: Citations to 36 highly cited articles in 12 scientific-scholarly English-language journals (n=about 7,000)
Results: 47% of sources were in both Google Scholar and Scopus; 47% were in Google Scholar only; 6% were in Scopus only. The unique Google Scholar citations most often came from Google Books, Springer, SSRN, ResearchGate, ACM Digital Library, Arxiv, and ACLweb.org.

Study: (Prins et al. 2016)
Sample: Article citations in the field of education and pedagogies, and citations to 328 articles in anthropology (n=774)
Results: Google Scholar found 22,887 citations in Education & Pedagogical Science compared to Web of Science’s 8,870, and 8,092 in Anthropology compared with Web of Science’s 1,097.

Study: (Ştirbu et al. 2015)
Sample: Citations resulting from two geographical topic searches (samples varied)
Results: Google Scholar found 2,732 geographical references, whereas Web of Science found only 275; GeoRef, 97; and FRANCIS, 45. For sedimentation, Google Scholar found 1,855 geographical references compared to Web of Science’s 606, GeoRef’s 1,265, and FRANCIS’s 33. Google Scholar overlapped Web of Science by 67% and 82% for the two searches, and GeoRef by 57% and 62%.

Study: (Testa 2016)
Sample: Open access journals in music (n=84)
Results: Google Scholar indexed at least one article from 93% of the OA journals; RILM indexed two additional journals.

Study: (Wildgaard 2015)
Sample: Publications from researchers in astronomy, environmental science, philosophy, and public health (n=512)
Results: Publication counts from Web of Science were 2-4 times lower than Google Scholar’s for all disciplines; citation counts were up to 13 times lower in Web of Science than in Google Scholar.
Study: (Winter, Zadpoor, and Dodou 2014)
Sample: Growth of citations to 2 classic articles (1995-2013) and 56 science and social science articles in Google Scholar, 2005-2013 (samples varied)
Results: Total citation counts were 21% higher in Web of Science than Google Scholar for Lowry (1951), but Google Scholar was 17% higher than Web of Science for Garfield (1955) and 102% higher for the 56 research articles; Google Scholar showed significant retroactive expansion for all articles, compared to negligible retroactive growth in Web of Science.

Table 1. Studies investigating Google Scholar’s coverage of journal articles and citations, 2014-2016.

Google Scholar Coverage: Books

Many studies mentioned that books, including Google Books, are sometimes included in Google Scholar results. Jamali and Nabavi (2015) found that 13% of their sample of 8,310 citations from Google Scholar were books, while Martín-Martín et al. (2014) had found that 18% of their sample of 64,000 citations from Google Scholar were books. Within the field of anthropology, Prins et al. (2016) found books to generate the most citation impact in Google Scholar (41% of books in their sample were cited in Google Scholar) compared to articles (21% of articles were cited in Google Scholar). In education, 31% of articles and 25% of books were cited by Google Scholar (3). Abrizah and Thelwall found that only 37% of their sample of 1,357 arts, humanities, and social sciences books from the five main university presses in Malaysia had been cited in Google Scholar (23% of the books had been cited in Google Books) (Abrizah and Thelwall 2014, 2502). The overlap was small: 15% had impact in both Google Scholar and Google Books. The authors concluded that due to the low overlap between Google Scholar and Google Books, searching both engines is required to find the most citations to academic books. English books were significantly more likely to be cited in Google Scholar (48% vs. 32%), as were edited books (53% vs. 36%). They surmised that edited books’ citation advantage was due to the use of book chapters in the social sciences. They found arts and humanities books more likely to be cited in Google Scholar than social sciences books (40% vs. 34%) (Abrizah and Thelwall 2014, 2503).

Google Scholar Coverage: Grey Literature

Grey literature refers to documents not published commercially, including theses, reports, conference papers, government information, and poster sessions. Haddaway et al. (2015) was the only empirical study found that focused on grey literature. They discovered that between 8% and 39% of full-text search results from Google Scholar were grey literature, with the greatest concentration of citations from grey literature on page 80 of results for full-text searches and page 35 for title searches. They concluded that “the high proportion of grey literature that is missed by Google Scholar means it is not a viable alternative to hand searching for grey literature as a stand-alone tool” (2015, 14). For one of the systematic reviews in their sample, none of the 84 grey literature articles cited were found within the exported Google Scholar search results. The only other investigation of grey literature found was Bonato (2016), who, after conducting a very limited number of searches on one specific topic and a search for a known item, concluded Google Scholar to be “deficient.” In conclusion, despite much offhand praise for Google Scholar’s grey literature coverage (Erb and Sica 2015; Antell et al. 2013), the topic has been little studied, and when it has, grey literature results have not been prominent.

Google Scholar Coverage: Open Access and Institutional Repository Content

Erb and Sica touted Google Scholar’s access to “free content that might not be available through a library’s subscription services,” including open access journals and institutional repository coverage (2015, 48). Recent research has dug deeper into both these content areas.
In general, OA articles have been shown to net more citations than non-OA articles, as Koler-Povh, Južnic, and Turk (2014) showed within the field of civil engineering. Across their sample of 2,026 scholarly articles in 14 journals, all indexed in Web of Science, Scopus, and Google Scholar, OA articles received an average of 43 citations while non-OA articles were cited 29 times (1039). Google Scholar did a better job of discovering those citations: in Google Scholar, the median number of citations of OA articles was always higher than that for non-OA articles, whereas this was true in Web of Science for only 10 of the 14 journals and in Scopus for 11 of the 14 journals (1040). Similarly, Chen (2014) found Google Scholar to index far more OA journals than Scopus and Web of Science, especially “gold OA.”12 Google Scholar’s advantage should not be assumed across all disciplines, however; Testa (2016) found both Google Scholar and RILM to provide good coverage of OA journals in music, with Google Scholar indexing at least one article from 93% of the 84 OA journals in the sample, but the bibliographic database RILM indexed two more OA journals than Google Scholar. Google Scholar indexing of repositories may be critical for their success, but results vary by IR platform and by whether the IR metadata has been structured according to Google’s guidelines.
In a random sample from Shodhganga, India’s central ETD database, Weideman (2015) found not one article had been indexed in full text by Google Scholar, although in many cases the metadata was indexed, leading the author to identify needed changes to the way Shodhganga stores ETDs.13 Likewise, Chen (2014) found that neither Google Scholar nor Google appears to index Baidu Wenku, a major full-text archive and social networking site in China similar to ResearchGate, and Orduña-Malea and López-Cózar (2015) found that Latin American repositories are not very visible in Google or Google Scholar due to limitations of the description schemas chosen as well as search engine reliability. In Yang’s (2016) study of Texas Tech’s DSpace IR, Google was the only search engine that indexed, discovered, or linked to PDF files supplemented with metadata; Google Scholar did not discover or provide links to the IR’s PDF files, and was less successful at discovering metadata. When Google Scholar is able to index IR content, it may be responsible for significant traffic. In a study of four major U.S. universities’ institutional repositories (three DSpace, one CONTENTdm) involving a dataset of 57,087 unique URLs and 413,786 records, researchers found that 48%–66% of referrals came from Google Scholar (Obrien et al. 2016, 870). The importance of Google Scholar in contrast to Google was noted by Lee et al. (2015), who conducted title searches on 170 journal articles housed in Florida State University’s institutional repository (using bePress’s Digital Commons platform), 100 of which existed in full text in the IR. Links to the IR were found in Google results for 45.9% of the 170 items, and in Google Scholar for 74.7% of the 170 items. 
Furthermore, Google Scholar linked to the full text for 78% of the 100 cases where full text was available, and even provided links to freely available full text for six items that did not have full text in the IR. However, the researchers also noted that “relying on either Google or Google Scholar individually cannot ensure full access to scholarly works housed in OA IRs.” In their study, among the 104 fully open access items there was an overlap in results of only 57.5%; Google provided links to 20 items not found with Google Scholar, and Google Scholar provided links to 25 items not found with Google (Lee et al. 2015, 15).

12 OA articles on publisher web sites, whether the journal itself is OA or not (Chen 2014).
13 Most notably, the need to store thesis documents as one PDF file instead of divided into multiple, separate files, to create HTML landing pages as per Google’s recommendations, and to submit the addresses of these pages to Google Scholar.

Google Scholar results note the number of “versions” available for each item. In a study of 982 science article citations (including both OA and non-OA) in IRs, Pitol and De Groote found that 56% of citations had between four and nine Google Scholar versions (2014, 603). Almost 90% of the citations shown were the publisher version, but of these, only 14.3% were freely available in full text on the publisher web site. Meanwhile, 70% of the items had at least one free full-text version available through a “hidden” Google Scholar version. The author’s experience in retrieving full text for this review indicates this issue still exists, but research would be needed to formulate reliable recommendations for users.

Use and Utility of Google Scholar as Part of the Research Process

Studies were found concerning Google Scholar’s popularity with users and their reasons for preferring it (or not) over other tools.
Another group of studies examined issues related to the utility of Google Scholar for research processes, including issues related to messy metadata. Finally, a cluster of articles focused specifically on using Google Scholar for systematic reviews.

Popularity and User Preferences

Several studies have shown Google Scholar to be well known to scholarly communities. A survey of 3,500 scholars from 95 countries found that over 60% of scientists and engineers and over 70% of respondents in the social sciences, arts, and humanities were aware of Google Scholar and used it regularly (Van Noorden 2014). In a large-scale journal-reader survey, Inger and Gardner (2016) found that among academic researchers in high-income areas, academic search engines had surpassed abstracts and indexes as a starting place for research (2016, 85, Figure 4). In low-income areas, Google use exceeded Google Scholar use for academic research. Major library link resolver software offers reports of full-text requests broken down by referrer. Inger and Gardner (2016) showed a large variance across subjects in whether people prefer Google or Google Scholar: “People in the social sciences, education, law, and business use Google Scholar more to find journal articles. However, people working in the humanities and religion and theology prefer to use Google” (88). Humanities scholars’ use of Google over Google Scholar was also found by Kemman et al. (2013); Google, Google Images, Google Scholar, and YouTube were used more than JSTOR or other library databases, even though humanities scholars’ trust in Google and Google Scholar was lower. User research since 2014 concerning Google Scholar has focused on graduate students. Results suggest Google Scholar is used regularly but is only partially sufficient.
In their study of 20 engineering master’s students’ use of abstracts and indexes, Johnson and Simonsen (2015) found that half their sample (n=20) had used Google Scholar the last time they located an article using specific search terms or criteria. Google was the second most-used source at 20%, followed by abstracting and indexing services (15%). Graduate students describe Google Scholar with nuance and refer to it as a specific part of their process. In Bøyum and Aabø’s (2015) interviews with eight PhD business students and Wu and Chen’s (2014, 381) interviews with 32 graduate students drawn from multiple academic disciplines, the majority described using library databases and Google Scholar for different purposes depending on the context. Graduate students in both studies were well aware of Google Scholar’s use for citation searching. Bøyum and Aabø’s (2015) subjects described library resources as more “academically robust” than Google or Google Scholar. Wu and Chen’s (2014) interviewees praised Google Scholar for its wider coverage and convenience, but lamented the uncertain quality, sometimes inaccessible full text, too many results, lack of sorting functions (by document type or date), retrieval of documents from different disciplines, and duplicate citations. Google Scholar was seen by their subjects as useful during the early stages of information seeking. In contrast to general assumptions, more than half the students interviewed reported browsing more than three pages’ worth of Google Scholar results (Wu and Chen 2014, 381). About half of the interviewees reported looking at cited documents to find more; however, students had mixed opinions about whether the citing documents turned out to be relevant. Google Scholar’s “My Library” feature, introduced in 2013, now competes with other bibliographic citation management software.
In a survey of 344 (mostly graduate) students, Conrad, Leonard, and Somerville found Google Scholar was the most used (47%), followed by EndNote (37%) and Zotero (19%) (2015, 572). Follow-up interviews with 13 of the students revealed that a few students used multiple tools; for example, one participant used EndNote for sharing data with lab partners and others “across the community,” Mendeley for her own personal thesis work, where she needs to “build a whole body of literature,” and Google Scholar Citations for “quick reference lists that I may not need for a second or third time.”

Messy Metadata

Many studies have suggested Google Scholar’s metadata is “messy.” Although none in the period of study examined this phenomenon in conjunction with relative user performance, the issues found could affect scholarship. A 2016 study itemized the most common mistakes in Google Scholar resulting from its extraction process: 1) incorrect title identification; 2) missing or incorrectly assigned authors; 3) book reviews indexed as books; 4) failure to group versions of the same document, which inflates citation counts; 5) grouping of different editions of books, which deflates citation counts; 6) attributing citations to documents that did not cite them, or missing citations that did; and 7) duplicate author profiles (Martín-Martín et al. 2016b). The authors concluded that “in an academic big data environment, these errors (which we deem affect less than 10% of the records in the database) are of no great consequence, and do not affect the core system performance significantly” (54). Two of these issues have been studied specifically: duplicate citations and missing publication dates. The rate of duplicate citations in Google Scholar has ranged upwards of 2.93% (Haddaway et al. 2015) and 5% (Winter, Zadpoor, and Dodou 2014, 1562), which can be compared to a 0.05% duplicate citation rate in Web of Science (Haddaway et al. 2015, 13). Haddaway found the main reasons for duplication include “typographical errors, including punctuation and formatting differences; capitalization differences (Google Scholar only), incomplete titles, and the fact that Google Scholar scans citations within reference lists and may include those as well as the citing article” (2015, 13). The prevalence of missing publication dates varies greatly across samples. Dates were found to be missing 9% of the time in Winter et al.’s study, although this varied by publication type: 4% of journal articles, 15% of theses, and 41% of the unknown document types (Winter, Zadpoor, and Dodou 2014, 1562). However, Martín-Martín et al. studied a sample of 32,680 highly cited documents and found that Web of Science and Google Scholar agreed on publication dates 96.7% of the time, with an idiosyncratically large proportion of the mismatches in 2012 and 2013 (2017, 159).

Utility for Research Processes

Prior to 2014, studies such as Asher, Duke, and Wilson’s (2012) evaluated Google Scholar’s utility as a general research tool, often in comparison with discovery tools. Since 2014, the only such study found was Namei and Young’s comparison of Summon, Google Scholar, and Google using 299 known-item queries. They found Google Scholar and Summon returned relevant results 74% of the time, while Google returned relevant results 91% of the time. For “scholarly formats,” they found Summon returned relevant results 76% of the time; Google Scholar, 79%; and Google, 91% (2015, 526-527). The remainder of the studies in this category focused specifically on systematic reviews, perhaps because such reviews are so time-consuming. Authors carefully develop search strategies, execute them in multiple databases, and document their search methods and results.
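The duplicate-citation rates quantified above, and the deduplication step reviewers must perform when merging exports from several databases, can be illustrated with a small sketch. This is not any study’s actual procedure; it simply shows how an aggressive normalization key exposes duplicates caused by the punctuation, formatting, and capitalization differences Haddaway describes.

```python
import re
from collections import defaultdict

def dedup_key(title: str) -> str:
    """Aggressive normalization: casefold and drop everything that is
    not a letter or digit, so 'A Review!' and 'a review' collide."""
    return re.sub(r"[^a-z0-9]+", "", title.casefold())

def duplicate_rate(titles: list) -> float:
    """Percentage of records that are surplus copies of another record."""
    groups = defaultdict(int)
    for t in titles:
        groups[dedup_key(t)] += 1
    surplus = sum(n - 1 for n in groups.values())
    return 100 * surplus / len(titles) if titles else 0.0
```

A key this aggressive will occasionally merge genuinely distinct items, which is why published studies pair automated cleaning with manual checks.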
Some prestigious journals are beginning to require similar rigor for any original research article, not just systematic reviews (Cals and Kotz 2016). Information provided by professional organizations about the use of Google Scholar for systematic reviews seems inconsistent: the Cochrane Handbook for Systematic Reviews of Interventions lists Google Scholar among sources for searching, but none of the five “highlighted reviews” on the Cochrane web site at the time of this article’s writing used Google Scholar in their methodologies. The manual of the UK’s National Institute for Health and Care Excellence (NICE) mentions Google Scholar only in an appendix of search sources under “Conference Abstracts.” A study by Gehanno et al. (2013) found Google Scholar contained 100% of the references from 29 systematic reviews, and suggested Google Scholar could be the first choice for systematic reviews or meta-analyses. This finding prompted a slew of follow-up studies in the next three years. An immediate response by Giustini and Boulos (2013) pointed out that systematic reviews are performed not by searching for article titles, as in Gehanno et al.’s method, but through search strategies. When they tried to replicate a systematic review’s topical search strategy in Google Scholar, the citations were not easily discovered. In addition, the authors were not able to find all the papers from a given systematic review even by title searching. Haddaway et al. also found imperfect coverage: for one of the seven reviews examined, 31.5% of citations could not be found (2015, 11). Haddaway also noted that special characters and fonts (as with chemical symbols) can cause poor matching when such characters are part of article titles.
Recent literature concurs that it is still necessary to search multiple databases, including abstracts and indexes, when conducting a systematic review, no matter how good Google Scholar’s coverage seems to be. No single database’s coverage is complete, including Google Scholar’s (Thielen et al. 2016), and the practical recall of Google Scholar is exceptionally low due to its 1,000-result limit; at the same time, Google Scholar’s lack of precision is costly in terms of researchers’ time (Bramer, Giustini, and Kramer 2016b; Haddaway et al. 2015). The challenges limiting study of Google Scholar’s coverage also bedevil those wishing to use it for reviews, especially the 1,000-result retrieval limit, the lack of batch export, and the lack of exported abstracts (Levay et al. 2016). Additionally, Google Scholar’s changing content, unknown algorithm and updating practices, search inconsistencies, limited Boolean functions, and 256-character query limit prevent the tool from accommodating the detailed, reproducible search methodologies required by systematic reviews (Bonato 2016; Haddaway et al. 2015; Giustini and Boulos 2013). Bonato noted that Google Scholar retrieved different results with Advanced and Basic searches, could not determine the format of items (e.g., conference papers), and returned other inconsistent results.14 Bonato also lamented the lack of any kind of document-type limit. Despite the limitations and logistical challenges, practitioners and scholars are finding solid reasons for including academic web search engines in most systematic review methodologies (Cals and Kotz 2016). Stansfield et al. noted that “relevant literature for low- and middle-income countries, such as working and policy papers, is often not included in databases,” and that Google Scholar finds additional journal articles and grey literature not indexed in databases (2016, 191).
For eight systematic reviews by the EPPI-Center, “over a quarter of relevant citations were found from websites and internet search engines” (Stansfield, Dickson, and Bangpan 2016, 2). Specific tools and practices have been recommended for using search engines within the context of systematic reviews. Software is available to record search strategies and results (Harzing and Alakangas 2016b; Haddaway 2015). Haddaway suggests the use of snapshot tools (Haddaway 2015) to record the first 1,000 Google Scholar records rather than the typical assessment of the first 50 search results as had been done in the past: “This change in practice could significantly improve both the transparency and coverage of systematic reviews, especially with respect to their grey literature components” (Haddaway et al. 2015, 15). Both Haddaway (2015) and Cochrane recommend that review authors print or save electronic copies of the full text or relevant details locally rather than bookmarking web sites, “in case the record of the trial is removed or altered at a later stage” (Higgins and Green 2011). New methods for searching, downloading, and integrating academic search engine results into review procedures using free software to increase transparency, repeatability, and efficiency have been proposed by Haddaway and his colleagues (2015).

14 Bonato (2016) found zero hits for conference papers when limiting by year 2015-2016, but found two papers presented at a 2015 meeting.

Google Scholar Citations and Metrics

Google Scholar Citations and Metrics are not academic search engines, but this article includes them because these products are interwoven into the fabric of the Google Scholar database. Google Scholar Citations, launched in late 2011 (Martín-Martín et al. 2016b, 12), groups citations by author, while Google Scholar Metrics (launch date uncertain) provides similar data for articles and journals.
Readers interested in an in-depth literature review of Google Scholar Citations for earlier years (2005-2012) are directed to Thelwall and Kousha (2015b). In his comprehensive review of more recent literature about using Google Scholar Citations for citation analysis, Waltman (2016) described several themes. Google Scholar’s coverage of many fields is significantly broader than that of Web of Science and Scopus, and it seems to be continuing to improve over time. However, studies regularly report Google Scholar’s inaccuracies, content gaps, phantom data, easily manipulated citation counts, lack of transparency, and limitations for empirical bibliometric studies. As discussed in the coverage section, Google Scholar’s citation database is competitive with other major databases such as Web of Science and has been growing dramatically in the last few years (Winter, Zadpoor, and Dodou 2014; Harzing and Alakangas 2016b; Harzing 2014), but has recently stabilized (Harzing and Alakangas 2016b). More and more studies are concluding that Google Scholar reports more comprehensive information about citation impact than Web of Science or Scopus. Across a sample of articles from many years of one science journal, Trapp (2016) found the proportion of articles with zero citations was 37% for Web of Science, 29% for Scopus, and 19% for Google Scholar. Some of Google Scholar’s superiority for citation analysis in the social sciences and humanities is due to its inclusion of book content, software, and additional journals (Prins et al. 2016; Bornmann et al. 2016). Bornmann et al. (2016) noted that citations to all ten of a research institute’s books published in 2009 were found in Google Scholar, whereas Web of Science found citations for only two books. Furthermore, they found data in Google Scholar for 55 of the institute’s 71 book chapters.
For the four conference proceedings they could identify in Google Scholar, there were 100 citations, of which 65 could be found in Google Scholar. The comparative success of Google Scholar for citation impact varies by discipline, however: Levay et al. (2016) found Web of Science to be more reliable than Google Scholar, quicker for downloading results, and better for retrieving 100% of the most important publications in public health.

AN EVIDENCE-BASED REVIEW OF ACADEMIC WEB SEARCH ENGINES, 2014-2016 | FAGAN | https://doi.org/10.6017/ital.v36i2.9718 28

Despite Google Scholar’s growth, using all three major tools (Scopus, Web of Science, and Google Scholar) still seems to be necessary for evaluating researcher productivity. Rothfus (2016) compared Web of Science, Scopus, and Google Scholar citation counts for evaluating the impact of the Canadian Network of Observational Drug Effect Studies (CNODES), as represented by a sample of 222 citations from five articles. Attempting to determine citation metrics for the CNODES research team yielded different results for every article when using the three tools. They found that “using three tools (Web of Science, Scopus, Google Scholar) to determine citation metrics as indicators of research performance and impact provided varying results, with poor overall agreement among the three” (237). Major academic libraries’ web sites often explain how to find one’s h-index in all three (Suiter and Moulaison 2015). Researchers have also noted the disadvantages of Google Scholar for citation impact studies. Google Scholar is costly in terms of researcher time. Levay et al. (2016) estimated the cost of “administering results” from Web of Science to be 4 hours versus 75 hours for Google Scholar. Administering results includes using the search tool to search, download, and add records to bibliographic citation software, and removing duplicate citations. Duplicate citations are often mentioned as a problem (Prins et al.
2016), although Moed (2016) suggested the double counting by Google Scholar would occur only if the level of analysis is on target sources, not if it is on target articles.15 Downloaded citation samples can still suffer from double counts, however: Harzing and Alakangas (2016b) described how cleaning “a fairly extreme case” in their study reduced the number of papers from 244 to 106. Google Scholar also does not identify self-citations, which can dramatically influence the meaning of results (Prins et al. 2016). Furthermore, researchers have shown it is possible to corrupt Google Scholar Citations by uploading obviously false documents (Delgado López-Cózar, Robinson-García, and Torres-Salinas 2014). While the researchers noted traditional citation indexes can also be defrauded, Google’s products are less transparent and abuses may not be easily detected. Google did not respond to the research team when contacted and simply deleted the false documents to which it had been alerted without reporting the situation to the affected authors; the researchers concluded: “This lack of transparency is the main obstacle when considering Google Scholar and its by-products for research evaluation purposes” (453). Because these disadvantages do not outweigh Google Scholar’s seemingly broader coverage, many articles investigate workarounds for using Google Scholar more effectively when evaluating research impact.

15 “If a document is, for instance, first published in ArXiv, and a next version later in a journal J, citations to the two versions are aggregated. In Google Scholar Metrics, in which ArXiv is included as a source, this document (assuming that its citation count exceed the h5 value of ArXiv and journal J) is listed both under ArXiv and under journal J, with the same, aggregate citation count” (Moed 2016, 29).
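Cleaning duplicates from a downloaded citation sample, as Harzing and Alakangas describe, is in practice a normalize-then-deduplicate pass over the records. A minimal sketch, in which the record fields and the normalization rule are illustrative assumptions rather than their published procedure:

```python
import re

def normalize_title(title):
    # Lowercase, strip punctuation, and collapse whitespace so near-identical
    # records (case or punctuation variants of the same paper) compare equal.
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def deduplicate(records):
    # Keep the first record seen for each normalized title.
    seen, unique = set(), []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Real-world cleaning is messier (title variants, translated titles, preprint versus journal versions), which is why hand-checking "fairly extreme" cases remains necessary.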
Harzing and Alakangas (2016b) recommend the hIa index,16 which is corrected for career length and co-authorship patterns, as the citation metric of choice for a fair comparison of Google Scholar with other tools. Bornmann et al. (2016) investigated a method to normalize data and reduce errors when using Google Scholar data to evaluate citations in the social sciences and humanities. Researcher profiles can also be used to find other scholars by topic. In a 2014 survey of researchers (n=8,554), Dagienė and Krapavickaitė found that 22% used a third-party service such as Google Scholar or Microsoft Academic to produce lists of their scholarly activities and 63% reported their scholarly record was freely available on the web (2016, 158, 161). Google Scholar ranked second only to Microsoft Word as the most frequently used software for maintaining academic activity records (160). Martín-Martín et al. (2016b) examined 814 authors in the field of bibliometrics using Google Scholar Citations, ResearcherID, ResearchGate, Mendeley, and Twitter. Google Scholar was the most used social research sharing platform, followed by ResearchGate, with ResearcherID gaining wider acceptance among authors deemed “core” to the field. Only about one-third of the authors had created a Twitter profile, and many Mendeley and ResearcherID profiles were found empty. The study found the distinctive advantages of Google Scholar academic profiles to be automatic updates and a high growth rate, with the disadvantages of scarce quality control, metadata mistakes inherited from Google Scholar, and manipulability. Overall, Martín-Martín and colleagues concluded that Google Scholar “should be the preferred source for relational and comparative analyses in which the emphasis is put on author clusters” (57). Google Scholar Metrics provides citation information for articles and journals.
In a sample of 1,000 journals, Orduña-Malea and Delgado López-Cózar found that “despite all the technical and methodological problems,” Google Scholar Metrics provides sound and reliable journal rankings (2014, 2365). Google Scholar Metrics appears to be an annual publication; the 2016 edition contains 5,734 publications and 12 language rankings. Russian, Korean, Polish, Ukrainian, and Indonesian were added this year, while Italian and Dutch were removed for unknown reasons (Martín-Martín et al. 2016a). Researchers also found that many discussion papers and working papers were removed in 2016. English-language publications are broken into subject areas and disciplines. Google Scholar Metrics often, but not always, creates separate entries for each language in which a journal is published. Bibliometricians call for Google Scholar Metrics to display the total number of documents published in the publications indexed and the total number of citations received: “These are the two essential parameters that make it possible to assess the reliability and accuracy of any bibliometric indicator” (13). Adding country and language of publication and self-citation rates are among the other improvements listed by Delgado López-Cózar and colleagues.

16 Harzing and Alakangas (2016b) define the hIa as the hI,norm divided by academic age. Academic age refers to the number of years elapsed since first publication. To calculate hI,norm, one divides the number of citations for each paper by the number of authors of that paper, and then calculates the h-index of the normalized citation counts.

Informing Practice

The glaring lack of research related to the coverage of arts and humanities scholarship, limited research on book coverage, and the relaunch of Microsoft Academic make it impossible to form a general recommendation regarding the use of academic web search engines for serious research.
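The hIa calculation defined in footnote 16 can be worked through in a few lines. A minimal sketch, assuming per-paper citation counts, per-paper author counts, and academic age are already in hand:

```python
def h_index(citations):
    # h = the largest rank r such that the r-th most-cited paper
    # has at least r citations.
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def hia(citations, authors_per_paper, academic_age):
    # hI,norm: divide each paper's citations by its author count,
    # then take the h-index of the normalized counts.
    normalized = [c / a for c, a in zip(citations, authors_per_paper)]
    hi_norm = h_index(normalized)
    # hIa = hI,norm divided by years since first publication.
    return hi_norm / academic_age
```

For example (hypothetical values), four papers with citations [9, 6, 4, 2] and author counts [3, 2, 1, 2] normalize to [3, 3, 4, 1], giving hI,norm of 3; over a five-year academic age, hIa is 0.6.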
Until the ambiguity of arts and humanities coverage is clarified, and until academic web search engines are transparent and stable, traditional bibliographic databases still seem essential for systematic reviews, citation analysis, and other rigorous literature search purposes. Discipline-specific databases also have features such as controlled vocabulary, industry classification codes, and peer review indicators that make scholars more efficient and effective. Nevertheless, the increasing relevance of academic search engines and their solid coverage of the sciences and social sciences make it essential for librarians to become expert with Google Scholar, Google Books, and Microsoft Academic. For some scholarly tasks, academic search engines may be superior: for example, when looking up doi numbers for this paper’s bibliography, the most efficient process seemed to be a Google search on the article title plus the term “doi,” and the site most likely to display in the results was ResearchGate.17 Librarians and scholars should champion these tools as an important part of an efficient, effective scholarly research process (Walsh 2015), while also acknowledging their gaps in coverage, biases, metadata issues, and the absence of features available in other databases. Academic web search engines could form the centerpiece of instruction sessions on the scholarly network, as shown by “cited by” features, author profiles, and full-text sources. Traditional abstracts and indexes could then be presented on the basis of their strengths. At some point, explaining how to access full text will likely no longer focus on the link resolver but on the many possible document versions a user might encounter (e.g., pre-prints or editions of books) and how to make an informed choice.
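The doi-lookup workflow just described ends with pulling a doi string out of a result page. A small helper for that last step, using the widely cited Crossref-recommended matching pattern (a sketch; matching every doi in the wild is notoriously imperfect):

```python
import re

# Crossref's recommended pattern matches the vast majority of modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:a-zA-Z0-9]+")

def extract_doi(text):
    # Return the first doi-like string, trimming trailing punctuation
    # that often clings to DOIs quoted in running text.
    match = DOI_PATTERN.search(text)
    return match.group(0).rstrip(".,;") if match else None
```

A doi found this way should still be verified against the publisher's landing page, for the accuracy reasons discussed below.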
In the meantime, even though web search engines and repositories may retrieve copious full text outside library subscriptions, college students should still be made aware of the library’s collections and services such as interlibrary loan. When considering Google Scholar’s weaknesses, it is important to keep in mind Chen’s observation that we may not have a tool available that does any better (Antell et al. 2013). While Google Scholar may be biased toward English-language publications, so are many bibliographic databases. Overall, Google Scholar seems to have increased the visibility of international research (Bartol and Mackiewicz-Talarczyk 2015). While Google Scholar’s coverage of grey literature has been shown to be somewhat uneven (Bonato 2016; Haddaway et al. 2015), it seems to include more diversity among relevant document types than many abstracts and indexes (Ştirbu et al. 2015; Bartol and Mackiewicz-Talarczyk 2015). Although the rigors of systematic reviews may contraindicate the tool’s use as a single source, it adds value to search results from other databases (Bramer, Giustini, and Kramer 2016a). User preferences and priorities should also be taken into account; Google Scholar results have been said to contain “clutter,” but many researchers have found the noise in Google Scholar tolerable given its other benefits (Ştirbu et al. 2015). Google Books purportedly contains about 30 million items, focused on U.S.-published and English-language books. But its coverage is hit-or-miss, surprising Mays (2015) with an unexpected wealth of primary sources but disappointing Harper (2016) with limited coverage of academic health sciences books.

17 Because the authority of ResearchGate is ambiguous, in such cases I then looked up the doi using Google to find the publisher’s version. In some cases, the doi was not displayed on the publisher’s result page (e.g., https://muse.jhu.edu/article/197091).
Recent court decisions have enabled Google to continue progressing toward its goal of full-text indexing and making snippet views available for the Google-estimated universe of 130 million books, which suggests its utility may increase. Google Books is not integrated with link resolvers or discovery tools but has been found useful for providing information about scholarly research impact, especially for the arts, humanities, and social sciences. As relaunched in 2016, Microsoft Academic shows real potential to compete with Google Scholar in coverage and utility for finding journal articles. As of February 2017 its index contains 120 million citations. In contrast to the mystery of Google Scholar’s black-box algorithms and restrictive limitations, Microsoft Academic uses an open-system approach and offers an API. Microsoft Academic appears to have less coverage of books and grey literature than Google Scholar. Research is badly needed on the coverage and utility of both Google Books and Microsoft Academic. Google Scholar continues to evolve, launching in 2016 a new algorithm for known-item searching18 that appears to work very well. Google Scholar does not reveal how many items it searches, but studies have suggested 160 million documents have been indexed. Studies have shown the Google Scholar relevance algorithm to be heavily influenced by citation counts and language of publication. Google Scholar has been so heavily researched and is such a “black box” that more attention would seem to have diminishing returns, except in the area of coverage of and utility for arts and humanities research. Librarians may find these takeaways useful for working with or teaching Google Scholar:

• Little is known about Google Scholar’s coverage of the arts and humanities.
• Recent studies repeatedly find that in the sciences and social sciences Google Scholar covers as much if not more than library databases, has more recent coverage, and frequently provides access to full text without the need for library subscriptions.
• Although the number of studies is limited, Google Scholar seems excellent at retrieving known scholarly items compared with discovery tools.
• Using proper accent marks in the title when searching for non-English-language items appears to be important.
• Finding full text for non-English journal articles may require searching Google Scholar in the original language.
• While Google Scholar may include results from Google Books, it appears both tools should be used rather than assuming Google Books content will appear in Google Scholar.
• While Google Scholar does include grey literature, these results do not usually rank highly.
• Google Scholar and Google must both be used to search effectively across institutional repository content.
• Free full text may be buried underneath the “All X versions” links because the publisher’s web site is usually the dominant version presented to the user. The right-hand column links may help ameliorate this situation, but not reliably.
• Google Scholar is well known in most academic communities and used regularly; however, it is seldom the only tool used, with scholars continuing to use other web search tools, library abstracts and indexes, and published web sites as well.

18 Google Scholar’s blog notes that in January 2016, a change was made so “Scholar now automatically identifies queries that are likely to be looking for a specific paper.” Technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” (https://scholar.googleblog.com/).
• Experts in writing systematic reviews recommend Google Scholar be included as a search tool along with traditional abstracts and indexes, using software to record the search process and results.
• For evaluating research impact, Google Scholar may be superior to Web of Science or Scopus, but using all three tools still seems necessary.
• As with any database, citation metadata should be verified against the publisher’s data; with Google Scholar, publication dates should receive deliberate attention.
• When Google Scholar covers some of a major publisher’s content, that does not imply it covers all of that publisher’s content.
• Google Scholar Metrics appears to provide reliable journal rankings.

Research Agenda

This review of the literature also provides direction for future research concerning academic web search engines. Because this review focused on 2014-2016, researchers may need to review studies from earlier periods for methodological ideas and previous findings, noting that dramatic changes in search engine coverage and behavior can occur within only a few years.19 Across the studies, some general best practices were observed. When comparing the coverage of academic web search engines, assessing their utility for establishing research impact, or conducting other bibliometric studies, researchers should strongly consider using software such as Publish or Perish and should design their research approach with previous methodologies in mind. Information scientists have charted a set of clear disciplinary methods; there is no need to start from scratch.

19 For example, Ştirbu et al. found that Google Scholar overlapped GeoRef by 57% and 62% (Ştirbu et al. 2015, 328), compared with a finding by Neuhaus in 2006 that Scholar overlapped with GeoRef by 26% (2006, 133).

Even when
performing a large-scale quantitative assessment such as that of Kousha and Thelwall (2015), manually examining and discussing a subset of the sample seems helpful for checking assumptions and for enhancing the meaning of the findings to the reader. Some researchers examined the “top 20” or “top 10” results qualitatively (Kousha and Thelwall 2015), while others took a random sample from within their large study sample (Kousha, Thelwall, and Rezaie 2011).

Academic search engines for arts and humanities research

Research into the use of academic web search engines within arts and humanities fields is sorely needed. Surveys show humanities scholars use both Google and Google Scholar (Inger and Gardner 2016; Kemman, Kleppe, and Scagliola 2013; Van Noorden 2014). During interviews of 20 historians by Martin and Quan-Haase (2016) concerning serendipity, five mentioned Google Books and Google Scholar as important for recreating the serendipity of the physical library online. Almost all arts and humanities scholars search the Internet for researchers and their activities, and they commonly expressed the belief that having a complete list of research activities online improves public awareness (Dagienė and Krapavickaitė 2016). Mays’s (2015) practical advice and the few recent studies on the citation impact of Google Books for these disciplines point to the enormous potential for this tool’s use. Articles describing opportunities for new online searching habits of humanities scholars have not always included Google Scholar (Huistra and Mellink 2016). Wu and Chen’s interviews with humanities graduate students suggested their behavior and preferences differed from those of science and technology students: they did more known-item searching and struggled with “semantically ambiguous keywords” that retrieved irrelevant results (2014, 381).
Platform preferences seem to have a disciplinary aspect: Hammarfelt’s (2014) investigation of altmetrics in the humanities suggests Mendeley and Twitter should be included along with Google Scholar when examining the citation impact of humanities research, while a 2014 Nature survey suggests ResearchGate is much less popular in the social sciences and humanities than in the sciences (Van Noorden 2014). In summary, arts and humanities scholars are active users of academic web search engines and related tools, but their preferences and behavior, and the relative success of Google Scholar as a research tool, cannot be inferred from the vast literature focused on the sciences. Advice from librarians and scholars about the strengths and limitations of academic web search engines in these fields would be incredibly useful. Specific examples of needed research, with related studies to reference for methodological ideas:

• Similar to the studies that have been done in the sciences, how well do academic search engines cover the arts and humanities? An emphasis on formats important to the discipline would be important (Prins et al. 2016).
• How does the quality of search results compare between academic search engines and traditional library databases for arts and humanities topics? To what extent can the user usefully accomplish her task (Ruppel 2009)?
• To what extent do academic search engines support the research process for scholarship distinctive to arts and humanities disciplines (e.g., historiographies, review essays)?
• In academic search engines, how visible is the arts and humanities literature found in institutional repositories (Pitol and De Groote 2014)?

Specific aspects of academic search engine coverage

This review suggests that broad studies of academic search engine coverage may have reached a saturation point.
However, specific aspects of coverage need additional investigation:

• Grey literature: Although Google Scholar’s inclusion of grey literature is frequently mentioned as valuable, empirical studies evaluating its coverage are scarce. Additional research following the methodology of Haddaway (2015) could investigate the bibliographies of literature other than systematic reviews, investigate various disciplines, or use a sample of valuable known items (similar to Kousha, Thelwall, and Rezaie’s (2011) methodology for books).
• Non-Western, non-English-language literature: For further investigation of the repeated finding of non-Western, non-English-language bias (Abrizah and Thelwall 2014; Cavacini 2015), comparisons to library abstracts and indexes would be helpful for providing context. To what extent is this bias present in traditional research tools? Hilbert et al. found the coverage of their sample increased for English-language material in both Web of Science and Scopus, and “to a lesser extent” in Google Scholar (2015, 260).
• Books: Any investigations of book coverage in Microsoft Academic and Google Scholar would be welcome. Very few 2014-2016 studies focused on books in Google Scholar, and even looking in earlier years turned up little research. Georgas (2015) compared Google with a federated search tool for finding books, so her study may be a useful reference. Kousha et al. (2011) found three times as many citations in Google Scholar as in Scopus to a sample of 1,000 academic books. The authors concluded “there are substantial numbers of citations to academic books from Google Books and Google Scholar, and it therefore may be possible to use these potential sources to help evaluate research in book-oriented disciplines” (Kousha, Thelwall, and Rezaie 2011, 2157).
• Institutional repositories: Yang (2016) recommended that “librarians of digital resources conduct research on their local digital repositories, as the indexing effects and discovery rates on metadata or associated text files may be different case by case,” and the studies found for 2014-2016 show that IR platform and metadata schema dramatically affect discovery, with some IRs nearly invisible (Weideman 2015; Chen 2014; Orduña-Malea and López-Cózar 2015; Yang 2016) and others somewhat findable by Google Scholar (Lee et al. 2015; Obrien et al. 2016). Askey and Arlitsch (2015) have explained how Google Scholar’s decisions regarding metadata schema can dramatically affect results.20 Libraries that would like their institutional repositories to serve as social sharing platforms for research should consider conducting a study similar to that of Martín-Martín et al. (2016b). Finally, a study of IR journal article visibility in academic web search engines could be extremely informative.
• Full-text retrieval: The indexing coverage of academic search engines relates to the retrieval of full text, which is another area ripe for more research, especially in light of the impressive quantity of full text that can be retrieved without user authentication. Johnson and Simonsen (2015) found that more of the engineering students they surveyed obtained scholarly articles via a free download or a PDF from a colleague at another institution than through the library’s subscriptions. Meanwhile, libraries continue to pay for costly subscription resources. Monitoring this situation is essential for strategic decision-making. Quint (2016) and Karlsson (2014) have suggested strategies for libraries and vendors to support broader access to subscription full text through creative licensing and per-item fee approaches.

20 For example, Google’s rejection of Dublin Core.
Institutional repositories have had mixed results in changing scholars’ habits (both contributors’ and searchers’) but are demonstrably contributing to the presence of full text in the academic search engine experience. When will academic users find a good-enough selection of full-text articles that they no longer need the expanded full text paid for by their institutions?

Google Books

Like Microsoft Academic, Google Books as a search tool needs dedicated research from librarians and information scientists about its coverage, utility, and adoption. A purposeful comparison with other large digital repositories such as HathiTrust (https://www.hathitrust.org) would be a boon to practitioners and the public. While HathiTrust is transparent about its coverage (https://www.hathitrust.org/statistics_visualizations), specific areas of Google Books’ coverage have been called into question. Weiss (2016) suggested a gap in Google Books exists from about 1915-1965 “because many publishers either have let it fall out of print, or the book is orphaned and no one wants to go through the trouble of tracking down the copyright owners” and found that copies in Google Books “will likely be locked down and thus unreadable, or visible only as a snippet, at best” (303). Has this situation changed since the court rulings concerning the legality of snippet view? Longitudinal studies of the growth of Google Books similar to Harzing (2014) could illuminate this and other questions about Google Books’ ability to deliver content. Uneven coverage of content types, geography, and language should be investigated. Mays noted a possible geographical imbalance within the United States (Mays 2015, 26). Others noted significant language and international imbalances, and large disciplinary differences (Weiss 2016; Abrizah and Thelwall 2014; Kousha and Thelwall 2015).
Weiss and others suggest that Google Books’ coverage imbalance has enormous social implications: “Google and other [massive digital libraries] have essentially canonized the books they have scanned and contribute to the marginalization of those left unscanned” (301). Therefore more holistic quantitative investigations of the types of information in Google Books and possible skewness would be welcome. Finally, Chen’s study (2012) comparing the coverage of Google Books and WorldCat could be repeated to provide longitudinal information. The utility of Google Books for research purposes also needs further investigation. Books are far more prevalently cited in Wikipedia than are research articles (Thelwall and Kousha 2015a). Examining samples of Wikipedia articles’ citation lists for the prevalence of Google Books could reveal how dominant a force Google Books has become in that space. On a more philosophical level, investigating the ways Google Books might transform scholarly processes would be useful. Szpiech (2014) considered how the Google Books version of a medieval manuscript transformed his relationship with texts, causing a rupture “produced by my new power to extract words and information from a text without being subject to its order, scale, or authority” (78). He hypothesized that readers approach Google Books texts as consumers, rather than learners, whereby “the critical sense of the gestalt” is at risk of being forgotten (84). Have other researchers experienced what he describes?
Microsoft Academic

Given the stated openness of Microsoft’s new academic web search engine,21 the closed nature of Google Scholar, and the promising findings of bibliometricians (Harzing 2016b; Harzing and Alakangas 2016a), librarians and information scientists should embark on a thorough review of Microsoft Academic with enthusiasm similar to that with which they approached Google Scholar. The search engine’s coverage, utility for research, and suitability for bibliometric analysis22 all need to be examined. Microsoft Academic’s abilities for supporting scholarly social networking would also be of interest, perhaps using Ward et al. (2015) as a theoretical groundwork. The tool’s coverage and utility for various disciplines and research purposes is a wide-open field for highly useful research.

Professional and Instructional Approaches Based on User Research

To inform instructional approaches, more study of user behavior is needed, perhaps repeating Herrera’s (2011) study with Google Scholar and Microsoft Academic. In light of the recent focus on graduate students, research concerning the use of academic web search engines by undergraduates, community college students, high school students, and other groups would be welcome. Using an interview or focus group generates exploratory findings that could be tested through surveys with a larger, more representative sample of the population of interest. Studying searching behaviors has been common; can librarians design creative studies to investigate reading, engagement, and reflection when web search engines are used as part of the process? Is there a way to study whether the “Matthew Effect” (Antell et al. 2013, 281), the aging citation phenomenon (Verstak et al. 2014; Martín-Martín et al. 2016a; Davis and Cochran 2015), or other epistemological hypotheses are influencing scholarship patterns? A bold study could be performed to examine differences in quality outcomes between samples of students using primarily academic search engines versus traditional library search tools. Exploratory studies in this area could begin by surveying students about their use of search tools for research methods courses or asking them to record their research process in a journal, and correlating the findings with their grades on the final research product. Three specific areas of needed user research are the use of scholarly social network platforms, researcher profiles, and the influence of these on scholarly collaboration and research (Ward, Bejarano, and Dudás 2015, 178); the performance of Google’s relatively new known-item search23 (compared with Microsoft Academic’s known-item search abilities); and searching in non-English languages. Regarding the latter, Albarillo’s (2016) method, which he applied to library databases, could be repeated with Google Scholar, Microsoft Academic, and Google Books. Finally, to continue their strong track record as experts in navigating the landscape of digital scholarship, librarians need to research assumptions regarding best practices for scholarly logistics. For example, searching Google for article titles plus the term “doi,” then scanning the results list for ResearchGate, was found by this study’s author to most efficiently provide doi numbers; but is this a reliable approach? Does ResearchGate have sufficient accuracy to be recommended as the optimal tool for this task?

21 Microsoft’s FAQ says the company is “adopting an open approach in developing the service, and we invite community participation. We like to think what we have developed is a community property. As such, we are opening up our academic knowledge as a downloadable dataset” and offers the Academic Knowledge API (https://www.microsoft.com/cognitive-services/en-us/academic-knowledge-api).

22 See Jacsó (2011) for methodology.
What is the most efficient way for a scholar to locate full text for a citation? Are academic search engines’ bibliographic citation management export tools competitive with third-party commercial tools such as RefWorks? Another area needing investigation is the visibility of links to free full text in Google Scholar. Pitol and De Groote found that 70 percent of the items in their study had at least one free full-text version available through a “hidden” Google Scholar version (2014, 603), and this author’s work on this review article indicates this problem still exists; but to what extent? Also, when free full text exists in multiple repositories (e.g., ResearchGate, Digital Commons, Academia.edu), which are the most trustworthy and practically useful for scholars? Librarians should discuss the answers to these questions and be ready to provide expert advice to users.

CONCLUSION

With so many users opting to use academic web search engines for research, librarians need to investigate the performance of Microsoft Academic, Google Books, and Google Scholar for the arts and humanities, and to rethink library services and collections in light of these tools’ strengths and limitations. The evolution of web indexing and increasing free access to full text should be monitored in conjunction with library collection development.

23 Google Scholar’s blog notes that in January 2016, a change was made so “Scholar now automatically identifies queries that are likely to be looking for a specific paper.” Technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” (https://scholar.googleblog.com/).

To remain relevant to
modern researchers, librarians should continue to strengthen their knowledge of and expertise with public academic web search engines, full-text repositories, and scholarly networks.

AN EVIDENCE-BASED REVIEW OF ACADEMIC WEB SEARCH ENGINES, 2014-2016 | FAGAN | https://doi.org/10.6017/ital.v36i2.9718

BIBLIOGRAPHY

Abrizah, A., and Mike Thelwall. 2014. "Can the Impact of Non-Western Academic Books be Measured? An Investigation of Google Books and Google Scholar for Malaysia." Journal of the Association for Information Science & Technology 65 (12): 2498-2508. https://doi.org/10.1002/asi.23145.

Albarillo, Frans. 2016. "Evaluating Language Functionality in Library Databases." International Information & Library Review 48 (1): 1-10. https://doi.org/10.1080/10572317.2016.1146036.

Antell, Karen, Molly Strothmann, Xiaotian Chen, and Kevin O’Kelly. 2013. "Cross-Examining Google Scholar." Reference & User Services Quarterly 52 (4): 279-282. https://doi.org/10.5860/rusq.52n4.279.

Asher, Andrew D., Lynda M. Duke, and Suzanne Wilson. 2012. "Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources." College & Research Libraries 74 (5): 464-488. https://doi.org/10.5860/crl-374.

Askey, Dale, and Kenning Arlitsch. 2015. "Heeding the Signals: Applying Web Best Practices When Google Recommends." Journal of Library Administration 55 (1): 49-59. https://doi.org/10.1080/01930826.2014.978685.

Authors Guild. "Authors Guild v. Google." Accessed January 1, 2016. https://www.authorsguild.org/where-we-stand/authors-guild-v-google/.

Bartol, Tomaž, and Maria Mackiewicz-Talarczyk. 2015. "Bibliometric Analysis of Publishing Trends in Fiber Crops in Google Scholar, Scopus, and Web of Science." Journal of Natural Fibers 12 (6): 531. https://doi.org/10.1080/15440478.2014.972000.

Boeker, Martin, Werner Vach, and Edith Motschall. 2013.
"Google Scholar as Replacement for Systematic Literature Searches: Good Relative Recall and Precision Are Not Enough." BMC Medical Research Methodology 13 (1): 1.

Bonato, Sarah. 2016. "Google Scholar and Scopus for Finding Gray Literature Publications." Journal of the Medical Library Association 104 (3): 252-254. https://doi.org/10.3163/1536-5050.104.3.021.

Bornmann, Lutz, Andreas Thor, Werner Marx, and Hermann Schier. 2016. "The Application of Bibliometrics to Research Evaluation in the Humanities and Social Sciences: An Exploratory Study using Normalized Google Scholar Data for the Publications of a Research Institute." Journal of the Association for Information Science & Technology 67 (11): 2778-2789. https://doi.org/10.1002/asi.23627.

Boumenot, Diane. "Printing a Book from Google Books." One Rhode Island Family. Last modified December 3, 2015, accessed January 1, 2017. https://onerhodeislandfamily.com/2015/12/03/printing-a-book-from-google-books/.

Bøyum, Idunn, and Svanhild Aabø. 2015. "The Information Practices of Business PhD Students." New Library World 116 (3): 187-200. https://doi.org/10.1108/NLW-06-2014-0073.

Bramer, Wichor M., Dean Giustini, and Bianca M. R. Kramer. 2016. "Comparing the Coverage, Recall, and Precision of Searches for 120 Systematic Reviews in Embase, MEDLINE, and Google Scholar: A Prospective Study." Systematic Reviews 5 (39): 1-7. https://doi.org/10.1186/s13643-016-0215-7.

Cals, J. W., and D. Kotz. 2016. "Literature Review in Biomedical Research: Useful Search Engines Beyond PubMed." Journal of Clinical Epidemiology 71: 115-117. https://doi.org/10.1016/j.jclinepi.2015.10.012.

Carlson, Scott. 2006. "Challenging Google, Microsoft Unveils a Search Tool for Scholarly Articles." Chronicle of Higher Education 52 (33).

Cavacini, Antonio. 2015. "What is the Best Database for Computer Science Journal Articles?" Scientometrics 102 (3): 2059-2071. https://doi.org/10.1007/s11192-014-1506-1.
Chen, Xiaotian. 2012. "Google Books and WorldCat: A Comparison of their Content." Online Information Review 36 (4): 507-516. https://doi.org/10.1108/14684521211254031.

———. 2014. "Open Access in 2013: Reaching the 50% Milestone." Serials Review 40 (1): 21-27. https://doi.org/10.1080/00987913.2014.895556.

Choong, Miew Keen, Filippo Galgani, Adam G. Dunn, and Guy Tsafnat. 2014. "Automatic Evidence Retrieval for Systematic Reviews." Journal of Medical Internet Research 16 (10): 1-1. https://doi.org/10.2196/jmir.3369.

Ciccone, Karen, and John Vickery. 2015. "Summon, EBSCO Discovery Service, and Google Scholar: A Comparison of Search Performance using User Queries." Evidence Based Library & Information Practice 10 (1): 34-49. https://ejournals.library.ualberta.ca/index.php/EBLIP/article/view/23845.

Conrad, Lettie Y., Elisabeth Leonard, and Mary M. Somerville. 2015. "New Pathways in Scholarly Discovery: Understanding the Next Generation of Researcher Tools." Paper presented at the Association of College and Research Libraries annual conference, March 25-27, Portland, OR. https://pdfs.semanticscholar.org/3cb1/315476ccf9b443c01eb9b1d175ae3b0a5b4e.pdf.

Dagienė, Eleonora, and Danutė Krapavickaitė. 2016. "How Researchers Manage their Academic Activities." Learned Publishing 29 (3): 155-163. https://doi.org/10.1002/leap.1030.

Davis, Philip M., and Angela Cochran. 2015. "Cited Half-Life of the Journal Literature." arXiv Preprint arXiv:1504.07479. https://arxiv.org/abs/1504.07479.

Delgado López-Cózar, Emilio, Nicolás Robinson-García, and Daniel Torres-Salinas. 2014. "The Google Scholar Experiment: How to Index False Papers and Manipulate Bibliometric Indicators." Journal of the Association for Information Science & Technology 65 (3): 446-454. https://doi.org/10.1002/asi.23056.

Erb, Brian, and Rob Sica. 2015.
"Flagship Database for Literature Searching or Helpful Auxiliary?" Charleston Advisor 17 (2): 47-50. https://doi.org/10.5260/chara.17.2.47.

Fagan, Jody Condit, and David Gaines. 2016. "Take Charge of EDS: Vet Your Content." Presentation to the EBSCO Users' Group, Boston, MA, May 10-11.

Gehanno, Jean-François, Laetitia Rollin, and Stefan Darmoni. 2013. "Is the Coverage of Google Scholar Enough to be Used Alone for Systematic Reviews." BMC Medical Informatics and Decision Making 13 (1): 1. https://doi.org/10.1186/1472-6947-13-7.

Georgas, Helen. 2015. "Google vs. the Library (Part III): Assessing the Quality of Sources found by Undergraduates." portal: Libraries and the Academy 15 (1): 133-161. https://doi.org/10.1353/pla.2015.0012.

Giustini, Dean, and Maged N. Kamel Boulos. 2013. "Google Scholar is Not Enough to be Used Alone for Systematic Reviews." Online Journal of Public Health Informatics 5 (2). https://doi.org/10.5210/ojphi.v5i2.4623.

Gray, Jerry E., Michelle C. Hamilton, Alexandra Hauser, Margaret M. Janz, Justin P. Peters, and Fiona Taggart. 2012. "Scholarish: Google Scholar and its Value to the Sciences." Issues in Science and Technology Librarianship 70 (Summer). https://doi.org/10.1002/asi.21372/full.

Haddaway, Neal R. 2015. "The Use of Web-Scraping Software in Searching for Grey Literature." Grey Journal 11 (3): 186-190.

Haddaway, Neal Robert, Alexandra Mary Collins, Deborah Coughlin, and Stuart Kirk. 2015. "The Role of Google Scholar in Evidence Reviews and its Applicability to Grey Literature Searching." PloS One 10 (9): e0138237. https://doi.org/10.1371/journal.pone.0138237.

Hammarfelt, Björn. 2014. "Using Altmetrics for Assessing Research Impact in the Humanities." Scientometrics 101 (2): 1419-1430. https://doi.org/10.1007/s11192-014-1261-3.

Hands, Africa. 2012. "Microsoft Academic Search – http://academic.research.microsoft.com." Technical Services Quarterly 29 (3): 251-252. https://doi.org/10.1080/07317131.2012.682026.
Harper, Sarah Fletcher. 2016. "Google Books Review." Journal of Electronic Resources in Medical Libraries 13 (1): 2-7. https://doi.org/10.1080/15424065.2016.1142835.

Harzing, Anne-Wil. 2013. "A Preliminary Test of Google Scholar as a Source for Citation Data: A Longitudinal Study of Nobel Prize Winners." Scientometrics 94 (3): 1057-1075. https://doi.org/10.1007/s11192-012-0777-7.

———. 2014. "A Longitudinal Study of Google Scholar Coverage between 2012 and 2013." Scientometrics 98 (1): 565-575. https://doi.org/10.1007/s11192-013-0975-y.

———. 2016a. Publish or Perish. Vol. 5. http://www.harzing.com/resources/publish-or-perish.

———. 2016b. "Microsoft Academic (Search): A Phoenix Arisen from the Ashes?" Scientometrics 108 (3): 1637-1647. https://doi.org/10.1007/s11192-016-2026-y.

Harzing, Anne-Wil, and Satu Alakangas. 2016a. "Microsoft Academic: Is the Phoenix Getting Wings?" Scientometrics: 1-13.

Harzing, Anne-Wil, and Satu Alakangas. 2016b. "Google Scholar, Scopus and the Web of Science: A Longitudinal and Cross-Disciplinary Comparison." Scientometrics 106 (2): 787-804. https://doi.org/10.1007/s11192-015-1798-9.

Herrera, Gail. 2011. "Google Scholar Users and User Behaviors: An Exploratory Study." College & Research Libraries 72 (4): 316-331. https://doi.org/10.5860/crl-125rl.

Higgins, Julian, and S. Green, eds. 2011. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 ed. The Cochrane Collaboration. http://handbook.cochrane.org/.

Hilbert, Fee, Julia Barth, Julia Gremm, Daniel Gros, Jessica Haiter, Maria Henkel, Wilhelm Reinhardt, and Wolfgang G. Stock. 2015. "Coverage of Academic Citation Databases Compared with Coverage of Scientific Social Media." Online Information Review 39 (2): 255-264. https://doi.org/10.1108/OIR-07-2014-0159.

Hoffmann, Anna Lauren. 2014. "Google Books as Infrastructure of in/Justice: Towards a Sociotechnical Account of Rawlsian Justice, Information, and Technology."
Theses and Dissertations. Paper 530. http://dc.uwm.edu/etd/530/.

———. 2016. "Google Books, Libraries, and Self-Respect: Information Justice Beyond Distributions." The Library 86 (1). https://doi.org/10.1086/684141.

Horrigan, John B. "Lifelong Learning and Technology." Pew Research Center, last modified March 22, 2016, accessed February 7, 2017. http://www.pewinternet.org/2016/03/22/lifelong-learning-and-technology/.

Hug, Sven E., Michael Ochsner, and Martin P. Braendle. 2016. "Citation Analysis with Microsoft Academic." arXiv Preprint arXiv:1609.05354. https://arxiv.org/abs/1609.05354.

Huistra, Hieke, and Bram Mellink. 2016. "Phrasing History: Selecting Sources in Digital Repositories." Historical Methods: A Journal of Quantitative and Interdisciplinary History 49 (4): 220-229. https://doi.org/10.1093/llc/fqw002.

Inger, Simon, and Tracy Gardner. 2016. "How Readers Discover Content in Scholarly Publications." Information Services & Use 36 (1): 81-97. https://doi.org/10.3233/ISU-160800.

Jackson, Joab. 2010. "Google: 129 Million Different Books have been Published." PC World, August 6, 2010. http://www.pcworld.com/article/202803/google_129_million_different_books_have_been_published.html.

Jacsó, P. 2008. "Live Search Academic." Peter’s Digital Reference Shelf, April.

Jacsó, Péter. 2011. "The Pros and Cons of Microsoft Academic Search from a Bibliometric Perspective." Online Information Review 35 (6): 983-997. https://doi.org/10.1108/14684521111210788.

Jamali, Hamid R., and Majid Nabavi. 2015. "Open Access and Sources of Full-Text Articles in Google Scholar in Different Subject Fields." Scientometrics 105 (3): 1635-1651. https://doi.org/10.1007/s11192-015-1642-2.

Johnson, Paula C., and Jennifer E. Simonsen. 2015. "Do Engineering Master's Students Know What They Don't Know?" Library Review 64 (1): 36-57. https://doi.org/10.1108/LR-05-2014-0052.
Jones, Edgar. 2010. "Google Books as a General Research Collection." Library Resources & Technical Services 54 (2): 77-89. https://doi.org/10.5860/lrts.54n2.77.

Karlsson, Niklas. 2014. "The Crossroads of Academic Electronic Availability: How Well does Google Scholar Measure Up Against a University-Based Metadata System in 2014?" Current Science 107 (10): 1661-1665. http://www.currentscience.ac.in/Volumes/107/10/1661.pdf.

Kemman, Max, Martijn Kleppe, and Stef Scagliola. 2013. "Just Google It: Digital Research Practices of Humanities Scholars." arXiv Preprint arXiv:1309.2434. https://arxiv.org/abs/1309.2434.

Khabsa, Madian, and C. Lee Giles. 2014. "The Number of Scholarly Documents on the Public Web." PloS One 9 (5). https://doi.org/10.1371/journal.pone.0093949.

Kirkwood Jr., Hal, and Monica C. Kirkwood. 2011. "Historical Research." Online 35 (4): 28-32.

Koler-Povh, Teja, Primož Južnic, and Goran Turk. 2014. "Impact of Open Access on Citation of Scholarly Publications in the Field of Civil Engineering." Scientometrics 98 (2): 1033-1045. https://doi.org/10.1007/s11192-013-1101-x.

Kousha, Kayvan, Mike Thelwall, and Somayeh Rezaie. 2011. "Assessing the Citation Impact of Books: The Role of Google Books, Google Scholar, and Scopus." Journal of the American Society for Information Science and Technology 62 (11): 2147-2164. https://doi.org/10.1002/asi.21608.

Kousha, Kayvan, and Mike Thelwall. 2017. "Are Wikipedia Citations Important Evidence of the Impact of Scholarly Articles and Books?" Journal of the Association for Information Science and Technology 68 (3): 762-779. https://doi.org/10.1002/asi.23694.

Kousha, Kayvan, and Mike Thelwall. 2015. "An Automatic Method for Extracting Citations from Google Books." Journal of the Association for Information Science & Technology 66 (2): 309-320. https://doi.org/10.1002/asi.23170.

Lee, Jongwook, Gary Burnett, Micah Vandegrift, Hoon Baeg Jung, and Richard Morris. 2015.
"Availability and Accessibility in an Open Access Institutional Repository: A Case Study." Information Research 20 (1): 334-349.

Levay, Paul, Nicola Ainsworth, Rachel Kettle, and Antony Morgan. 2016. "Identifying Evidence for Public Health Guidance: A Comparison of Citation Searching with Web of Science and Google Scholar." Research Synthesis Methods 7 (1): 34-45. https://doi.org/10.1002/jrsm.1158.

Levy, Steven. "Making the World’s Problem Solvers 10% More Efficient." Backchannel. Last modified October 17, 2014, accessed January 14, 2016. https://medium.com/backchannel/the-gentleman-who-made-scholar-d71289d9a82d.

Los Angeles Times. 2016. "Google, Books and 'Fair Use'." Los Angeles Times, April 19, 2016. http://www.latimes.com/opinion/editorials/la-ed-google-book-search-20160419-story.html.

Martin, Kim, and Anabel Quan-Haase. 2016. "The Role of Agency in Historians’ Experiences of Serendipity in Physical and Digital Information Environments." Journal of Documentation 72 (6): 1008-1026. https://doi.org/10.1108/JD-11-2015-0144.

Martín-Martín, Alberto, Juan Manuel Ayllón, Enrique Orduña-Malea, and Emilio Delgado López-Cózar. 2016a. "2016 Google Scholar Metrics Released: A Matter of Languages... and Something Else." arXiv Preprint arXiv:1607.06260. https://arxiv.org/abs/1607.06260.

Martín-Martín, Alberto, Enrique Orduña-Malea, Juan M. Ayllón, and Emilio Delgado López-Cózar. 2016b. "The Counting House: Measuring those Who Count. Presence of Bibliometrics, Scientometrics, Informetrics, Webometrics and Altmetrics in the Google Scholar Citations, ResearcherID, ResearchGate, Mendeley & Twitter." arXiv Preprint arXiv:1602.02412. https://arxiv.org/abs/1602.02412.

Martín-Martín, Alberto, Enrique Orduña-Malea, Juan Manuel Ayllón, and Emilio Delgado López-Cózar. 2014. "Does Google Scholar Contain All Highly Cited Documents (1950-2013)?" arXiv Preprint arXiv:1410.8464. https://arxiv.org/abs/1410.8464.
Martín-Martín, Alberto, Enrique Orduña-Malea, Juan Ayllón, and Emilio Delgado López-Cózar. 2016c. "Back to the Past: On the Shoulders of an Academic Search Engine Giant." Scientometrics 107 (3): 1477-1487. https://doi.org/10.1007/s11192-016-1917-2.

Martín-Martín, Alberto, Enrique Orduña-Malea, Anne-Wil Harzing, and Emilio Delgado López-Cózar. 2017. "Can we Use Google Scholar to Identify Highly-Cited Documents?" Journal of Informetrics 11 (1): 152-163. https://doi.org/10.1016/j.joi.2016.11.008.

Mays, Dorothy A. 2015. "Google Books: Far More Than Just Books." Public Libraries 54 (5): 23-26. http://publiclibrariesonline.org/2015/10/far-more-than-just-books/.

Meier, John J., and Thomas W. Conkling. 2008. "Google Scholar’s Coverage of the Engineering Literature: An Empirical Study." The Journal of Academic Librarianship 34 (3): 196-201. https://doi.org/10.1016/j.acalib.2008.03.002.

Moed, Henk F., Judit Bar-Ilan, and Gali Halevi. 2016. "A New Methodology for Comparing Google Scholar and Scopus." arXiv Preprint arXiv:1512.05741. https://arxiv.org/abs/1512.05741.

Namei, Elizabeth, and Christal A. Young. 2015. "Measuring our Relevancy: Comparing Results in a Web-Scale Discovery Tool, Google & Google Scholar." Paper presented at the Association of College and Research Libraries annual conference, March 25-27, Portland, OR. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2015/Namei_Young.pdf.

National Institute for Health and Care Excellence (NICE). "Developing NICE Guidelines: The Manual." Last modified April 2016, accessed November 27, 2016. https://www.nice.org.uk/process/pmg20.

Neuhaus, Chris, Ellen Neuhaus, Alan Asher, and Clint Wrede. 2006. "The Depth and Breadth of Google Scholar: An Empirical Study." portal: Libraries and the Academy 6 (2): 127-141. https://doi.org/10.1353/pla.2006.0026.
Obrien, Patrick, Kenning Arlitsch, Leila Sterman, Jeff Mixter, Jonathan Wheeler, and Susan Borda. 2016. "Undercounting File Downloads from Institutional Repositories." Journal of Library Administration 56 (7): 854-874. https://doi.org/10.1080/01930826.2016.1216224.

Orduña-Malea, Enrique, and Emilio Delgado López-Cózar. 2014. "Google Scholar Metrics Evolution: An Analysis According to Languages." Scientometrics 98 (3): 2353-2367. https://doi.org/10.1007/s11192-013-1164-8.

Orduña-Malea, Enrique, and Emilio Delgado López-Cózar. 2015. "The Dark Side of Open Access in Google and Google Scholar: The Case of Latin-American Repositories." Scientometrics 102 (1): 829-846. https://doi.org/10.1007/s11192-014-1369-5.

Orduña-Malea, Enrique, Alberto Martín-Martín, Juan M. Ayllon, and Emilio Delgado López-Cózar. 2014. "The Silent Fading of an Academic Search Engine: The Case of Microsoft Academic Search." Online Information Review 38 (7): 936-953. https://doi.org/10.1108/OIR-07-2014-0169.

Ortega, José Luis. 2015. "Relationship between Altmetric and Bibliometric Indicators Across Academic Social Sites: The Case of CSIC's Members." Journal of Informetrics 9 (1): 39-49. https://doi.org/10.1016/j.joi.2014.11.004.

Ortega, José Luis, and Isidro F. Aguillo. 2014. "Microsoft Academic Search and Google Scholar Citations: Comparative Analysis of Author Profiles." Journal of the Association for Information Science & Technology 65 (6): 1149-1156. https://doi.org/10.1002/asi.23036.

Pitol, Scott P., and Sandra L. De Groote. 2014. "Google Scholar Versions: Do More Versions of an Article Mean Greater Impact?" Library Hi Tech 32 (4): 594-611. https://doi.org/10.1108/LHT-05-2014-0039.

Prins, Ad A. M., Rodrigo Costas, Thed N. van Leeuwen, and Paul F. Wouters. 2016. "Using Google Scholar in Research Evaluation of Humanities and Social Science Programs: A Comparison with Web of Science Data." Research Evaluation 25 (3): 264-270.
https://doi.org/10.1093/reseval/rvv049.

Quint, Barbara. 2016. "Find and Fetch: Completing the Course." Information Today 33 (3): 17.

Rothfus, Melissa, Ingrid S. Sketris, Robyn Traynor, Melissa Helwig, and Samuel A. Stewart. 2016. "Measuring Knowledge Translation Uptake using Citation Metrics: A Case Study of a Pan-Canadian Network of Pharmacoepidemiology Researchers." Science & Technology Libraries 35 (3): 228-240. https://doi.org/10.1080/0194262X.2016.1192008.

Ruppel, Margie. 2009. "Google Scholar, Social Work Abstracts (EBSCO), and PsycINFO (EBSCO)." Charleston Advisor 10 (3): 5-11.

Shultz, M. 2007. "Comparing Test Searches in PubMed and Google Scholar." Journal of the Medical Library Association: JMLA 95 (4): 442-445. https://doi.org/10.3163/1536-5050.95.4.442.

Stansfield, Claire, Kelly Dickson, and Mukdarut Bangpan. 2016. "Exploring Issues in the Conduct of Website Searching and Other Online Sources for Systematic Reviews: How Can We be Systematic?" Systematic Reviews 5 (1): 191. https://doi.org/10.1186/s13643-016-0371-9.

Ştirbu, Simona, Paul Thirion, Serge Schmitz, Gentiane Haesbroeck, and Ninfa Greco. 2015. "The Utility of Google Scholar when Searching Geographical Literature: Comparison with Three Commercial Bibliographic Databases." The Journal of Academic Librarianship 41 (3): 322-329. https://doi.org/10.1016/j.acalib.2015.02.013.

Suiter, Amy M., and Heather Lea Moulaison. 2015. "Supporting Scholars: An Analysis of Academic Library Websites' Documentation on Metrics and Impact." The Journal of Academic Librarianship 41 (6): 814-820. https://doi.org/10.1016/j.acalib.2015.09.004.

Szpiech, Ryan. 2014. "Cracking the Code: Reflections on Manuscripts in the Age of Digital Books." Digital Philology: A Journal of Medieval Cultures 3 (1): 75-100. https://doi.org/10.1353/dph.2014.0010.

Testa, Matthew. 2016.
"Availability and Discoverability of Open-Access Journals in Music." Music Reference Services Quarterly 19 (1): 1-17. https://doi.org/10.1080/10588167.2016.1130386.

Thelwall, Mike, and Kayvan Kousha. 2015b. "Web Indicators for Research Evaluation. Part 1: Citations and Links to Academic Articles from the Web." El Profesional De La Información 24 (5): 587-606. https://doi.org/10.3145/epi.2015.sep.08.

Thielen, Frederick W., Ghislaine van Mastrigt, L. T. Burgers, Wichor M. Bramer, Marian H. J. M. Majoie, Sylvia M. A. A. Evers, and Jos Kleijnen. 2016. "How to Prepare a Systematic Review of Economic Evaluations for Clinical Practice Guidelines: Database Selection and Search Strategy Development (Part 2/3)." Expert Review of Pharmacoeconomics & Outcomes Research: 1-17. https://doi.org/10.1080/14737167.2016.1246962.

Trapp, Jamie. 2016. "Web of Science, Scopus, and Google Scholar Citation Rates: A Case Study of Medical Physics and Biomedical Engineering: What Gets Cited and What Doesn't?" Australasian Physical & Engineering Sciences in Medicine 39 (4): 817-823. https://doi.org/10.1007/s13246-016-0478-2.

Van Noorden, R. 2014. "Online Collaboration: Scientists and the Social Network." Nature 512 (7513): 126-129. https://doi.org/10.1038/512126a.

Varshney, Lav R. 2012. "The Google Effect in Doctoral Theses." Scientometrics 92 (3): 785-793. https://doi.org/10.1007/s11192-012-0654-4.

Verstak, Alex, Anurag Acharya, Helder Suzuki, Sean Henderson, Mikhail Iakhiaev, Cliff Chiung Yu Lin, and Namit Shetty. 2014. "On the Shoulders of Giants: The Growing Impact of Older Articles." arXiv Preprint arXiv:1411.0275. https://arxiv.org/abs/1411.0275.

Walsh, Andrew. 2015. "Beyond "Good" and "Bad": Google as a Crucial Component of Information Literacy." In The Complete Guide to Using Google in Libraries, edited by Carol Smallwood, 3-12. New York: Rowman & Littlefield.

Waltman, Ludo. 2016. "A Review of the Literature on Citation Impact Indicators." Journal of Informetrics 10 (2): 365-391.
https://doi.org/10.1016/j.joi.2016.02.007.

Ward, Judit, William Bejarano, and Anikó Dudás. 2015. "Scholarly Social Media Profiles and Libraries: A Review." Liber Quarterly 24 (4): 174–204. https://doi.org/10.18352/lq.9958.

Weideman, Melius. 2015. "ETD Visibility: A Study on the Exposure of Indian ETDs to the Google Scholar Crawler." Paper presented at ETD 2015: 18th International Symposium on Electronic Theses and Dissertations, New Delhi, India, November 4-6. http://www.web-visibility.co.za/0168-conference-paper-2015-weideman-etd-theses-dissertation-india-google-scholar-crawler.pdf.

Weiss, Andrew. 2016. "Examining Massive Digital Libraries (MDLs) and their Impact on Reference Services." Reference Librarian 57 (4): 286-306. https://doi.org/10.1080/02763877.2016.1145614.

Whitmer, Susan. 2015. "Google Books: Shamed by Snobs, a Resource for the Rest of Us." In The Complete Guide to Using Google in Libraries, edited by Carol Smallwood, 241-250. New York: Rowman & Littlefield.

Wildgaard, Lorna. 2015. "A Comparison of 17 Author-Level Bibliometric Indicators for Researchers in Astronomy, Environmental Science, Philosophy and Public Health in Web of Science and Google Scholar." Scientometrics 104 (3): 873-906. https://doi.org/10.1007/s11192-015-1608-4.

Winter, Joost, Amir Zadpoor, and Dimitra Dodou. 2014. "The Expansion of Google Scholar Versus Web of Science: A Longitudinal Study." Scientometrics 98 (2): 1547-1565. https://doi.org/10.1007/s11192-013-1089-2.

Wu, Tim. 2015. "Whatever Happened to Google Books?" The New Yorker, September 11, 2015.

Wu, Ming-der, and Shih-chuan Chen. 2014. "Graduate Students Appreciate Google Scholar, but Still Find use for Libraries." Electronic Library 32 (3): 375-389. https://doi.org/10.1108/EL-08-2012-0102.

Yang, Le. 2016. "Making Search Engines Notice: An Exploratory Study on Discoverability of DSpace Metadata and PDF Files." Journal of Web Librarianship 10 (3): 147-160.
https://doi.org/10.1080/19322909.2016.1172539.
TV White Spaces in Public Libraries: A Primer

Kristen Radsliff Rebmann, Emmanuel Edward Te, and Donald Means

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017 36

ABSTRACT

TV White Space (TVWS) represents one new wireless communication technology that has the potential to improve internet access and inclusion. This primer describes TVWS technology as a viable, long-term access solution for the benefit of public libraries and their communities, especially for underserved populations. Discussion focuses first on providing a brief overview of the digital divide and the emerging role of public libraries as internet access providers. Next, a basic description of TVWS and its features is provided, focusing on key aspects of the technology relevant to libraries as community anchor institutions. Several TVWS implementations are described, with discussion of TVWS implementations in several public libraries. Finally, consideration is given to first steps that library organizations must take when contemplating new TVWS implementations supportive of Wi-Fi applications and crisis response planning.

INTRODUCTION

Tens of millions of people rely wholly or in part on libraries to provide access to the internet. Many lack access to the Federal Communications Commission (FCC) recommended standard of 25 Mbps (megabits per second) download speed and 3 Mbps upload speed.1 Though the FCC reclassified high-speed internet in 2015 as a public utility under Title II of the Telecommunications Act to ensure that broadband networks are “fast, fair, and open,”2 the “digital divide” still remains. One in four community members does not have access to the internet at home.
Accounting for age and education level, households with the lowest median incomes have service adoption rates of around 50%, compared with rates of 80 to 90% for higher-income households.3 A recent Pew Research Center survey on home broadband adoption found that 43% of those surveyed reported cost as their main reason for non-adoption.4 Individuals with low-quality or no access are more likely to be digitally disadvantaged, tend to use library computers more frequently, and are less equipped to interact and compete economically as more services and application processes move online.5

Kristen Radsliff Rebmann (Kristen.rebmann@sjsu.edu) is Associate Professor, San Jose State University School of Information, San Jose, CA. Emmanuel Edward Te (emmanueledward.te@sjsu.edu) is a graduate student, San Jose State University School of Information, San Jose, CA. Donald Means (don@digitalvillage.com) is co-founder and principal of Digital Village Associates, Sausalito, CA.

TV WHITE SPACES IN PUBLIC LIBRARIES: A PRIMER | REBMANN, TE, AND MEANS | https://doi.org/10.6017/ital.v36i1.9720 37

This article highlights TV White Space (TVWS), a new wireless communication technology with the potential to assist libraries in addressing digital access and inclusion issues. This primer first provides a brief overview of the digital divide and the emerging role of public libraries as internet access providers, highlighting the need for cost-efficient technological solutions. We then provide a basic description of TVWS and its features, focusing on key aspects of the technology relevant to libraries as community anchor institutions. Several TVWS implementations are described, including how TVWS was set up in several public libraries. Finally, we consider first steps library organizations must take when contemplating new implementations, including everyday applications and crisis response planning.
Digital Access and Inclusion

The term “digital divide” describes the gap between people who can easily access and use technology and the internet, and those who cannot.6 As Kinney observes, “there has not been one single digital divide, but rather a series of divides that attend each new technology.”7 Digital divides are exacerbated by various factors, including socioeconomic status, education, geography, age, ability, language, and especially service availability and quality.8 In recent years, the language describing this issue has changed, but the inequalities remain and widen along different dimensions with each emerging technology. The most recent public policy term, “digital inclusion,” promotes digital literacy efforts for unserved and underserved populations.9 The progression from “digital divide” to “digital inclusion” represents a shift in focus from issues of access exclusively toward contexts and quality of participation and usage. Along these lines, the language of digital inclusion reframes the issue by making visible that a focus on internet access alone can obscure the divides of quality and effectiveness that remain.10 In response to the digital divide, public libraries have become the “unofficial” providers of internet access, stemming from libraries’ access to broadband infrastructure, maintenance of publicly available computers, and services providing assistance and training.11 A Pew Research Center survey on perceptions of libraries found that most respondents view public libraries as important parts of their communities, providing resources and assisting in decisions about what information to trust.12 However, many public libraries face an “infrastructure plateau” of internet access: too few computer workstations and broadband connections too slow to support a growing number of users,13 on top of insufficient funding, physical space, and staffing.14 Previous surveys show that
although public libraries are connected to the internet and provide public access workstations and wireless access, nearly 50% of public libraries only offer wireless access that shares the same bandwidth as their workstations.15 This increased usage strains existing network connections and infrastructure, resulting in slower connections for everyone on the public library’s network. Many public libraries cannot accommodate more workstations, support the power requirements of both workstations and patrons’ laptops, or afford the workstation upgrades and bandwidth increases needed to move past insufficient connectivity speeds. Libraries often lack the IT skills, time, and funds to upgrade their infrastructure.16 Typical wireless access via Wi-Fi is limited to distances within library buildings, may extend to exterior spaces, and is available only during operating hours. Despite these challenges, public libraries continually provide access and “at-the-point-of-need” training and support for their patrons, especially those without easy access to the internet and computers.17 Subsidized by federal funding, libraries represent key access providers and technology trainers for the public without internet access.18 The FCC classifies libraries as “community anchor institutions” (CAIs), organizations that “facilitate greater use of broadband by vulnerable populations, including low-income, the unemployed, and the aged.”19 Recent surveys show that users have a positive view of libraries as places providing opportunities to spend time in a safe space, pursue learning, and promote a sense of community.
Librarians offer internet skills training programs more often than other community organizations, though (around 75% of the time) the training occurs informally.20 In particular, 29% of respondents to a library use survey reported going to libraries to use computers, the internet, or the Wi-Fi network; 7% also reported using libraries’ Wi-Fi signals outside when libraries are closed.21 The majority of these users are more likely to be young, black, female, and lower income, utilizing library technology resources for school or work (61%), checking email or sending texts (53%), finding health information (38%), and taking online courses or completing certifications (26%).22 Public libraries are already exploring creative approaches to providing internet access for these underserved communities; the mobile hotspot lending programs in the New York City and Kansas City public library systems are just two examples.23 Yet libraries must do more, supporting innovation and providing leadership by partnering with other community organizations and their stakeholders to enhance resilience in addressing access and inclusion. The emergence of TVWS wireless technology presents an opportunity for libraries to explore expanding the reach of their wireless signals beyond library buildings and to extend 24/7 library Wi-Fi availability to community spaces such as subsidized housing, schools, clinics, parks, senior centers, and museums.
TVWS Basics

TV white space (TVWS) refers to the unoccupied portions of spectrum in the VHF/UHF terrestrial television frequency bands.24 Television broadcast frequency allocations traditionally assumed that TV station transmissions operating at high power needed wide spectrum separation to prevent interference between broadcasting channels, which led to the specific spectrum allocation of these frequency “guard bands.”25 Research discovered that low-power devices can operate within these spaces, which led the Federal Communications Commission (FCC) to field test TVWS applications to wireless communications and (ultimately) promote TVWS neutrality.26 In 2015, the FCC made a portion of these very valuable TVWS bands of spectrum available for open, shared public use, like Wi-Fi.31 Yet, unlike Wi-Fi, whose reach is measured in tens of meters, the range of TVWS is measured in hundreds or even thousands of meters. TVWS has good propagation characteristics, which makes it an extremely valuable license-exempt radio spectrum.27 It is a relatively stable frequency resource that does not change over time, allowing spectrum availability estimates to remain reliable and valid, which in turn promotes its various applications.28 Radio spectrum is considered a “common heritage of humanity,”29 as radio waves “do not respect national borders.”30 TVWS availability and application are contextual and dependent on many key factors.

TV WHITE SPACES IN PUBLIC LIBRARIES: A PRIMER | REBMANN, TE, AND MEANS | https://doi.org/10.6017/ital.v36i1.9720
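The range comparison above can be made concrete with the standard free-space path-loss formula, FSPL(dB) = 20·log10(d_km) + 20·log10(f_MHz) + 32.44, under which range at equal loss scales inversely with frequency. The sketch below uses the free-space model only, and the 50-meter Wi-Fi reference range is an illustrative assumption; the real-world TVWS advantage is larger still, since lower frequencies also penetrate walls and foliage better.

```python
import math

def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB: 20*log10(d_km) + 20*log10(f_MHz) + 32.44."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

def equal_loss_range_km(ref_range_km: float, ref_freq_mhz: float,
                        new_freq_mhz: float) -> float:
    """Distance at which new_freq_mhz suffers the same free-space loss that
    ref_freq_mhz suffers at ref_range_km (range scales as f_ref / f_new)."""
    return ref_range_km * (ref_freq_mhz / new_freq_mhz)

# A 2.4 GHz Wi-Fi link reaching ~50 m has the same free-space loss as a
# 600 MHz TVWS link at ~200 m -- hundreds rather than tens of meters.
tvws_range_km = equal_loss_range_km(0.05, 2400.0, 600.0)
```

In practice, the transmit power and antenna height permitted by regulation matter as much as frequency, which is how fixed TVWS deployments reach the kilometer ranges described below.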
Availability is influenced by frequency (the idle channels purposely planned in TV bands, varying across regions), deployment (the height and location of the TVWS transmit antenna and its installation sites in relation to nearby TV broadcasting reception), space and distance (geographical areas outside the current planned TV coverage, with no present broadcasting signals), and time (off-air availability of licensed broadcasting transmitters during specific periods of time, subject to change by the broadcaster).32 Because TVWS exists as fragmented “safety margins” between broadcast services, TVWS is typically more abundant, and available in larger contiguous blocks, in rural areas with less broadcast coverage than in highly dense urban areas.33 Assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or non-exclusive sharing can alleviate pressure on these resources.34 This “spectrum crunch” caused by the inefficient use of scarce spectrum resources can be alleviated with dynamic spectrum access (DSA) and spectrum sharing. TVWS availability is small where digital television has been deployed, with the potential for aggregate interference (from TVWS users in relation to primary TV service) and self-interference (within the TVWS network), which may lead to a “mismatch situation” in which demand for bandwidth is high but TVWS bandwidth supply is very low.35 As most spectrum frequencies have been organized through some form of exclusive access in which only the licensee can use the specific spectrum, technologies such as cognitive radios can enable new modes of spectrum access, supporting autonomous, self-configuring, self-planning networks that rely on up-to-date TVWS availability databases.
The limited distribution (in many areas) of basic broadband infrastructure and the relatively high cost of access often prevent individuals with lower incomes from participating in the digital revolution of information access and its opportunities.36 Despite these challenges to broadband availability, TVWS excels in areas with low broadband coverage: rural regions possess greater frequency availability due to the lower density of spectrum licensing. In comparison to frequencies operating higher up the spectrum band, TVWS does not require direct line-of-sight between devices for operation and has lower deployment costs; equipment costs are comparable to Wi-Fi equipment currently on the market.37 Importantly, TVWS can address access and inclusion through relatively low start-up costs and no ongoing service fees. As a public resource, it can work with existing services to create new, potentially mobile, connections to the internet that ensure the continuation of vital services in the event of service interruptions.38 In urban areas with fewer channels available, new efficient spectrum-sharing policies will be necessary.
Assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or non-exclusive sharing or “recycling” of bands for more effective spectrum use by multiple parties with changing spectrum needs can alleviate pressure on these resources.39

TVWS for Public Libraries

TVWS is a viable medium for applications ranging from internet access, content distribution within a given location, tracking (people, animals, and assets), and task automation to public safety and security,40 as well as remote patient monitoring and other telemedicine applications.41 TVWS complements existing networks that use other parts of the spectrum for access points, mobile communications, and home media devices.42 Analyses of a recent digital inclusion survey suggest that technology upgrades can have significant impact on the ability of libraries to expand programs and services.43 As community anchor institutions, public libraries can use TVWS systems to expand and improve access to their services, especially for underserved populations. Library-led collaborations to deploy TVWS networks in other CAIs and public spaces have numerous benefits. In conjunction with building-centered Wi-Fi, TVWS can redistribute network users from congested library spaces to other community sites, thereby distributing network usage across the community. From an existing broadband connection, libraries can extend their networks of internet access strategically across their communities. Unlike networks that rely solely on limited-range Wi-Fi, far-reaching TVWS can improve the coverage and inclusion of patrons in accessing library programs, services, and the broader internet.44 The portability of the access points allows libraries to extend their reach by providing wireless connections in the short term, for cultural or civic events like fairs, markets, or concerts, and in the long term, for use at popular public areas.
Recent TVWS pilot installations in Kansas, Colorado, Mississippi, and Delaware have proven to be very stable. The Manhattan Public Library (Kansas) TVWS project began in fall 2013; though there were a few delays in the installation and testing process, the TVWS equipment was successfully implemented and welcomed by the community in early 2014. IT staff report that their remote locations have shown that this library service fills a community need, especially for underserved populations.45 Delta County Libraries (Colorado) are conducting trials with two public hotspots to support “Guest” access and potentially provide library patrons with more bandwidth.46 TVWS implementations in the Pascagoula School District (Mississippi)47 and Delaware Public Libraries48 show successful initial pilot usage in providing wireless internet service directly to community-distributed access points. Though there are contextual differences across these sites, the strength of public libraries as CAIs providing internet access via TVWS systems is evident and promising.

First Steps

Any library can take the initiative in setting up a TVWS network on its own. The first step is to assess the availability of spectrum in the library’s geographic location. Access to TVWS frequencies is free and requires no subscription fees beyond the initial equipment investment. Public databases of TVWS availability are easily accessible and have been tested by the FCC since 2011;49 Google has also posted its own spectrum database.50 From this setup, the library gains access to public TVWS frequencies by which it can broadcast and receive internet connections from paired TVWS-enabled remote hotspots.
Once it is determined that there are available spectrum channels in the desired area, libraries can then explore how their current broadband and wireless connections might be expanded to include several community spaces where internet access is needed. Next, the library works with a TVWS equipment supplier to design and install a TVWS network consisting of a base station integrated with the library’s wired connection to the internet. Finally, the library places TVWS-enabled remote hotspots in (previously identified) community-based spaces where Wi-Fi access is needed by underserved populations. Given a high-quality backhaul (i.e., a high-speed fiber optic connection), TVWS can spread that signal from the library with a signal that propagates through and penetrates multiple barriers and geographical features, up to ten times stronger than current Wi-Fi. Depending on the context (geographical features, TVWS availability, etc.), hotspots can be installed up to six miles (10 km) away and do not require line-of-sight between the base station and hotspots. This ability is superior to current Wi-Fi networks, which cover only patrons in the immediate vicinity of the library. These TVWS remote hotspots can also be easily (and strategically) moved to support occasional community needs (such as neighborhood-wide or city events) or in response to crisis situations.
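As a concrete sketch of the spectrum-check step, the public white-space databases mentioned above have historically been queried via the IETF Protocol to Access White-Space databases (PAWS, RFC 7545), a JSON-RPC exchange in which a device reports its location and identity and receives the channels available there. The sketch below only assembles such a request body; the serial number and FCC ID are placeholder values, and a real deployment would rely on its certified equipment vendor's database service rather than hand-built requests.

```python
import json

def build_avail_spectrum_request(lat: float, lon: float,
                                 serial: str, fcc_id: str) -> dict:
    """Assemble a PAWS-style AVAIL_SPECTRUM_REQ body (cf. RFC 7545).
    The identifier values passed in are illustrative placeholders."""
    return {
        "jsonrpc": "2.0",
        "method": "spectrum.paws.getSpectrum",
        "params": {
            "type": "AVAIL_SPECTRUM_REQ",
            "version": "1.0",
            "deviceDesc": {                      # who is asking
                "serialNumber": serial,
                "fccId": fcc_id,
                "fccTvbdDeviceType": "MODE_2",   # geolocated master device
            },
            "location": {                        # where the base station sits
                "point": {"center": {"latitude": lat, "longitude": lon}},
            },
        },
        "id": "1",
    }

# Build (but do not send) a query for a library at an illustrative location.
request_body = json.dumps(build_avail_spectrum_request(
    39.19, -96.57, "LIB-0001", "TEST-FCC-ID"))
```

The database's response would list available channels and maximum permitted power at that location, which is the input to the site-planning conversation with the equipment supplier.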
TVWS, Libraries, and Emergency Response

Public libraries provide leadership as “ready access point, first choice, first refuge, and last resort” for community services in everyday matters and in emergencies.51 They have assisted residents in relief efforts during Hurricanes Katrina and Rita, and other natural and man-made disasters:52

“…the provision of access to computers and the internet was a wholly unique and immeasurably important role for public libraries… The infrastructure can be a tremendous asset in times of emergencies, and should be incorporated into community plans.”53

They have likewise provided immediate and long-term assistance to communities and aid workers, providing physical space for recovery operations for emergency agencies, communication technologies, and emotional support for the community. In previous library internet usage surveys, nearly one-third of libraries reported that their computers and internet services would be used by the public in emergencies to access relief services and benefits.54 Such activities include finding and communicating with family and friends, completing online FEMA forms and insurance claims, and checking news sites for information about their affected homes.55 Yet, despite the admirable and successful efforts of many public libraries, their infrastructures are not always built to meet the increased demand of user needs and e-government services in emergency contexts.56 Jaeger, Shneiderman, Fleischmann, Preece, Qu, and Wu propose the concept of community response grids (CRGs), which utilize the internet and mobile communications devices so that emergency responders and residents in a disaster area can communicate and coordinate accurate, appropriate responses.57 This concept relies on social networks, both in person and online, to enable residents and emergency responders to work together in a multi-directional communication scheme.
CRGs provide residents tailored, localized information and a means to report pertinent disaster-related information to emergency responders, who in turn can synthesize and analyze submitted information and act accordingly.58 Due to their existing role as community anchor institutions, public libraries are uniquely positioned for CRG involvement. Libraries can assist in facilitating internet access with portable TVWS network connection points; by virtue of their portability, TVWS hotspots can provide essential digital access in times of crisis by moving along with the affected populations. Emergency operations and communications in a crisis occur throughout networks comprised of various technologies, and information management before, during, and after a disaster affects how well a crisis is managed.59 Broadband internet can be one access route in the event that phone and radio transmissions are affected, and vice versa, as part of a “mixed media approach” to get messages to those who need them in an emergency.60 Yet one must remember that internet communications are double-edged: the internet provides relevant material on demand and near-instant sharing and collaborating, but these very features can compound a crisis with misinformation.61 Despite these concerns, the integration of wireless devices and other technologies into a multi-technology, collaborative response system has the potential to solve the problem of existing communication structures that lack coordination and quality control.62 The proliferation of smartphones, laptops, and other portable wireless devices makes such technology ideal for emergency communications, especially in how users’ familiarity with their own devices will help them navigate CRG communications while under stress.63

CONCLUSION

Supporting internet access and inclusion in public libraries and having equal, affordable, and available access to information is a necessary component of bridging the digital divide.
Technology has become “an irreducible component of modern life, and its presence and use has significant impact on an individual’s ability to fully engage in society.”64 As Cohron argues, this principle represents more than providing people with internet access: it is about “leveling the playing field in regards to information diffusion. The internet is such a prominent utility in peoples’ lives that we, as a society, cannot afford for citizens to go without.”65 Broadband access is the first step; digital literacy training is also a necessity. Access alone is not enough to ensure quality and effective use, however, as the digital divide is representative of broader social inequalities that computer and internet access cannot fully remedy.66 This is a complex problem that requires a multi-faceted solution. As Kinney states, “the digital divide is a moving target, and new divides open up with new technologies. Libraries help bridge some inequities more than others, and substantial disparities exist among library systems.”67 Internet access also becomes a necessity when the internet is to play a role in emergency communications.68 It is problematic to suggest that public libraries can be promoted as the solution to digital divide issues while simultaneously facing cuts to funding. Policy makers, community advocates, and community members themselves are stakeholders in the success of their communities, and must also take responsibility for access and inclusion via public libraries.69 As public agencies automate to increase equality and save money, they exacerbate digital divides by excluding those without access. Suggesting that community members simply visit the library to ensure access to public services places additional pressure on libraries, yet these efforts may go unsupported and unacknowledged.
Public libraries are already valuable community access points to resources, especially in emergencies, though many suffer from a lack of concerted disaster planning. Along similar lines, many libraries are ill-equipped to accommodate the bandwidth needs of growing and oftentimes sparsely connected populations. As communications and government services move increasingly online, it becomes imperative to build strong, cost-effective information infrastructures. TVWS connections can arguably help in breaking down the barriers that challenge ubiquitous access and inclusion. TVWS-enabled remote access points in daily use around communities are ideally situated to provide everyday Wi-Fi and for rapid redeployment to damaged areas (as pop-up hotspots) to provide essential communication and information resources in times of crisis. In short, TVWS can augment the technological infrastructure of public libraries toward further developing their roles as CAIs and leaders serving their communities well into the future.

REFERENCES

1. Wireline Competition Bureau, “2016 Broadband Progress Report,” Federal Communications Commission, January 29, 2016, https://www.fcc.gov/reports-research/reports/broadband-progress-reports/2016-broadband-progress-report.

2. Office of Chairman Wheeler, “FCC Adopts Strong, Sustainable Rules to Protect the Open Internet,” Federal Communications Commission, February 26, 2015, https://apps.fcc.gov/edocs_public/attachmatch/DOC-332260A1.pdf.

3. “Here's What the Digital Divide Looks Like in the United States,” The White House, July 15, 2015, https://www.whitehouse.gov/share/heres-what-digital-divide-looks-united-states.

4. John B. Horrigan and Maeve Duggan, “Home Broadband 2015,” Pew Research Center, December 21, 2015, http://www.pewInternet.org/files/2015/12/Broadband-adoption-full.pdf.
This 43% is further divided between 33% reporting the monthly subscription cost as their main reason, while the other 10% report the expensive cost of a computer as their reason for non-adoption.

5. Bo Kinney, “The Internet, Public Libraries, and the Digital Divide,” Public Library Quarterly 29, no. 2 (2010): 104-161, https://doi.org/10.1080/01616841003779718.

6. Madalyn Cohron, “The Continuing Digital Divide in the United States,” The Serials Librarian 69, no. 1 (2015): 77-86, https://doi.org/10.1080/0361526X.2015.1036195.

7. Kinney, “The Internet, Public Libraries, and the Digital Divide.”

8. Paul T. Jaeger, John Carlo Bertot, Kim M. Thompson, Sarah M. Katz, and Elizabeth J. DeCoster, “The Intersection of Public Policy and Public Access: Digital Divides, Digital Literacy, Digital Inclusion, and Public Libraries,” Public Library Quarterly 31, no. 1 (2012): 1-20, https://doi.org/10.1080/01616846.2012.654728.

9. Brian Real, John Carlo Bertot, and Paul T. Jaeger, “Rural Public Libraries and Digital Inclusion: Issues and Challenges,” Information Technology and Libraries 33, no. 1 (2014): 6-24, https://doi.org/10.6017/ital.v33i1.5141.

10. Jaeger et al., “The Intersection of Public Policy and Public Access.”

11. John Carlo Bertot, Paul T. Jaeger, Lesley A. Langa, and Charles R. McClure, “Public access computing and Internet access in public libraries: The role of public libraries in e-government and emergency situations,” First Monday 11, no. 9 (2006), https://doi.org/10.5210/fm.v11i9.1392.

12. John B. Horrigan, “Libraries 2016,” Pew Research Center, September 9, 2016, http://www.pewinternet.org/2016/09/09/libraries-2016/.

13. Real et al., “Rural Public Libraries and Digital Inclusion.”

14. John Carlo Bertot, Charles R. McClure, and Paul T. Jaeger, “The Impacts of Free Public Internet Access on Public Library Patrons and Communities,” Library Quarterly 78, no. 3 (2008): 285-301, https://doi.org/10.1086/588445.

15.
Charles R. McClure, Paul T. Jaeger, and John Carlo Bertot, “The Looming Infrastructure Plateau? Space, Funding, Connection Speed, and the Ability of Public Libraries to Meet the Demand for Free Internet Access,” First Monday 12, no. 12 (2007), https://doi.org/10.5210/fm.v12i12.2017.

16. Ibid.

17. Bertot et al., “Public access computing and Internet access in public libraries.”

18. Ibid.; Jaeger et al., “The Intersection of Public Policy and Public Access.”

19. Wireline Competition Bureau, “WCB Cost Model Virtual Workshop 2012 - Community Anchor Institutions,” Federal Communications Commission, June 1, 2012, https://www.fcc.gov/news-events/blog/2012/06/01/wcb-cost-model-virtual-workshop-2012-community-anchor-institutions.

20. Jennifer Koerber, "ALA and iPAC Analyze Digital Inclusion Survey," Library Journal 141, no. 1 (2016): 24-26.

21. Horrigan, “Libraries 2016.”

22. Ibid.

23. Timothy Inklebarger, “Bridging the tech gap,” American Libraries, September 11, 2015, https://americanlibrariesmagazine.org/2015/09/11/bridging-tech-gap-wi-fi-lending.

24. Andrew Stirling, “White spaces – the new Wi-Fi?,” International Journal of Digital Television 1, no. 1 (2010): 69-83, https://doi.org/10.1386/jdtv.1.1.69/1; Cristian Gomez, “TV White Spaces: Managing Spaces or Better Managing Inefficiencies?,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 67-77.

25. Steve Song, “Spectrum and Development,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 35-40.

26. Robert Horvitz, “Geo-Database Management of White Space vs. Open Spectrum,” in TV White Spaces: A Pragmatic Approach, eds.
Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 7-17.

27. Julie Knapp, “FCC Announces Public Testing of First Television White Spaces Database,” Federal Communications Commission, September 14, 2011, https://www.fcc.gov/news-events/blog/2011/09/14/fcc-announces-public-testing-first-television-white-spaces-database.

28. Horvitz, “Geo-Database Management of White Space vs. Open Spectrum.”

29. Ryszard Strużak and Dariusz Więcek, “Regulatory Issues for TV White Spaces,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 19-34.

30. Horvitz, “Geo-Database Management of White Space vs. Open Spectrum,” 8.

31. Engineering & Technology Bureau, “FCC Adopts Rules For Unlicensed Services In TV And 600 MHz Bands,” Federal Communications Commission, August 11, 2015, https://apps.fcc.gov/edocs_public/attachmatch/FCC-15-99A1_Rcd.pdf.

32. Gomez, “TV White Spaces: Managing Spaces or Better Managing Inefficiencies?,” 68.

33. Stirling, “White spaces – the new Wi-Fi?.”

34. Linda E. Doyle, “Cognitive Radio and Africa,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 109-119.

35. Gomez, “TV White Spaces: Managing Spaces or Better Managing Inefficiencies?,” 72.

36. Mike Jensen, “The role of TV White Spaces and Dynamic Spectrum in helping to improve Internet access in Africa and other Developing Regions,” in TV White Spaces: A Pragmatic Approach, eds. Ermanno Pietrosemoli and Marco Zennaro (Trieste: Abdus Salam International Centre for Theoretical Physics T/ICT4D Lab, 2013), 83-89.

37. Song, “Spectrum and Development.”

38. Ibid.

39. Doyle, “Cognitive Radio and Africa,” 113.

40.
Stirling, “White spaces – the new Wi-Fi?.”

41. Afton Chavez, Ryan Littman-Quinn, Kagiso Ndlovu, and Carrie L. Kovarik, “Using TV white space spectrum to practice telemedicine: A promising technology to enhance broadband Internet connectivity within healthcare facilities in rural regions of developing countries,” Journal of Telemedicine and Telecare 22, no. 4 (2015): 260-263, https://doi.org/10.1177/1357633X15595324.

42. Stirling, “White spaces – the new Wi-Fi?.”

43. Koerber, "ALA and iPAC Analyze Digital Inclusion Survey."

44. Chavez et al., “Using TV white space spectrum to practice telemedicine.”

45. Kerry Ingersoll, June 22, 2015, Google+ comment to the Gigabit Libraries Network, https://plus.google.com/107631107756352079114/posts/L4Y8ci8sG5Y.

46. Delta County Libraries, “Super Wi-Fi Pilot,” accessed November 1, 2016, http://www.deltalibraries.org/super-wi-fi-pilot/.

47. Pascagoula TV White Spaces Facebook group, accessed November 1, 2016, https://www.facebook.com/PSDTVWS/.

48. “Delaware Libraries White Space Pilot Update, January 2015,” accessed November 1, 2016, http://lib.de.us/files/2015/01/Delaware-Libraries-White-Space-Pilot-Update-Jan-2015.pdf.

49. Knapp, “FCC Announces Public Testing of First Television White Spaces Database.”

50. See https://www.google.com/get/spectrumdatabase/.

51. Bertot et al., “Public access computing and Internet access in public libraries.”

52. Bertot et al., “The Impacts of Free Public Internet Access.” See also Horrigan, “Libraries 2016.”

53. Paul T. Jaeger, Lesley A. Langa, Charles R. McClure, and John Carlo Bertot, “The 2004 and 2005 Gulf Coast Hurricanes: Evolving Roles and Lessons Learned for Public Libraries in Disaster Preparedness and Community Services,” Public Library Quarterly 25, no. 3/4 (2007): 199-214.

54. Ibid.

55. Bertot et al., “Public access computing and Internet access in public libraries.”

56. Ibid.
57. Paul T. Jaeger, Ben Shneiderman, Kenneth R. Fleischmann, Jennifer Preece, Yan Qu, and Philip Fei Wu, “Community response grids: E-government, social networks, and effective emergency management,” Telecommunications Policy 31 (2007): 592-604, https://doi.org/10.1016/j.telpol.2007.07.008.

58. Ibid., 595.

59. Laurie Putnam, “By choice or by chance: How the Internet is used to prepare for, manage, and share information about emergencies,” First Monday 7, no. 11 (2002), https://doi.org/10.5210/fm.v7i11.1007.

60. Ibid.

61. Ibid.

62. Jaeger et al., “Community response grids,” 598. Jaeger et al. describe how the Internet combines the best of one-to-one, one-to-many, many-to-one, and many-to-many in terms of the flow and quality of information: one-to-one communication is slow; many-to-one benefits only the central network, while outsiders reporting emergencies do not learn what others are reporting; one-to-many is inefficient, limited, and assumes the broadcaster has the appropriate information and can get it to those who need it most; many-to-many can create “information overload” of questionable content.

63. Ibid., 599.

64. Jaeger et al., “The Intersection of Public Policy and Public Access,” 3.

65. Cohron, “The Continuing Digital Divide in the United States,” 84.

66. Kinney, “The Internet, Public Libraries, and the Digital Divide,” 120.

67. Ibid., 148.

68. Jaeger et al., “Community response grids,” 599.

69. Bertot et al., “The Impacts of Free Public Internet Access,” 299.
Editorial Board Thoughts: Arts into Science, Technology, Engineering, and Mathematics – STEAM, Creative Abrasion, and the Opportunity in Libraries Today

Tod Colegrove

INFORMATION TECHNOLOGIES AND LIBRARIES | MARCH 2017

Over the millennia, man’s attempt to understand the universe has been an evolution from the broad to the sharply focused. A wide range of distinctly separate disciplines evolved from the overarching natural philosophy, the study of nature, of Greco-Roman antiquity: anatomy and astronomy through botany, mathematics, and zoology, among many others. Similarly, the Arts, Humanities, and Engineering developed from broad overarching interest into tightly focused disciplines that today are distinctly separate. As these legitimate divisions formed, grew, and developed into ever-deepening specialty, they enabled correspondingly deeper study and discovery;1 in response, the supporting collections of the library divided and grew to reflect that increasing complexity.

Libraries have long been about the organization of, and access to, information resources. Subject classification systems in use today, such as the Dewey Decimal system, are designed to group like items with like, albeit under broad overarching topics. A perhaps inevitable result for print collections housed under such a classification system is the physical isolation of items - and, by extension, the individuals researching those topics - from one another. Under the Library of Congress system, for example, items categorized as “geography” are physically removed from those in “science,” further still from “technology.” End-users benefit from the possibility of serendipitous discovery while browsing shelves nearby, even as they are effectively shielded from exposure to distracting topics outside of their immediate focus.
Recent years have witnessed a rediscovery of, and renewed interest in, the fundamental role the library can have in the creation of knowledge, learning, and innovation among its members. As collections shift from print to electronic, libraries are increasingly less bound to the physical constraints imposed by their print collections. Rather than a continued focus on hyper-specialization and separation, we have the opportunity to rethink the library: exploring novel configurations and services that might better support its community, and embracing emerging roles of trans-disciplinary collaboration and innovation.

Tod Colegrove (pcolegrove@unr.edu), a member of the ITAL Editorial Board, is Head of DeLaMare Science & Engineering Library, University of Nevada, Reno.

EDITORIAL BOARD THOUGHTS | COLEGROVE https://doi.org/10.6017/ital.v36i1.9733

The Library as Intersection

Libraries reflect the institutional and organizational structures of their communities, even as the physical organization of the structures built to house print collections mirrors the classification system in use. Academic libraries are perhaps most entrenched in the structural division: rather than intrinsically promoting collaboration and discovery across disciplines, the organization of print collections, and typically the spaces around them, is designed to foster increased focus and specialization. This division can reach a pinnacle in the branch libraries of a college or university, specialized almost to the exclusion of other areas of study altogether; libraries and collections devoted exclusively to topics of engineering, science, music, and others exist on campuses across the country.
Amplified by the separation and clustering of faculty and researchers, typically by department and discipline, it becomes entirely possible for individuals to “spend a lifetime working in a particular narrow field and never come into contact with the wider context of his or her study.”2 The library is also one of the few places in any community where individuals from a variety of backgrounds and specialties can naturally cross paths with one another. At a college or university, students and faculty from one discipline might otherwise rarely encounter those from other disciplines. Whether public, school, or academic library, outside of the library individuals and groups are typically isolated from one another physically, with little opportunity to interact organically. Without active intervention and deliberate effort on the part of the library, opportunities for creative abrasion3 and trans-disciplinary collaboration become virtually non-existent; its potential to “unleash the creative potential that is latent in a collection of unlike-minded individuals,”4 untapped. Leveraged properly, however, the intersection of interests and expertise that occurs naturally within the neutral spaces of the library can become a powerful tool that supports not only research, but creativity and innovation - a place where ideas and viewpoints can collide, building on one another:

“For most of us, the best chance to innovate lies at the Intersection. Not only do we have a greater chance of finding remarkable idea combinations there, we will also find many more of them.... The explosion of remarkable ideas is what happened in Florence during the Renaissance, and it suggests something very important. If we can just reach an intersection of disciplines or cultures, we will have a greater chance of innovating, simply because there are so many unusual ideas to go around.”5

Difficult and Scary

The problem?
“Stimulating creative abrasion is difficult and scary because we are far more comfortable being with folks like us.”6 And yet a quick review of the literature reveals that knowledge creation, innovation, and success are inextricably linked,7 with the fundamental understanding of their connection having undergone a dramatic shift: “knowledge is in fact essential to innovate, and while this might sound obvious today, putting knowledge and innovation and not physical assets at the centre of competitive advantage was a tremendous change.”8

INFORMATION TECHNOLOGIES AND LIBRARIES | MARCH 2017

As our libraries move toward embracing an even more active role within our communities, our organizational priorities are undergoing similarly dramatic shifts: support for knowledge creation and innovation becomes more central, even as physical assets shift toward a supporting, even peripheral, role. Libraries, as fundamentally neutral hubs of diverse communities, are uniquely positioned to be able to cultivate creative abrasion within and among their communities, fostering not only knowledge creation, but innovation and success. Indeed, the combination of physical, electronic, and staff assets can be the raw stuff by which trans-disciplinary engagement is encouraged. The active cultivation and support of creative abrasion, with direct linkage to desired outcomes, becomes arguably one of the most vital services the library can provide its community. Rather than deepening the cycle of hyper-specialization, the emergence of makerspace in our libraries is one example of a trend toward enabling libraries to broaden and embrace that support. Building on the intellectual diversity within the spaces of the library, staff members, volunteers, and fellow community members can serve as catalyst, triggering groups to “do something with that variety”9 by engaging across traditional boundaries.
Indeed, “by deliberately creating diverse organizations and explicitly helping team members appreciate thinking-styles different than their own, creative abrasion can result in successful innovation.”10 Strategic placement and staff support of makerspace activity can dramatically increase the opportunity for creative abrasion - and, by extension, the resulting knowledge creation, creativity, and innovation.

Arts Bring a Fundamental Literacy and Resource to STEM

In recent years, greater emphasis on students acquiring STEM (Science, Technology, Engineering, and Math) skills has made the topic one of the most central issues in education. Considered a key solution to improving the competitiveness of American students on the global stage, STEM education shares the common goal of breaking down the artificial barriers that exist even within the separate disciplines of science, technology, engineering, and math - in short, increasing the diversity of the learning environment. Proponents of STEAM go further by suggesting that adding Art into the mix can bring new energy and language to the table, “sparking curiosity, experimentation, and the desire to discover the unknown in students.”11 Federal agencies such as the U.S. Department of Education and the National Science Foundation have funded and underwritten a number of grants, conferences, and workshops in the field, including the seminal forum hosted by the Rhode Island School of Design (RISD), “Bridging STEM to STEAM: Developing New Frameworks for Art-Science-Design Pedagogy.”12 John Maeda, the president of RISD, identifies a direct connection between the approach and the creativity and success of late Apple co-founder Steve Jobs, with STEAM support “a pathway to enhance U.S.
economic competitiveness.”13 Proponents go further, arguing the Arts bring both a fundamental literacy and resource to the STEM disciplines, providing “innovations through analogies, models, skills, structures, techniques, methods, and knowledge.”14 Consider the findings of a study of Nobel Prize winners in the sciences, members of the Royal Society, and the U.S. National Academy of Sciences; Nobel laureates were:

- twenty-five times as likely as an average scientist to sing, dance, or act;
- seventeen times as likely to be an artist;
- twelve times more likely to write poetry and literature;
- eight times more likely to do woodworking or some other craft;
- four times as likely to be a musician; and
- twice as likely to be a photographer.15

From the standpoint of creative abrasion, welcoming the “A” of Art into the library support of STEM disciplines increases the diversity of the library, and by default the opportunity for creative abrasion. From Aristotle and Pythagoras through Galileo Galilei and Leonardo da Vinci to Benjamin Franklin, Richard Feynman, and Noam Chomsky, a long list of individuals of wide-ranging genius hints at a potential left largely untapped by our traditional approach. Connections between STEM disciplines, Art, and the innovation arising directly out of their creative abrasion surround us: the electronic screens used on a wide range of technology, including computers, televisions, and cell phones, are the result of a collaboration between a series of painter-scientists and post-impressionist artists such as Seurat - a combination of red, green, and blue dots generates full-spectrum images in a way not unlike that of the artistic technique of pointillism.
The electricity to drive that technology is understood, in part, due to early work by Franklin - even as he laid the foundations of the free public library with the opening of America’s first lending library, and pursued a broad range of parallel interests. The stitches used in medical surgery are the result of Nobel laureate Alexis Carrel taking his knowledge of lace making from a traditional arena into the operating room. Prominent American inventors “Samuel Morse (telegraph) and Robert Fulton (steam ship) were among the most prominent American artists before they turned to inventing.”16 In short, “increasing success in science is accompanied by developed ability in other fields such as the fine arts.”17 Rather than isolated in monastic study, “almost all Nobel laureates in the sciences are actively engaged in arts as adults.”18 Perhaps surprisingly, rather than being rewarded by an ever-increasing focus and hyper-specialization, genius in the sciences seems tied to individuals’ activity in the arts and crafts. The study’s authors cite three different Nobel Prize winners, including J. H. Van’t Hoff, whose 1878 speculation held that scientific imagination is correlated with creative activities outside of science,19 going on to detail similar findings from general studies dating back over a century. Of even more seminal interest, the authors point to a similar connection for adolescents and young adults, where Milgram and colleagues20 found “having at least one persistent and intellectually stimulating hobby is a better predictor of career success in any discipline than IQ, standardized test scores, or grades.”21

Discussion

The connection between individuals holding a multiplicity of interests, trans-disciplinary activity, and success is clear; what is less clear is to what extent we are fostering that connection in our libraries today. The potential is nevertheless tantalizing: a random group of people, thrown together, is not likely to be very creative.
By going beyond specialization and wading into the deeper waters of supporting and cultivating creative abrasion and avocation among the membership of our libraries, we are fostering success and innovation beyond what might otherwise occur. The decision to catalyze and foster the cross-curricular collaboration that is STEAM22 is squarely in the hands of the library: in the design of its spaces, and in the interactions of the staff of the library with the communities served. We can choose to actively connect and catalyze across traditional boundaries. As the head of a science and engineering library, one of the early adopters of makerspace and actively exploring the possibilities of STEAM engagement for several years, I have time and again witnessed the leaps of insight and creativity brought about by creative abrasion. From across disciplines, members are engaging with the resources of the library - and, with our encouragement, one another - in an ever-increasing cycle of knowledge creation, innovation, and success. The impact is particularly dramatic among individuals from strongly differing backgrounds and disciplines: for example, when an engineering student who considers themselves to be expert with a particular technology witnesses and interacts with an art student using that same technology and accomplishing something truly unexpected, even seemingly magical. Or when a science student approaching a problem from one perspective realizes a practitioner from a different discipline sees the problem from an entirely different, and yet equally valid, point of view. In each case, it’s as if the worldview of each suddenly melts: shifting and expanding, never to return to its original shape. Transformative experiences become the order of the day, even as the informal environment offers a wealth of opportunity to engage with and connect end-users to the more traditional resources of the library.
By actively seeking out opportunities to bring art into traditionally STEM-focused activity, and vice-versa, we are deliberately increasing the diversity of the environment. Makerspace services and activities, to the extent they are open and visibly accessible to all, are a natural fit for the spontaneous development of trans-disciplinary collaboration. Within the spaces of the library, opportunities to connect individuals around shared avocational interest might range from music and spontaneous performance areas to spaces salted with LEGO bricks and jigsaw puzzles; the potential connections between our resources and the members of our communities are as diverse as their interests. Indeed, when a practitioner from one discipline can interact and engage with others from across the STEAM spectrum, the world becomes a richer place – and maybe, just maybe, we can fan the flames of curiosity along the way.

REFERENCES

1. Bohm, D., and F. D. Peat. 1987. Science, Order, and Creativity: A Dramatic New Look at the Creative Roots of Science and Life. London: Bantam.
2. Ibid., 18-19.
3. Hirshberg, Jerry. 1998. The Creative Priority: Driving Innovative Business in the Real World. London: Penguin.
4. Leonard-Barton, Dorothy, and Walter C. Swap. 1999. When Sparks Fly: Harnessing the Power of Group Creativity. Boston, Massachusetts: Harvard Business School Press.
5. Johansson, Frans. 2004. The Medici Effect: Breakthrough Insights at the Intersection of Ideas, Concepts, and Cultures. Boston, Massachusetts: Harvard Business School Press, 20.
6. Leonard-Barton, Dorothy, and Walter C. Swap. 1999. When Sparks Fly: Harnessing the Power of Group Creativity. Boston, Massachusetts: Harvard Business School Press, 25.
7. Nonaka, Ikujiro. 1994. “A Dynamic Theory of Organizational Knowledge Creation.” Organization Science 5 (1): 14–37.
8. Correia de Sousa, Milton. 2006.
“The Sustainable Innovation Engine.” Vine 36 (4): 398–405, accessed February 14, 2017. https://doi.org/10.1108/03055720610716656.
9. Leonard-Barton, Dorothy, and Walter C. Swap. 1999. When Sparks Fly: Harnessing the Power of Group Creativity. Boston, Massachusetts: Harvard Business School Press, 20.
10. Adams, Karlyn. 2005. The Sources of Innovation and Creativity. Education, September 2005, 33. https://doi.org/10.1007/978-3-8349-9320-5.
11. Jolly, Anne. 2014. “STEM vs. STEAM: Do the Arts Belong?” Education Week Teacher. http://www.edweek.org/tm/articles/2014/11/18/ctq-jolly-stem-vs-steam.html?qs=stem+vs.+steam.
12. Rose, Christopher, and Brian K. Smith. 2011. “Bridging STEM to STEAM: Developing New Frameworks for Art-Science-Design Pedagogy.” Rhode Island School of Design press release.
13. Robelen, Erik W. 2011. “STEAM: Experts Make Case for Adding Arts to STEM.” Education Week. http://www.bmfenterprises.com/aep-arts/wp-content/uploads/2012/02/Ed-Week-STEM-to-STEAM.pdf.
14. Root-Bernstein, Robert. 2011. “The Art of Scientific and Technological Innovations – Art of Science Learning.” http://scienceblogs.com/art_of_science_learning/2011/04/11/the-art-of-scientific-and-tech-1/.
15. Ibid.
16. Ibid.
17. Root-Bernstein, Robert, Lindsay Allen, Leighanna Beach, Ragini Bhadula, Justin Fast, Chelsea Hosey, Benjamin Kremkow, et al. 2008. “Arts Foster Scientific Success: Avocations of Nobel, National Academy, Royal Society, and Sigma Xi Members.” Journal of Psychology of Science and Technology. https://doi.org/10.1891/1939-7054.1.2.51.
18. Ibid.
19. Van’t Hoff, Jacobus Henricus. 1967. “Imagination in Science.” Molecular Biology, Biochemistry and Biophysics, translated by G. F. Springer, 1. Springer-Verlag, pp. 1-18.
20. Milgram, Roberta M., and Eunsook Hong. 1997. “Out-of-school activities in gifted adolescents as a predictor of vocational choice and work.” Journal of Secondary Gifted Education 8, no. 3: 111.
Education Research Complete, EBSCOhost (accessed February 26, 2017).
21. Root-Bernstein, Robert, Lindsay Allen, Leighanna Beach, Ragini Bhadula, Justin Fast, Chelsea Hosey, Benjamin Kremkow, et al. 2008. “Arts Foster Scientific Success: Avocations of Nobel, National Academy, Royal Society, and Sigma Xi Members.” Journal of Psychology of Science and Technology. https://doi.org/10.1891/1939-7054.1.2.51.
22. Land, Michelle H. 2013. “Full STEAM Ahead: The Benefits of Integrating the Arts into STEM.” Procedia Computer Science 20. Elsevier Masson SAS: 547–52. https://doi.org/10.1016/j.procs.2013.09.317.
A Technology-Dependent Information Literacy Model within the Confines of a Limited Resources Environment

Ibrahim Abunadi

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2018

Ibrahim Abunadi (i.abunadi@gmail.com) is an Assistant Professor, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia.

ABSTRACT

The purpose of this paper is to investigate information literacy as an increasingly evolving trend in computer education. A quantitative research design was implemented, and a longitudinal case study methodology was conducted to measure tendencies in information literacy skill development and to develop a practical information literacy model. It was found that both students and educators believe that the combination of information literacy instruction with a learning management system is more effective in increasing information literacy and research skills where information resources are limited. Based on the quantitative study, a practical, technology-dependent information literacy model was developed and tested in a case study, resulting in fostering the information literacy skills of students who majored in information systems. These results are especially important in smaller universities with libraries having limited technology capabilities, located in developing countries.

INTRODUCTION

Many different challenges arise during a graduate’s career. Moreover, professional life can involve numerous situations and problems that university students are not prepared for during their college studies.1 The use of internet sources to find solutions to real problems depends on students’ and graduates’ information literacy skills.2 A strong aid to students’ learning is the ability to search, analyze, and apply knowledge from different sources, including literature, databases, and the internet.3 One of the issues students face concerning technology is its continuous evolution.
Although students learn survival skills in their professional lives, they also require special coping skills. A skill that should be considered for all technology-related courses is information literacy. Lin defines information literacy as a “set of abilities, skills, competencies, or fluencies, which enable people to access and utilize information resources.”4 These are part of the lifelong learning skills of students, which put the power of continuous education in their hands. Another issue is the exclusive allocation of the responsibility for information literacy skill development in smaller educational institutes to librarians or to instructors who majored in library science.5 This paper has taken another approach to information literacy skill development, whereby specialized educators, such as capable information systems faculty members, facilitate this skill development. A learning management system (LMS) is a widely used form of technology for course delivery and the organization of subject material. Blackboard, Desire2Learn, Sakai, Moodle, and ANGEL, as common LMS platforms, provide an integrated guidance system to deliver and analyze learning.

INFORMATION LITERACY MODEL | ABUNADI https://doi.org/10.6017/ital.v37i4.9750

These systems can be used to support information literacy instruction. Standard features include assignments and quizzes, while other systems offer tools that allow students to view and comment on other students’ portfolios or work, depending on the LMS’s features.6 Before the 1990s, face-to-face learning was common within the educational domain. However, the LMS emerged in the twenty-first century as the internet became a suitable alternative to traditional learning. Moodle, an open-source LMS, is an acronym that stands for “Modular Object-Oriented Dynamic Learning Environment.” This online education system is intended to make learning available with the necessary guidance for educators.
Web services available through Moodle are based on a well-organized structural outline, and they are widely used to perform educational tasks and to analyze statistics helpful to instructors.7 Peter et al. (2015) presented an approach to information literacy instruction in universities and colleges that combines traditional classroom instruction and online learning; this is known as “blended learning.”8 This involves only one seminar in the classroom; thus, it can replace traditional sessions at universities and colleges with education involving information literacy instruction. It has been recommended that a time-efficient method should be adopted by augmenting classroom seminars and literacy instruction through the addition of online materials. However, the findings of this study showed that students who only use online materials do not show greater progress in their learning than those who follow the blended approach. Another study, by Jackson, examined how to more effectively integrate educational services into learning management systems and library resources.9 Jackson suggested that better implementation was required, and recommended using the Blackboard LMS to include information literacy and scaffolding activities in subject-specific courses. This study intends to determine the most effective method of information literacy education. It evaluates instructors’ and students’ perceptions of the effectiveness of traditional teaching in comparison to electronic teaching in information literacy. In this study, a quantitative research investigation was conducted with participants. A research model and questionnaire were developed for this purpose with three underlying latent variables. The participants were asked to describe their understanding of learning systems and their preferences in information literacy education.
Their requirements varied with their continuing education levels and past educational activities, based on which software or website appeared to be more supportive and compatible with them.10 This study considered the research results, developed an information literacy intervention model, and applied it to a case study.

LITERATURE REVIEW

Previously, educational institutions were limited to face-to-face teaching techniques or classroom-based teaching. Face-to-face teaching is the traditional method still used in most educational institutions. In classrooms, the subject is explained, and books or other paper-based materials are read out of class to enhance understanding.11 Face-to-face learning or teaching is limited by the number of physical resources available. Therefore, it becomes difficult to accommodate the widespread interest in information literacy through face-to-face learning.12 Gathering information using only physical resources can lead to information deficiencies.13 Education has evolved to benefit from advances in technologies by using LMSs and online sources. The effective usage of an LMS and online sources requires the development of information literacy.

Information Literacy

Information literacy includes technological literacy, information ethics, online library skills, and critical literacy.14 Technological literacy is defined as the ability to use common software, hardware, and internet tools to reach a specific goal. This ability is an important component of information literacy that enables a graduate to seek answers by using the internet and digital resources.15 Hauptman defines information ethics as “the production, dissemination, storage, retrieval, security, and application of information within an ethical context.”16 This skill is essential to preserve the original rights of researchers cited in a study, based on the ethical standards of the graduate conducting the study.
Another important component of information literacy comprises online library skills, which can be defined as the ability to use online digital sources, including digital libraries, to effectively seek different knowledge resources by using search engines, correctly locating required information, and using online support when needed.17 Critical literacy is a thorough evaluation of online material that allows for the appropriate conclusion to be reached on the suitability of the material for the required investigation.18 Seeking answers from appropriate sources is important to allow graduates to find and report on accurate and valid data. These components of information literacy enable information extraction from topics related to the desired course or field of research. Students, professors, instructors, employees, learners, and educational policy administrators are the major knowledge seekers who use information literacy skills.19 With improved online resources available for learning, many learning requirements are moving toward services that are exclusively online.20 Gray and Montgomery studied an online information literacy course.21 They found that teaching with the aid of information literacy is helpful for students in obtaining improved instruction. The authors also compared an online information literacy course and face-to-face instruction, focusing primarily on the behaviors and attitudes of teachers and college students toward the online course. The students agreed that the application of information literacy techniques would be particularly helpful to them in clarifying their understanding of complicated instructions. The teachers also indicated that an information literacy course would result in better regulation of academic processes than face-to-face learning. Dimopoulos et al.
(2013) measured student performance within an online learning environment, finding that the online learning environment has direct relevance for the completion of challenging tasks within academic settings.22 The findings further indicated that an LMS could improve teaching activities. As an LMS, Moodle was also helpful for students to ensure their development of collaborative problem-solving skills. They concluded that Moodle includes different useful modules, such as digital resource repositories, interactive wikis, and external add-in tools, that have been related to student learning when incorporated into the LMS environment, resulting in better performance. Hernández-García and Conde-González focused on learning analytics tools within engineering education, noting that such tools make engineering students more likely to understand complicated concepts. Therefore, the application of the information literacy model resulted in better performance.23 Further, educating students about information sources was found to be helpful for instructors in enhancing students’ learning by improving their online information retrieval skills. This study indicated that students can develop their learning traits more effectively through online learning than through face-to-face learning. Many researchers in this area have developed models that are only theoretical.24 However, this paper develops a practical information literacy model that can be tested for improvement in information literacy skills. This is especially relevant for computer and information systems courses, which can sometimes fall outside the purview of library-related training or education in universities with limited resources. The inclusion of information literacy training within computer and information systems courses is not regularly done in the information literacy field.25
Additionally, although some information literacy instruction has been implemented practically in research, no other study has developed a practical information literacy model based on educators’ and students’ information literacy dispositions as well as both information literacy theory and practice.26

Moodle as an LMS

Moodle is a useful and accommodating open-source platform with a stable structure of website services that allows instructors and learners to implement a range of helpful plugins. It can be used as a lively online education community and an enhancement to the face-to-face learning process.27 Moodle is used in around 190 countries and offers its services in over seventy languages. It acts as a mediator for instruction and is widely adopted in many institutions. Moodle provides services such as assignments, wikis, messaging, blogs, quizzes, and databases.28 It can provide a more flexible teaching platform than traditional teaching. Health science educational service providers facilitate self-assurance in their learners. Several educational campuses operate by using face-to-face learning strategies, whereby learners obtain their training at on-campus locations. The objective of Moodle is to enable the education of learners through internet access.29 Xing focused on the broad application of the Moodle LMS for developing educational technology within academic settings, suggesting that academic organizations should promote technology as a solution for common problems with students’ learning processes.30 Such suggestions have been supported by Costa et al. (2012), who found that Moodle is significantly helpful for developing an e-learning platform for students. They emphasized that engineering universities must use the Moodle LMS to provide students with extensive technical knowledge.31 Costello et al.
(2012) stated that Moodle, if used, will significantly help students improve their skills and knowledge effectively.32

METHODOLOGY

In information literacy skill development, there are studies that support using only face-to-face education and others that support using only an LMS. For example, Churkovich and Oughtred found that face-to-face learning leads to better results in information literacy tutorials than online learning.33 At the same time, Anderson and May concluded that the use of an LMS is viewed by students as a better method than face-to-face instruction in information literacy.34 To test which educational pedagogy (traditional or technology-based) is better regarding information literacy, the following two hypotheses were posited:

H1: Face-to-face learning has a significantly positive influence on information literacy disposition.

H2: Moodle learning has a significantly positive influence on information literacy disposition.

To provide a better understanding of the most effective method of information literacy instruction, a quantitative research design was used. The wording of the questionnaire items (shown in table 1) was inspired by the studies of Ng, Horvat et al., Abdullah, and Deng and Tavares.35 Online questionnaires were prepared and distributed to students, teachers, trainers, and professors as well as administrative departments in a small private university located in the Arabian Gulf region. Initially, a pilot study was conducted to test the instrument. This pilot study involved forty-nine participants and fifteen questions on information literacy. It also included demographic questions.
Face-to-face Education Disposition (FED)
  FED1  Information literacy skills are polished through face-to-face learning
  FED2  Face-to-face learning accommodates information literacy requirements
  FED3  Face-to-face learning is easier than learning management systems
  FED4  Face-to-face learning is better than learning management systems

Moodle Usage Disposition (MUD)
  MUD1  Moodle is more easily accessible than other online resources
  MUD2  Moodle is an effective web server for information literacy
  MUD3  Moodle is more reliable than other online resources
  MUD4  Moodle enables the provision of an extensive amount of useful information
  MUD5  Moodle is used to overcome language, understanding, and communication gaps

Information Literacy Preference (IL)
  IL1  Students and teachers prefer online resources
  IL2  Inauthentic websites are helpful for students and teachers
  IL3  Authentic websites are useful for students and teachers
  IL4  Students and teachers prefer published articles, journals, and books
  IL5  Online learning is more effective
  IL6  Information is essential for individuals’ knowledge

Table 1. Item coding.

After the pilot study, a full-scale study was conducted, in which the participants were students, professors, and educational administrators. An online questionnaire was sent to the management of an academic institution in the Arabian Gulf region to assess the instruction methodology to improve students’ information literacy skills. The language used in the survey was Arabic, and the questionnaire was translated into English for this article by a professional translator. A total of five hundred questionnaires were sent, and 398 of them were received with complete responses. The following criteria were used to filter questionnaires that were not appropriate for this study:

Inclusion Criteria
• People currently involved in the education system.
• Students, teachers, or members of an academic department.
• People who understand information literacy.
A question was added in the survey about whether the participant was familiar with information literacy; if not, the participant was removed from the sample.

Exclusion Criteria
• People who were not involved in the education system.
• People who were not aware of online learning systems.
• Staff with no role in learning or teaching.

Gender         Frequency  Percent
Male           186        46.73
Female         212        53.27
Total          398        100

Qualification  Frequency  Percent
Undergraduate  181        45.48
Graduate       98         24.62
Masters        119        29.90
Total          398        100

Designation    Frequency  Percent
Student        216        54.27
Instructor     90         22.61
Administrator  92         23.12
Total          398        100

Table 2. Demographic information.

Question  Agree  Neutral  Disagree  Don’t Know
Face-to-face Education Disposition (FED)
FED1      46.8   22.8     21.3      9.1
FED2      10     74.5     14.2      1.3
FED3      1.5    12.8     75.8      9.9
FED4      32     30       26        12
Information Literacy Preference (IL)
IL1       38.8   21.3     1.5       38.4
IL2       0.3    1        98.7      --
IL3       15     31       53.3      --
IL4       49.5   30       13.0      7.5
IL5       48     29.8     --        22.2
IL6       74     11.5     1.8       12.7

Table 3. Questionnaire response distribution for FED and IL.

Question  Yes    No
Moodle Usage Disposition (MUD)
MUD1      65     35
MUD2      73.3   26.8
MUD3      67     33
MUD4      66     34
MUD5      63.7   36.3

Table 4. Responses to MUD.

The reliability statistics showed a high level of consistency for the pilot test because the Cronbach’s alpha for the fifteen items was 0.901, which is above the recommended level of 0.7.36 Cronbach’s alpha is a widely used coefficient measuring the internal consistency of items as a unified group.37 Based on the successful pilot study, a full-scale study was conducted. The demographic distribution for the full-scale study is shown in table 2. The distribution of the questionnaire items for the full-scale study is shown in tables 3 and 4.
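As a concrete illustration of the reliability check described above, Cronbach’s alpha can be computed directly from a respondents-by-items matrix of scores. The sketch below is illustrative only: the sample data are hypothetical Likert responses, not the study’s actual survey data.

```python
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    k = responses.shape[1]                          # number of items
    item_vars = responses.var(axis=0, ddof=1)       # sample variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of each respondent's summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: six respondents, three items.
data = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [3, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
])
print(round(cronbach_alpha(data), 3))  # values above 0.7 are conventionally acceptable
```

Highly consistent items (respondents who rate one item high rate the others high) drive the total-score variance well above the sum of the item variances, pushing alpha toward 1.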
Cronbach’s alpha was also used to determine the reliability of the constructed items for the full-scale study. The standard benchmark for reliability is a 0.7 threshold, and the Cronbach’s alpha for every constructed item was above that value. Thus, all the items had appropriate and adequate reliability.38

RESULTS

The research hypotheses were tested using structural equation modeling (SEM) with the Analysis of Moment Structures (AMOS) software. SEM encompasses statistical methods and computer algorithms that are used to assess latent variables along with observed variables. SEM also indicates the relationships among latent variables, showing the effects of the independent variables on the dependent variables.39 One well-regarded SEM tool is AMOS, a multivariate technique that can concurrently assess the relationships between latent variables and their corresponding indicators (the measurement model) as well as the relationships among the model’s variables.40 Highly cited information systems and statistics guidelines were followed for the SEM to ensure the validity and reliability of the data analysis.41

Measurement and Structural Model

The measurement model contained fifteen items representing three latent variables: face-to-face education disposition, Moodle usage disposition, and information literacy preference. Before proceeding to this analysis, the data needed to show normality for the robustness of this parametric SEM to be trusted. Curran et al. suggested absolute skewness and kurtosis values less than 2 and 7, respectively, to establish the normality of the data.42 All items’ absolute skewness and kurtosis values were below these suggested cutoffs, showing a suitable level of normality for conducting the SEM analysis.
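The Curran et al. screen can be applied mechanically to each item. The following is a minimal illustrative check in Python (not the study’s code); note that the kurtosis computed here is excess kurtosis, the convention under which the cutoff of 7 is usually quoted:

```python
import numpy as np

def normality_screen(items, skew_cut=2.0, kurt_cut=7.0):
    """Return, per column of an (n_respondents, n_items) matrix, whether
    |skewness| and |excess kurtosis| fall below the Curran et al. cutoffs."""
    items = np.asarray(items, dtype=float)
    flags = []
    for col in items.T:
        z = (col - col.mean()) / col.std()
        skew = np.mean(z**3)          # sample skewness
        kurt = np.mean(z**4) - 3.0    # sample excess kurtosis
        flags.append(abs(skew) < skew_cut and abs(kurt) < kurt_cut)
    return flags
```

Items failing the screen would call for a transformation or a robust estimator before fitting the SEM.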
The overall measurement model showed a high level of fit: GFI = 0.99, AGFI = 0.98, NFI = 0.98, CMIN/DF = 0.86, and RMR = 0.39. GFI, AGFI, and NFI indicate that the theoretical model fits the empirical data well when they are above 0.95; CMIN/DF and RMR follow different cutoffs, with CMIN/DF required to be less than 3 and RMR less than 0.5.43 Table 5 shows that the items loaded on their corresponding latent variables above the suggested cutoff (0.5). As shown in the table, IL6 was the only item that did not load clearly on its latent variable, and thus it was dropped from further analysis.44 An additional method for assessing item loading was loading significance, which was significant at the 0.001 level, indicating that all items loaded on their latent variables.45 The indices of the measurement model suggested that the psychometric properties of the instrument were adequate to proceed to the structural model.

INFORMATION LITERACY MODEL | ABUNADI 126 https://doi.org/10.6017/ital.v37i4.9750

Table 5. Item loadings.

Item   Estimate
Face-to-face Education Disposition (FED)
FED4   0.71
FED3   0.52
FED2   0.66
FED1   0.89
Moodle Usage Disposition (MUD)
MUD5   0.93
MUD4   0.92
MUD3   0.92
MUD2   0.73
MUD1   0.93
Information Literacy Preference (IL)
IL6    0.32
IL5    0.91
IL4    0.72
IL3    0.86
IL2    0.81
IL1    0.83

The next step was to assess the structural model, which was used to evaluate the hypothesized relations between the independent variables (face-to-face education disposition [FED] and Moodle usage disposition [MUD]) and the dependent variable (information literacy preference [IL]). Both education methods were tested in the hypotheses to identify the most suitable information literacy delivery mode for students. Both hypotheses were supported, which indicates that no single method of information literacy delivery (either face-to-face instruction or an LMS) is preferred, and a different model can be suggested.
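The cutoff logic described above can be expressed as a short check. This is an illustrative Python sketch, with the fit indices and IL loadings transcribed from the text and Table 5 (it is not the authors’ analysis code):

```python
# Reported fit indices, transcribed from the text.
FIT = {"GFI": 0.99, "AGFI": 0.98, "NFI": 0.98, "CMIN/DF": 0.86, "RMR": 0.39}

def fit_ok(fit):
    """GFI, AGFI, and NFI must exceed 0.95; CMIN/DF < 3; RMR < 0.5."""
    return (all(fit[k] > 0.95 for k in ("GFI", "AGFI", "NFI"))
            and fit["CMIN/DF"] < 3 and fit["RMR"] < 0.5)

# IL loadings transcribed from Table 5.
IL_LOADINGS = {"IL1": 0.83, "IL2": 0.81, "IL3": 0.86,
               "IL4": 0.72, "IL5": 0.91, "IL6": 0.32}

def retained_items(loadings, cutoff=0.5):
    """Keep only items whose standardized loading meets the 0.5 cutoff."""
    return {k: v for k, v in loadings.items() if v >= cutoff}
```

Running `retained_items(IL_LOADINGS)` drops IL6, matching the decision reported in the text.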
Both hypotheses were supported at the 0.001 level, with an effect size for face-to-face education disposition of 0.32, which indicates a medium impact on information literacy preference. Meanwhile, Moodle usage disposition had an effect size of 0.70, which is considered large (Hair et al. 2010). Finally, the model’s explanatory power for information literacy preference, measured by R2, was high (0.85). Based on this analysis, it can be said that a single method of information literacy delivery is insufficient in developing countries. Thus, a different model for information literacy was developed (figure 1) to improve students’ related competencies.

Figure 1. Information Literacy Intervention Model.

As shown in figure 1, the model includes conducting weekly information literacy sessions that focus on educating students about technological literacy, information ethics, online library skills, and critical literacy. After each session concludes, the instructor creates weekly assignments in an LMS that test the students’ information literacy abilities with regard to the subject material. The instructor follows up on the students’ overall performance and fills any identified gaps in subsequent information literacy sessions and assignments. After one month, the instructor reviews the students’ performance and provides feedback to students. Finally, a “real case project assignment” is used to teach students to solve real problems using the skills they have learned. The instructor can further extend reflection on the process of assigning “real case project” grades by creating a course exit survey that asks students about their acquired level of information literacy skills.
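The relationship between the reported path coefficients and R2 can be illustrated with a small simulation. This is a hypothetical Python sketch: the coefficients 0.32 and 0.70 are taken from the text, but the data are synthetic and the noise level is chosen so that R2 lands near the reported 0.85:

```python
import numpy as np

# Synthetic standardized predictors and outcome built from the reported
# path coefficients (FED -> IL: 0.32, MUD -> IL: 0.70).
rng = np.random.default_rng(7)
n = 398  # sample size reported in the study
fed = rng.normal(size=n)
mud = rng.normal(size=n)
il = 0.32 * fed + 0.70 * mud + rng.normal(scale=0.32, size=n)

# Ordinary least squares recovers the coefficients and the R^2.
X = np.column_stack([np.ones(n), fed, mud])
beta, *_ = np.linalg.lstsq(X, il, rcond=None)
resid = il - X @ beta
r2 = 1 - (resid**2).sum() / ((il - il.mean())**2).sum()
```

With independent standardized predictors, R2 is roughly the sum of the squared paths relative to total variance, which is why a 0.32 and a 0.70 path together can yield an R2 in the neighborhood of 0.85.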
LONGITUDINAL CASE STUDY

A small technical university in the Arabian Gulf region faces difficulties in providing adequate library resources to its students because of its limited capabilities. The university has about 4,500 students and five hundred employees. The university library and information technology department lack adequate staff and resources, resulting in insufficient support for student learning. This has caused a lack of information literacy education for students, which is evident in their submitted assignments. For example, students are not accustomed to citing the materials used in their assessments. Thus, these undergraduates are viewed suspiciously by their educators when using online materials. Not knowing how to paraphrase and then cite relevant online materials causes students to miss learning opportunities. Information literacy is a skill that should be considered for all technology-related courses.46 The outcomes of this course will be used to improve the education of students and place the power of learning in their hands.47 Therefore, the objective of this case study is to determine the influence of information literacy practices on improving student performance in solving organizational problems, especially when technology and library resources are scarce. This longitudinal case study was conducted over two semesters: the first was conducted traditionally, without the use of an information literacy intervention model, whereas in the second semester the intervention model was introduced. Finally, the performance and opinions of students in the two semesters were compared using a case study assignment and a course exit survey. The information literacy intervention model was implemented by providing a series of practical tutorials at the beginning of the semester showing students how to use information from the internet.
Then, the students applied the information, using information literacy skills to solve weekly assessments for an enterprise-architecture (EA) course. This course is taught under the information systems program at a private university. Students enrolling in the course are in their second year or higher. The information literacy assessments require students to search for reliable sources of information and to cite and reference them. This builds the habit of critically examining sources of information and of grasping, analyzing, and using these sources to solve problems. The information literacy technology pedagogical method was followed to improve students’ knowledge of methods of learning.48 The students were educated through a series of classes on how to use the university’s databases, e-books, and internet resources to solve real-life organizational problems and to apply concepts in different situations, as shown in figure 1. The students were given ten small assessments in the Moodle LMS, in which a concept taught in class had to be applied after students searched for it and learned more about it from different sources. This included looking in the correct places for reliable resources, online scholarly databases, and online videos that could be of use. Then, students were taught how to critically examine resources and determine which of them were reliable. For example, students were shown that highly cited papers are more reliable than less-cited papers and that online videos from professional organizations (e.g., IBM or Gartner) are more reliable than personal videos. Students were also taught how to use in-text citations and how to create reference lists. In the last quarter of the semester, a case study assignment was provided with real-life problems that students were required to solve using different sources, including the internet.
The performance of semester-1 students (no intervention) was compared with that of semester-2 students (information literacy intervention) taking the same course. An improvement in grades was considered an indicator of success. The comparison point was a major project that required students to solve real-life organizational problems and demanded greater information literacy. Some of the EA concepts taught in the class required practice to apply. For example, the as-is organizational modeling that is needed before implementing EA would be difficult to understand unless students actually conducted modeling on selected organizations. This enabled students to understand how these concepts related to the real world. The concepts focused on were related to business tools in information systems (e.g., business process management and requirements elicitation) that are widely used for analysis within organizations. The theory behind these tools was explained in class; applying these theories required students to search many sources of information, including online books and research databases. Students were unaware of these resources until the instructors explained their availability on the internet and in the library. The students were provided with regular information literacy sessions to improve their skills in this area. They were shown how to search; for instance, if they could not find a specific term, they could look for synonyms. They were instructed on how to use search engines and research databases and were shown the relevant electronic journals and books that could aid in solving weekly assessments. The usage of internet multimedia is also important in education.49 The students were shown relevant YouTube channels (e.g., by Harvard and Khan Academy) and relevant massive open online courses (e.g., free courses on Coursera.com and Udemy.com).
Weekly tests required students to use these resources to solve the assessment problems. An important outcome of this intervention was an improvement in students’ abilities to use different digital resources. This was evident in semester-2 students’ use of suitable reference lists and in-text citations, as compared to a lack of such usage by semester-1 students. An additional measure was the higher average score in semester 2 (4.15/5) than in semester 1 (3.2/5) on one of the course exit survey items relevant to information literacy: “Illustrate responsibility for one’s own learning.” The students were continually taught that information literacy grants a power that comes with responsibility, and no incidents of plagiarism were reported during the semester in which the intervention was conducted. Referencing became a habit through the weekly information literacy assessments. The students’ grades on the final project were better than in the previous academic semester: the average project grade for semester 1 was 15.5/20, while that for semester 2 was 17/20. The difference between the semester-1 and semester-2 project grades was statistically significant at the 0.10 level. The students could use digital library databases, and some were interested in using external online books. It became habitual for students to use in-text citations, and their references became diversified. Some students, however, still confined suitable references to only some paragraphs. This feedback was delivered to students so that they could address the issue in other courses.

DISCUSSION AND CONCLUSION

This study was conducted to investigate the most effective mode of information literacy delivery.
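A semester-to-semester grade comparison of this kind can be sketched as a two-sample test. The Python illustration below is hypothetical: the per-student grades are invented to match the reported project means (15.5/20 and 17/20), since the study does not publish the raw grades, and the article does not specify which test was used:

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (b.mean() - a.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical grade samples centered on the reported project means.
rng = np.random.default_rng(1)
sem1 = rng.normal(15.5, 2.0, size=30)  # semester 1, no intervention
sem2 = rng.normal(17.0, 2.0, size=30)  # semester 2, with intervention
t, df = welch_t(sem1, sem2)
```

The resulting t statistic would then be compared against the critical value for the chosen significance level (0.10 in the study).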
The study focused on smaller universities because they do not have adequate library facilities and technological capabilities to provide students with sufficient information literacy competencies during course delivery. A survey was conducted to determine the most suitable form of information literacy delivery. The survey determined that Moodle and face-to-face methods were both favored for information literacy. Thus, the information literacy intervention model was developed and tested in a case study so that students’ performance would improve. The results of this study have shown that the combination of technology and information literacy instruction is an effective method of improving students’ skills in using digital resources to seek knowledge. It was found that both face-to-face learning and the use of an LMS increase student performance on assessments that require information literacy. Face-to-face learning is required to explain information literacy concepts, while the LMS is used to disseminate the necessary digital resources and to create assessment modules. Thus, the combination of theory and practice in information literacy resulted in better understanding and implementation in knowledge seeking and problem-solving related to information systems. The inclusion of information literacy instruction, along with the use of an LMS for information literacy assessments within information systems courses, has reduced the pressure on libraries that lack technological resources (such as PCs) and qualified staff.

The results with regard to this study’s hypotheses are in agreement with those of previous studies.50 Hypothesis 1, which posited a significant positive influence of face-to-face education disposition on information literacy preference, is congruent with the research of Churkovich and Oughtred.51 Their research focused on student information literacy skill development using library facilities instead of faculty, which is a different approach than the one followed in the present study. However, both the present study and that of Churkovich and Oughtred found that face-to-face instruction leads to improved student performance. Hypothesis 2, which posited a positive impact of Moodle usage disposition on information literacy preference, correlates with the research of Anderson and May.52 They found that using an LMS is more effective than face-to-face instruction for information literacy instruction. Like Churkovich and Oughtred (and in contrast to the present study), Anderson and May relied on librarians to deliver information literacy instruction online, although they also relied on faculty in addition to librarians. There are two noteworthy outcomes of the first study. First, the questionnaire measurement model showed that the development of this instrument was successful and that the items and their latent variables can be used in further studies. Second, the results regarding the structural model indicated that both face-to-face instruction and Moodle use influenced information literacy preferences. Other studies have supported these results. The results of Peter et al. agree with the finding of the present study that the combination of face-to-face instruction and LMS use leads to improved student performance.53 Peter et al., whose participants were psychology students, focused on the time-efficiency of information literacy instruction delivery; in contrast, the present study considers information literacy skill development as a progressive, long-term process. The information literacy intervention model is not only a learning medium but an interactive method of teaching that adapts to students’ learning patterns.
The primary limitations of the study were the nature of the sample, the exclusion of some potentially relevant variables, and the simplification of the study’s findings. The sample was limited to students, professors, and people who were aware of the learning programs; it is highly possible that they were more familiar with such technological innovations than the general population. Future studies could retest the hypotheses of the study in a more comprehensive manner and impose more control on the respondents. The interaction between people while visiting a site is itself an activity worthy of examination, but it must be either controlled or measured for us to understand the role it plays in shaping attitudes and behaviors. Future studies can apply the developed theoretical model in different settings to determine its interaction with other variables in the information systems field. A quantitative instrument can be developed based on the information literacy intervention model. Alternatively, this model can be applied with qualitative interviews in future studies to develop theoretical themes based on instructors’ and students’ responses.

REFERENCES

1 Harry M. Kibirige and Lisa DePalo, “The Internet as a Source of Academic Research Information: Findings of Two Pilot Studies,” Information Technology and Libraries 19, no. 1 (2000): 11–15; Debbie Folaron, A Discipline Coming of Age in the Digital Age (Philadelphia: John Benjamins, 2006); N. N. Edzan, “Tracing Information Literacy of Computer Science Undergraduates: A Content Analysis of Students’ Academic Exercise,” Malaysian Journal of Library & Information Science 12, no. 1 (2007): 97–109.

2 Heinz Bonfadelli, “The Internet and Knowledge Gaps,” European Journal of Communication 17, no. 1 (2002): 65–84, http://journals.sagepub.com/doi/abs/10.1177/0267323102017001607; Kibirige and DePalo, “The Internet as a Source of Academic Research Information,” 11–15.
3 Laurie A. Henry, “Searching for an Answer: The Critical Role of New Literacies While Reading on the Internet,” The Reading Teacher 59, no. 7 (2006): 614–27. 4 Peyina Lin, “Information Literacy Barriers: Language Use and Social Structure,” Library Hi Tech 28, no. 4 (2010): 548–68, https://doi.org/10.1108/07378831011096222. 5 Michael R. Hearn, “Embedding a Librarian in the Classroom: An Intensive Information Literacy Model,” Reference Services Review 33, no. 2 (2005): 219–27. 6 Hui Hui Chen et al., “An Analysis of Moodle in Engineering Education: The Tam Perspective” (paper presented at Teaching, Assessment and Learning for Engineering (TALE), 2012 IEEE International Conference on). 7 N. N. Edzan, “Tracing Information Literacy of Computer Science Undergraduates: A Content Analysis of Students' Academic Exercise,” Malaysian Journal of Library & Information Science 12, no. 1 (2007): 97–109. 8 Johannes Peter et al., “Making Information Literacy Instruction More Efficient by Providing Individual Feedback,” Studies in Higher Education (2015): 1–16, https://doi.org/10.1080/03075079.2015.1079607. 9 Pamela Alexondra Jackson, “Integrating Information Literacy into Blackboard: Building Campus Partnerships for Successful Student Learning,” The Journal of Academic Librarianship 33, no. 4 (2007): 454–61, https://doi.org/10.1016/j.acalib.2007.03.010. 10 Manal Abdulaziz Abdullah, “Learning Style Classification Based on Student's Behavior in Moodle Learning Management System,” Transactions on Machine Learning and Artificial Intelligence 3, no. 1 (2015): 28. 11 Catherine J. Gray and Molly Montgomery, “Teaching an Online Information Literacy Course: Is It Equivalent to Face-to-Face Instruction?,” Journal of Library & Information Services in Distance Learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290X.2014.945876. 
12 William Sugar, Trey Martindale, and Frank E. Crawley, “One Professor’s Face-to-Face Teaching Strategies While Becoming an Online Instructor,” Quarterly Review of Distance Education 8, no. 4 (2007): 365–85.

13 Stephann Makri et al., “A Library or Just Another Information Resource? A Case Study of Users’ Mental Models of Traditional and Digital Libraries,” Journal of the Association for Information Science and Technology 58, no. 3 (2007): 433–45.

14 Christine Susan Bruce, “Workplace Experiences of Information Literacy,” International Journal of Information Management 19, no. 1 (1999): 33–47, https://doi.org/10.1016/S0268-4012(98)00045-0; Michael B. Eisenberg, Carrie A. Lowe, and Kathleen L. Spitzer, Information Literacy: Essential Skills for the Information Age (Westport, CT: Greenwood Publishing Group, 2004).

15 Andy Carvin, “More Than Just Access: Fitting Literacy and Content into the Digital Divide Equation,” Educause Review 35, no. 6 (2000): 38–47.

16 Robert Hauptman, Ethics and Librarianship (Jefferson, NC: McFarland, 2002).

17 JaNae Kinikin and Keith Hench, “Poster Presentations as an Assessment Tool in a Third/College Level Information Literacy Course: An Effective Method of Measuring Student Understanding of Library Research Skills,” Journal of Information Literacy 6, no. 2 (2012), https://doi.org/10.11645/6.2.1698; Stuart Palmer and Barry Tucker, “Planning, Delivery and Evaluation of Information Literacy Training for Engineering and Technology Students,” Australian Academic & Research Libraries 35, no. 1 (2004): 16–34, https://doi.org/10.1080/00048623.2004.10755254.

18 Lauren Smith, “Towards a Model of Critical Information Literacy Instruction for the Development of Political Agency,” Journal of Information Literacy 7, no. 2 (2013): 15–32, https://doi.org/10.11645/7.2.1809.
19 Melissa Gross and Don Latham, “What’s Skill Got to Do with It?: Information Literacy Skills and Self‐Views of Ability among First‐Year College Students,” Journal of the American Society for Information Science and Technology 63, no. 3 (2012): 574–83, https://doi.org/10.1002/asi.21681. 20 Bala Haruna et al., “Modelling Web-Based Library Service Quality and User Loyalty in the Context of a Developing Country,” The Electronic Library 35, no. 3 (2017): 507–19, https://doi.org/10.1108/EL-10-2015-0211. 21 Catherine J. Gray and Molly Montgomery, “Teaching an Online Information Literacy Course: Is It Equivalent to Face-to-Face Instruction?,” Journal of Library & Information Services in Distance Learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290X.2014.945876. 22 Ioannis Dimopoulos et al., “Using Learning Analytics in Moodle for Assessing Students’ Performance” (paper presented at the 2nd Moodle Research Conference Sousse, Tunisia, 4 –6, 2013). 23 Ángel Hernández-García and Miguel Á. Conde-González, “Using Learning Analytics Tools in Engineering Education” (paper presented at LASI Spain, Bilbao, 2016). 24 Michael R. Hearn, “Embedding a Librarian in the Classroom: An Intensive Information Literacy Model,” Reference Services Review 33, no. 2 (2005): 219–27, https://doi.org/10.1108/00907320510597426; Thomas P Mackey and Trudi E Jacobson, “Reframing Information Literacy as a Metaliteracy,” College & Research Libraries 72, no. 1 (2011): 62–78; S. Serap Kurbanoglu, Buket Akkoyunlu, and Aysun Umay, “Developing the Information Literacy Self-Efficacy Scale,” Journal of Documentation 62, no. 6 (2006): 730–43, https://doi.org/10.1108/00220410610714949. 25 Michelle Holschuh Simmons, “Librarians as Disciplinary Discourse Mediators: Using Genre Theory to Move toward Critical Information Literacy,” portal: Libraries and the Academy 5, no. 3 (2005): 297–311, https://doi.org/10.1353/pla.2005.0041; Sharon Markless and David R. 
Streatfield, “Three Decades of Information Literacy: Redefining the Parameters,” Change and Challenge: Information Literacy for the 21st Century (Blackwood, South Australia: Auslib Press, 2007): 15–36; Meg Raven and Denyse Rodrigues, “A Course of Our Own: Taking an Information Literacy Credit Course from Inception to Reality,” Partnership: The Canadian Journal of Library and Information Practice and Research 12, no. 1 (2017), https://doi.org/10.21083/partnership.v12i1.3907.

26 Joanne Munn and Jann Small, “What Is the Best Way to Develop Information Literacy and Academic Skills of First Year Health Science Students? A Systematic Review,” Evidence Based Library and Information Practice 12, no. 3 (2017): 56–94, https://doi.org/10.18438/B8QS9M; Sheila Corrall, “Crossing the Threshold: Reflective Practice in Information Literacy Development,” Journal of Information Literacy 11, no. 1 (2017): 23–53, https://doi.org/10.11645/11.1.2241.

27 Liping Deng and Nicole Judith Tavares, “From Moodle to Facebook: Exploring Students’ Motivation and Experiences in Online Communities,” Computers & Education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.

28 Ana Horvat et al., “Student Perception of Moodle Learning Management System: A Satisfaction and Significance Analysis,” Interactive Learning Environments 23, no. 4 (2015): 515–27, https://doi.org/10.1080/10494820.2013.788033.

29 Cary Roseth, Mete Akcaoglu, and Andrea Zellner, “Blending Synchronous Face-to-Face and Computer-Supported Cooperative Learning in a Hybrid Doctoral Seminar,” TechTrends 57, no. 3 (2013): 54–59, https://doi.org/10.1007/s11528-013-0663-z.

30 Ruonan Xing, “Practical Teaching Platform Construction Based on Moodle—Taking ‘Education Technology Project Practice’ as an Example,” Communications and Network 5, no. 3 (2013): 631, https://doi.org/10.4236/cn.2013.53B2113.
31 Carolina Costa, Helena Alvelos, and Leonor Teixeira, “The Use of Moodle E-Learning Platform: A Study in a Portuguese University,” Procedia Technology 5 (2012): 334–43, https://doi.org/10.1016/j.protcy.2012.09.037.

32 Eamon Costello, “Opening Up to Open Source: Looking at How Moodle Was Adopted in Higher Education,” Open Learning: The Journal of Open, Distance and e-Learning 28, no. 3 (2013): 187–200, https://doi.org/10.1080/02680513.2013.856289.

33 Marion Churkovich and Christine Oughtred, “Can an Online Tutorial Pass the Test for Library Instruction? An Evaluation and Comparison of Library Skills Instruction Methods for First Year Students at Deakin University,” Australian Academic & Research Libraries 33, no. 1 (2002): 25–38, https://doi.org/10.1080/00048623.2002.10755177.

34 Karen Anderson and Frances A. May, “Does the Method of Instruction Matter? An Experimental Examination of Information Literacy Instruction in the Online, Blended, and Face-to-Face Classrooms,” The Journal of Academic Librarianship 36, no. 6 (2010): 495–500, https://doi.org/10.1016/j.acalib.2010.08.005.

35 Wan Ng, “Can We Teach Digital Natives Digital Literacy?,” Computers & Education 59, no. 3 (2012): 1065–78, https://doi.org/10.1016/j.compedu.2012.04.016; Horvat et al., “Student Perception of Moodle Learning Management System,” 515–27; Manal Abdulaziz Abdullah, “Learning Style Classification Based on Student’s Behavior in Moodle Learning Management System,” Transactions on Machine Learning and Artificial Intelligence 3, no. 1 (2015): 28; Liping Deng and Nicole Judith Tavares, “From Moodle to Facebook: Exploring Students’ Motivation and Experiences in Online Communities,” Computers & Education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.

36 J. F. Hair, William C. Black, and Barry J.
Babin, Multivariate Data Analysis: A Global Perspective, 7th ed. (Upper Saddle River, NJ: Pearson, 2010).

37 L. J. Cronbach, “Test Validation,” in Educational Measurement, ed. R. L. Thorndike, 2nd ed. (Washington, DC: American Council on Education, 1971).

38 B. Tabachnick and L. Fidell, Using Multivariate Statistics, 5th ed. (New York: Allyn and Bacon, 2007).

39 Hair, Black, and Babin, Multivariate Data Analysis.

40 B. M. Byrne, Structural Equation Modeling with Amos: Basic Concepts, Applications, and Programming, 2nd ed. (New York: Taylor & Francis Group, 2010); Hair, Black, and Babin, Multivariate Data Analysis.

41 T. A. Brown, Confirmatory Factor Analysis for Applied Research (Methodology in the Social Sciences) (New York: Guilford, 2006); Byrne, Structural Equation Modeling with Amos; D. Gefen, D. Straub, and M. Boudreau, “Structural Equation Modeling and Regression: Guidelines for Research Practice,” Communications of the Association for Information Systems 4, no. 7 (2000): 1–77; Hair, Black, and Babin, Multivariate Data Analysis: A Global Perspective.

42 P. J. Curran, S. G. West, and J. F. Finch, “The Robustness of Test Statistics to Nonnormality and Specification Error in Confirmatory Factor Analysis,” Psychological Methods 1, no. 1 (1996): 16–29, https://doi.org/10.1037/1082-989X.1.1.16.

43 Byrne, Structural Equation Modeling with Amos.

44 Brown, Confirmatory Factor Analysis for Applied Research; Byrne, Structural Equation Modeling with Amos.

45 Hair, Black, and Babin, Multivariate Data Analysis: A Global Perspective.

46 Michael B. Eisenberg, Carrie A. Lowe, and Kathleen L. Spitzer, Information Literacy: Essential Skills for the Information Age (Westport, CT: Greenwood Publishing Group, 2004).

47 James Elmborg, “Critical Information Literacy: Implications for Instructional Practice,” The Journal of Academic Librarianship 32, no. 2 (2006): 192–99, https://doi.org/10.1016/j.acalib.2005.12.004.

48 Ibid.
49 Anderson and May, “Does the Method of Instruction Matter?,” 495–500; Horvat et al., “Student Perception of Moodle Learning Management System,” 515–27, https://doi.org/10.1080/10494820.2013.788033.

50 Horvat et al., “Student Perception of Moodle Learning Management System,” 515–27; Anderson and May, “Does the Method of Instruction Matter?,” 495–500; Raven and Rodrigues, “A Course of Our Own.”

51 Churkovich and Oughtred, “Can an Online Tutorial Pass the Test for Library Instruction?,” 25–38.

52 Anderson and May, “Does the Method of Instruction Matter?,” 495–500.

53 Peter et al., “Making Information Literacy Instruction More Efficient,” 1–16.
President’s Message: For The Record

Aimee Fifarek

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2017

For a long time, I’ve had an idea that when a new President of the United States is elected, sometime after he’s sworn in, amid all of the briefings, a wizened old man sits down with him to have The Talk. In my imagination the messenger is some cross between the Templar Knight from Indiana Jones and the Last Crusade and the International Express man from Neil Gaiman and Terry Pratchett’s Good Omens: officious yet wise. He tells the new President the why of it all, the real reasons why important things have happened in the ways they have, making all the decisions that seemed so wrong now seem inevitable. And probably not for the first time the new President thinks to himself, “What have I gotten myself into?” This is clearly reflective of my desire for there to be, if not a reason for everything that happens, then at least some record of it all that can be reviewed, synthesized, and mined for meaning by future leaders. It’s the Librarian in me, I suppose. Although being LITA President bears absolutely no resemblance to being President of the United States, I have been thinking about this little imagining of mine a lot lately. This is probably because, now that I am midway through my Presidential cycle (Vice President, President, Past President), I realize how much of what I’ve done has been marked by the absence of such a record. I did not receive a “How to be LITA President” manual along with my gavel, and no one gave me the LITA version of The Talk. The one person who could have done it, LITA Executive Director Jenny Levine, was as new to her position as I was to mine, so we have learned together and asked many questions of those around us with more experience. We are in the midst of Election season, and will soon have a new President-Elect.
Bohyun Kim and David Lee King are both excellent candidates (http://litablog.org/2017/01/meet-your-candidates-for-the-2017-lita-election/); those of you who have not yet voted have a difficult choice. In order to make a little progress toward developing that how-to guide, I thought I’d document a few of the things I’ve learned since becoming LITA President.

Being LITA President also means being President of a Division of the American Library Association.

When I was elected I expected to manage the business of the Library and Information Technology Association—Board meetings, Committee Appointments, Presidential Programs, and LITA Forums. Seeing the Board complete the LITA Strategic Plan (http://www.ala.org/lita/about/strategic) was a great accomplishment at this level. While it’s possible for a Division leader to have minimal interactions with “Big ALA” during their term and still be successful, my priority for my presidential year—increasing the value LITAns receive from membership, especially those who are not able to attend in-person conferences—meant that I needed to learn more about how ALA works. After a year and a half, I have a much better understanding of the Association’s budgeting, publishing, and technology practices, and how all of these are impacted by declines in membership and decreasing revenues. Future LITA leaders are going to need to continue to be engaged at the larger organizational level if we are to be able to use LITA’s technological knowledge and expertise to support ALA’s efforts to maximize efficiency while minimizing costs.

Aimee Fifarek (aimee.fifarek@phoenix.gov) is LITA President 2016-17 and Deputy Director for Customer Support, IT and Digital Initiatives at Phoenix Public Library, Phoenix, AZ.
PRESIDENT’S MESSAGE | FIFAREK https://doi.org/10.6017/ital.v36i1.9808

Being LITA President means speaking not just to, but for, an incredibly diverse community.

My plan when I became LITA President was to blog on a more regular basis.
However, I didn’t expect some of my first communications to be about a mass shooting in Dallas (in advance of the Forum in Ft. Worth) or working with the Board to craft a statement on inclusivity after the US presidential election. The proverbial curse “may you live in interesting times” has certainly been true this year. Having to speak to the LITA community about those issues made me acutely aware of my responsibility to adequately represent you when we’ve also been asked to weigh in on technology policy issues at the federal level, such as the call for increased gun violence research and the rescinding of ISP regulations on privacy protection. The decision by the Board to include Advocacy and Information Policy as a primary focus for the strategic plan was certainly prescient. We are fortunate that our President-Elect, Andromeda Yelton, is both well-versed in the issues and able to speak eloquently to them.1

Being LITA President means being part of more than one team.

I’m continually amazed at the hard work and dedication of Board members (http://www.ala.org/lita/about/board), Committee and Interest Group Chairs (http://www.ala.org/lita/about/committees/chairs), and everyone who fits our Member Involvement persona (http://litablog.org/2017/03/who-are-lita-members-lita-personas/). The success of LITA as an organization is entirely due to the time and passion of this team. But when you become LITA President-Elect you get a new team—the other Division Vice Presidents. This cohort travels to ALA HQ in Chicago in October after they are elected to meet each other and the incoming ALA President and to learn about the structure of ALA. I have learned much from the other Presidents this year, and we have had a number of truly productive discussions about how the Divisions can collaborate and learn from each other to more effectively serve our members.
LITA is directly benefitting from the expertise of the other groups, and they are in turn looking to us for both our technical skillset and the successes we’ve had over 50 years as an Association.

Consider this a new preface to the How to Be LITA President manual. I hope that my successors find it useful, and that it will serve as an inspiration for any LITAns out there who are thinking about putting their name on the ballot in future years. It has been a marvelous and educational experience. And the gavel is pretty cool, too.

REFERENCES

1. “Making ALA Great Again,” Publishers Weekly, February 17, 2017, http://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/72814-making-ala-great-again.html.
Privacy and User Experience in 21st Century Library Discovery
Shayna Pekala
INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2017

ABSTRACT

Over the last decade, libraries have taken advantage of emerging technologies to provide new discovery tools to help users find information and resources more efficiently. In the wake of this technological shift in discovery, privacy has become an increasingly prominent and complex issue for libraries. The nature of the web, over which users interact with discovery tools, has substantially diminished the library’s ability to control patron privacy. The emergence of a data economy has led to a new wave of online tracking and surveillance, in which multiple third parties collect and share user data during the discovery process, making it much more difficult, if not impossible, for libraries to protect patron privacy. In addition, users are increasingly starting their searches with web search engines, diminishing the library’s control over privacy even further. While libraries have a legal and ethical responsibility to protect patron privacy, they are simultaneously challenged to meet evolving user needs for discovery. In a world where “search” is synonymous with Google, users increasingly expect their library discovery experience to mimic their experience using web search engines.1 However, web search engines rely on a drastically different set of privacy standards, as they strive to create tailored, personalized search results based on user data. Libraries are seemingly forced to make a choice between delivering the discovery experience users expect and protecting user privacy. This paper explores the competing interests of privacy and user experience, and proposes possible strategies to address them in the future design of library discovery tools.
INTRODUCTION

On March 23, 2017, the internet erupted with outrage in response to the results of a Senate vote to roll back Federal Communications Commission (FCC) rules prohibiting internet service providers (ISPs), such as Comcast, Verizon, and AT&T, from selling customer web browsing histories and other usage data without customer permission. Less than a week after the Senate vote, the House followed suit and similarly voted in favor of rolling back the FCC rules, which were set to go into effect at the end of 2017.2 The repeal became official on April 3, 2017, when the President signed it into law.3 This decision by U.S. lawmakers serves as a reminder that today’s internet economy is a data economy, where personal data flows freely on the web, ready to be compiled and sold to the highest bidder. Continuous online tracking and surveillance have become the new normal.

Shayna Pekala (shayna.pekala@georgetown.edu) is Discovery Services Librarian, Georgetown University Library, Washington, DC.
PRIVACY AND USER EXPERIENCE IN 21ST CENTURY LIBRARY DISCOVERY | PEKALA https://doi.org/10.6017/ital.v36i2.9817

ISPs are just one of the many players in the online tracking game. Major web search engines, such as Google, Bing, and Yahoo, also collect information about users’ search histories, among other personal information.4 By selling this data to advertisers, data brokers, and/or government agencies, these search engine companies are able to make a profit while providing the search engines themselves for “free.” In addition to profiting from user data, web search engines also use it to enhance the user experience of their products. Collecting and analyzing user data enables systems to learn user preferences, providing personalized search results that make it easier to navigate the ever-increasing sea of online information.
The collection and sharing of user data that occurs on the open web is deeply troubling for libraries, whose professional ethics embody the values of privacy and intellectual freedom. A user’s search history contains information about a user’s thought process, and the monitoring of these thoughts inhibits intellectual inquiry.5 Libraries, however, would be remiss to dismiss the success of web search engines and their use of data altogether. MIT’s preliminary report on the future of libraries urges, “While the notion of ‘tracking’ any individual’s consumption patterns for research and educational materials is anathema to the core values of libraries...the opportunity to leverage emerging technologies and new methodologies for discovery should not be discounted.”6 This article examines the current landscape of library discovery and the competing interests of privacy and user experience at play, and proposes possible strategies to address them in the future design of library discovery tools.

BACKGROUND

Library Discovery in the Digital Age

The advent of new technologies has drastically shaped the way libraries support information discovery. While users once relied on shelf-browsing and card catalogs to find library resources, libraries now provide access to a suite of online tools and interfaces that facilitate cross-collection searching and access to a wide range of materials. In an online environment, many paths to discovery are possible, with the open web playing a newfound and significant role.
Today’s library discovery tools fall into three categories: online catalogs (the patron interface of the integrated library system (ILS)), discovery layers (a patron interface with enhanced functionality that is separate from an ILS), and web-scale discovery tools (an enhanced patron interface that relies on a central index to bring together resources from the library catalog, subscription databases, and digital repositories).7 These tools are commonly integrated with a variety of external systems, including proxy servers, inter-library loan, subscription databases, individual publisher websites, and more. For the most part, libraries purchase discovery tools from third-party vendors. While some libraries use open source discovery layers, such as Blacklight or VuFind, there are currently no open source options for web-scale discovery tools.8

Outside of the library, web search engines (e.g., Google, Bing, and Yahoo) and targeted academic discovery products (e.g., Google Scholar, ResearchGate, and Academia.edu) provide additional systems that enable discovery.9 In fact, web search engines, particularly Google, play a significant role in the research process. Both students and faculty use Google in conjunction with library discovery tools. Students typically use Google at the beginning of the research process to get a better understanding of their topic and identify secondary search terms. Faculty, on the other hand, use Google to find out how other scholars are thinking about a topic.10 Unsurprisingly, Google and Google Scholar provide the majority of content access to major content platforms.11

The Data Economy and Online Privacy Concerns

In an information discovery environment that is primarily online, new threats to patron privacy emerge. In today’s economy, user data has become a global commodity. Commercial businesses have recognized the value of data mining for marketing purposes. Björn Bloching et al.
explain, “From cleverly aggregated data points, you can draw multiple conclusions that go right to the heart and mind of the customer.”12 Along the same lines, the ability to collect and analyze user data is extremely valuable to government agencies for surveillance purposes, creating an additional data-driven market.13 The increasing value of user data has drastically expanded the business of online tracking. In her book Dragnet Nation, journalist Julia Angwin outlines a detailed taxonomy of trackers, including various types of government, commercial, and individual trackers.14

In the online information discovery process, multiple parties collect user data at different points. Consider the following scenario: a user executes a basic keyword search in Google to access an openly available online resource. In the fifteen seconds it takes the user to get to that resource, information about the user’s search is collected by the internet service provider (ISP), the web browser, the search engine, the website hosting the resource, and any third-party trackers embedded in the website. The search query, along with the user’s Internet Protocol (IP) address, becomes part of the data collector’s profile on the user. In the future, the data collector can sell the user’s profile to a data broker, where it will be merged with profiles from other data collectors to create an even more detailed portrait of the user.15 The data broker, in turn, can sell the complete dataset to the government, law enforcement, commercial businesses, and even criminals. This creates serious privacy concerns, particularly since users have no legal right over how their data is bought and sold.16

Privacy Protection in Libraries

Libraries have deeply rooted values in privacy and strong motivations to protect it. Intellectual freedom, the foundation on which libraries are built, necessarily requires privacy.
In its interpretation of the Library Bill of Rights, the American Library Association (ALA) explains, “In a library (physical or virtual), the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others.”17 Many studies support this idea, having found that people who are indiscriminately and secretly monitored censor their behavior and speech.18

Libraries have both legal and ethical obligations to protect patron privacy. While there is no federal legislation that protects privacy in libraries, forty-eight states have regulations regarding the confidentiality of library records, though the extent of these protections varies by state.19 Because these statutes were drafted before the widespread use of the internet, they are phrased in a way that addresses circulation records and does not specifically include or exclude internet use records (records with information on sites accessed by patrons) from these protections. Therefore, according to Theresa Chmara, libraries should not treat internet use records any differently than circulation records with respect to confidentiality.20 The library community has established many guiding documents that embody its ethical commitment to protecting patron privacy.
The ALA Code of Ethics states in its third principle, “We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”21 The International Federation of Library Associations and Institutions (IFLA) Code of Ethics has more specific language about data sharing, stating, “The relationship between the library and the user is one of confidentiality and librarians and other information workers will take appropriate measures to ensure that user data is not shared beyond the original transaction.”22

The library community has also established practical guidelines for dealing with privacy issues in libraries, particularly those issues relating to digital privacy, including the ALA Privacy Guidelines23 and the National Information Standards Organization (NISO) Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems.24 Additionally, the Library Freedom Project was launched in 2015 as an educational resource to teach librarians about privacy threats, rights, and tools, and in 2017, the Library and Information Technology Association (LITA) released a set of seven privacy checklists25 to help libraries implement the ALA Privacy Guidelines.

Personalization of Online Systems

While user data can be used for tracking and surveillance, it can also be used to improve the digital user experience of online systems through personalization. Because the growth of the internet has made it increasingly difficult to navigate the continually growing sea of information online, researchers have put significant effort into designing interfaces, interaction methods, and systems that deliver adaptive and personalized experiences.26 Ansgar Koene et al.
explain, “The basic concept behind personalization of on-line information services is to shield users from the risk of information overload, by pre-filtering search results based on a model of the user’s preferences… A perfect user model would…enable the service provider to perfectly predict the decision a user would make for any given choice.”27 The authors go on to describe three main flavors of personalization systems:

1. content-based systems, in which the system recommends items based on their similarity to items that the user expressed interest in;
2. collaborative-filtering systems, in which users are given recommendations for items that other users with similar tastes liked in the past; and
3. community-based systems, in which the system recommends items based on the preferences of the user’s friends.28

Many popular consumer services, such as Amazon.com, YouTube, Netflix, and Google, have increased (and continue to increase) the level of personalization that they offer.29 One such service in the area of academic resource discovery is Google Scholar’s Updates, which analyzes a user’s publication history in order to predict new publications of interest.30 Libraries, in contrast, have not pressed their developers and vendors to personalize their services, favoring privacy instead, even though studies have shown that users expect library tools to mimic their experience using web search engines.31 Some web-scale discovery services do, however, allow researchers to set personalization preferences, such as their field of study, and, according to Roger Schonfeld, it is likely that many researchers would benefit tremendously from increased personalization in discovery.32 In this vein, the American Philosophical Society Library recently launched a new recommendation tool for archives and manuscripts that uses circulation data and user-supplied interests to drive recommendations.33

Opportunities for User Experience in Library Discovery

A major challenge in today’s online discovery environment is that the user is inhibited by an overwhelming number of results. This leads users to rely on relevance rankings and to fail to examine search results in depth. Creating fine-tuned relevance ranking algorithms based on user behavior is one remedy to this problem, but it relies on the use of personal user data.34 However, there may be opportunities to facilitate data-driven discovery while maintaining the user’s anonymity that would be suitable for library (and other) discovery tools. Irina Trapido proposes that relevance ranking algorithms could be designed to leverage the popularity of a resource, measured by its circulation statistics, or to rank popular or introductory materials higher than more specialized ones to help users make sense of large results sets.35 Michael Schofield proposes “context-driven design” as an intermediary solution, whereby the user opts in to have the system infer context from neutral device or browser information, such as the time of day, business hours, weather, events, holidays, etc.36 Jason Clark describes a search prototype he built that applies these principles, but he questions whether these types of enhancements actually add value to users.37 Rachel Vacek cautions that personalization is not guaranteed to be useful or meaningful, and that continuous user testing is key.38

DISCUSSION

There are several aspects to consider for the design of future library discovery tools. The integrated, complex nature of the web causes privacy to become compromised during the information discovery process. Library discovery tools have been designed not to retain borrowing records, but they have not yet evolved to mask user behavior, which is invaluable in today’s data economy.
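The intermediary solutions proposed above share a key property: the ranking signal is aggregate, never tied to an individual patron. As a rough illustration of Trapido-style popularity-based ranking, a discovery layer might combine a text-relevance score with anonymous circulation counts. The `Record` fields, the `rank` function, and the weights below are hypothetical names and values for the sketch, not drawn from any cited system:

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    text_score: float   # query-relevance score from the underlying search index
    checkouts: int      # aggregate circulation count; no user identifiers
    introductory: bool  # e.g., flagged as an overview or introductory work

def rank(records, pop_weight=0.3, intro_boost=0.1):
    """Order results by text relevance plus anonymous popularity signals."""
    # Normalize against the most-circulated item in the result set, so the
    # popularity boost never exceeds pop_weight.
    max_circ = max((r.checkouts for r in records), default=0) or 1
    def score(r):
        s = r.text_score + pop_weight * (r.checkouts / max_circ)
        if r.introductory:
            s += intro_boost
        return s
    return sorted(records, key=score, reverse=True)

results = rank([
    Record("Advanced Topics in X", text_score=0.90, checkouts=3, introductory=False),
    Record("Introduction to X", text_score=0.85, checkouts=120, introductory=True),
])
# The frequently circulated introductory title now outranks the more
# specialized work despite a slightly lower text score.
```

Because only aggregate counts and catalog metadata feed the score, no patron-level behavior needs to be retained or exposed, which is what distinguishes this family of approaches from the personalization offered by commercial search engines.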
It is imperative that all types of library discovery tools have built-in functionality to protect patron privacy beyond borrowing records, while also enabling the ethical use of patron data to improve user experience. Even if library discovery tools were to evolve so that they themselves were absolutely private (where no data were ever collected or shared), other online parties (ISPs, web browsers, advertisers, data brokers, etc.) would still have access to user data through other means, such as cookies and fingerprinting. The operating reality is such that privacy is not immediately and completely controllable by libraries. Laurie Rinehart-Thompson explains, “In the big picture, privacy is at the mercy of ethical and stewardship choices on the part of all information handlers.”39 While libraries alone cannot guarantee complete privacy for their patrons, they can and should mitigate privacy risks to the greatest extent possible.

At the same time, ignoring altogether the benefits of using patron data to improve the discovery user experience may threaten the library’s viability in the age of Google. Roger Schonfeld explains, “If systems exclude all personal data and use-related data, the resulting services will be one-dimensional and sterile. I consider it essential for libraries to deliver dynamic and personalized services to remain viable in today's environment; expectations are set by sophisticated social networks and commercial destinations.”40 Libraries must find ways to keep up with greater industry trends while adhering to professional ethics.

RECOMMENDATIONS

While libraries have traditionally shied away from collecting data about patron transactions, these conservative tendencies run counter to the library’s mission to provide an outstanding user experience and the need to evolve in a rapidly changing information industry.
As the profession adopts new technologies, ethical dilemmas present themselves that are tied into their use. While several library organizations have issued guidance for libraries about the role of user data in these new technologies, this does not go far enough. The NISO Privacy Principles, for instance, acknowledge that they are merely “a starting point.”41 Examining the substance of these guidelines is important for confronting the privacy challenges facing library discovery in the 21st century, but there are additional steps libraries can take to more fully address the competing interests of privacy and user experience in library discovery and in library technologies more generally.

Holding Third Parties Accountable

Libraries are increasingly at the mercy of third parties when it comes to the development and design of library discovery tools. Unfortunately, these third parties do not have the same ethical obligations to protect patron privacy that librarians do. In addition, the existing guidance for protecting user data in library technologies is directed towards librarians, not third-party vendors. The library community must hold third parties accountable for the ethical design of library discovery tools. One strategy for doing this would be to develop a ranking or certification process for discovery tools based on a community set of standards. The development of HIPAA-compliant records management systems in the medical field sets an example.
Because healthcare providers are required by law to guarantee the privacy of patient data,42 they must select electronic health record (EHR) systems that have been certified by an Office of the National Coordinator for Health Information Technology (ONC)-authorized body.43 In order to be certified, the system must adhere to a set of criteria adopted by the Department of Health and Human Services,44 which includes privacy and security standards.45 Another example is the Consumer Reports standard and testing program for consumer privacy and security, which is currently in development. Consumer Reports explains the reason for developing this new privacy standard: “If Consumer Reports and other public-interest organizations create a reasonable standard and let people know which products do the best job of meeting it, consumer pressure and choices can change the marketplace.”46 Libraries could potentially adapt the Consumer Reports standards and rating system for library discovery tools and other library technologies.

Engaging in UX Research & Design

Libraries should not rely on third parties alone to address privacy and user experience requirements for library discovery tools. Libraries are well-poised to become more involved in the design process itself by actively engaging in user experience research and design. The opportunities for “context-driven design” and personalization based on circulation and other anonymous data are promising for library discovery but require ample user testing to determine their usefulness. Understanding which types of personalization features offer the most value while preserving privacy is key to accelerating the design of library discovery tools. The growth of User Experience Librarian jobs and the emergence of User Experience teams and departments in libraries signal an increasing amount of user experience expertise in the field, which can be leveraged to investigate these important questions for library discovery.
Illuminating the Black Box

When librarians adopt new discovery tools without fully understanding their underlying technologies and the data economy in which they operate, this does not serve users. Librarians have ethical obligations that should require them to thoroughly understand how and when user data is captured by library discovery tools and other web technologies, and how this information is compiled and shared at a higher level. Not only do librarians need to understand the technical aspects of discovery technologies, they also need to understand the related user experience benefits and privacy concerns and the resulting ethical implications. As technology continues to evolve, librarians should be required to engage in continued learning in these areas. Such technology literacy skills could be incorporated into the curriculum of Library and Information Science degree programs, as well as into ongoing professional development opportunities.

Empowering Library Users

Because information discovery in an online environment introduces new privacy risks, communication about this topic between librarians and patrons is paramount. Librarians should proactively discuss with patrons the potential risks to their privacy when conducting research online, whether they are using the open web or library discovery tools. It is ultimately up to the patron to weigh their needs and preferences in order to decide which tools to use, but it is the librarian’s responsibility to empower patrons to be able to make these decisions in the first place.

CONCLUSION

With the rollback of the FCC privacy rules that prohibited ISPs from selling customer search histories without customer permission, understanding digital privacy issues and taking action to protect patron privacy is more important than ever.
While privacy and user experience are both necessary and important components of library discovery systems, their requirements are in direct conflict with each other. An absolutely private discovery experience would mean that no user data is ever collected during the search process, whereas a completely personalized discovery experience would mean that all user data is collected and utilized to inform the design and features of the system. It is essential for library discovery tools to have built-in functionality that protects patron privacy to the greatest extent possible and enables the ethical use of patron data to improve user experience. The library community must take action to address these requirements beyond establishing guidelines. Holding third-party providers to higher privacy standards is a starting point. In addition, librarians themselves need to engage in user experience research and design to discover and test the usefulness of possible intermediary solutions. Librarians must also become more educated as a profession on digital privacy issues and their ethical implications in order to educate patrons about their fundamental rights to privacy and empower them to make decisions about which discovery tools to use. Collectively, these strategies enable libraries to address user needs, uphold professional ethics, and drive the future of library discovery.

REFERENCES

1. Irina Trapido, “Library Discovery Products: Discovering User Expectations through Failure Analysis,” Information Technology and Libraries 35, no. 3 (2016): 9-23, https://doi.org/10.6017/ital.v35i3.9190.
2. Brian Fung, “The House Just Voted to Wipe Away the FCC’s Landmark Internet Privacy Protections,” The Washington Post, March 28, 2017, https://www.washingtonpost.com/news/the-switch/wp/2017/03/28/the-house-just-voted-to-wipe-out-the-fccs-landmark-internet-privacy-protections.
3. Jon Brodkin, “President Trump Delivers Final Blow to Web Browsing Privacy Rules,” Ars Technica, April 3, 2017, https://arstechnica.com/tech-policy/2017/04/trumps-signature-makes-it-official-isp-privacy-rules-are-dead/.
4. Nathan Freed Wessler, “How Private is Your Online Search History?” ACLU Free Future (blog), https://www.aclu.org/blog/how-private-your-online-search-history.
5. Julia Angwin, Dragnet Nation (New York: Times Books, 2014), 41-42.
6. MIT Libraries, Institute-wide Task Force on the Future of Libraries (2016), 12, https://assets.pubpub.org/abhksylo/FutureLibrariesReport.pdf.
7. Trapido, “Library Discovery Products,” 10.
8. Marshall Breeding, “The Future of Library Resource Discovery,” NISO White Papers, NISO, Baltimore, MD, 2015, 4, http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_discovery.pdf.
9. Christine Wolff, Alisa B. Rod, and Roger C. Schonfeld, Ithaka S+R US Faculty Survey 2015 (New York: Ithaka S+R, 2016), 11, https://doi.org/10.18665/sr.277685.
10. Deirdre Costello, “Students and Faculty Research Differently” (presentation, Computers in Libraries, Washington, D.C., March 28, 2017), http://conferences.infotoday.com/documents/221/A103_Costello.pdf.
11. Roger C. Schonfeld, Meeting Researchers Where They Start: Streamlining Access to Scholarly Resources (New York: Ithaka S+R, 2015), https://doi.org/10.18665/sr.241038.
12. Björn Bloching, Lars Luck, and Thomas Ramge, In Data We Trust: How Customer Data Is Revolutionizing Our Economy (London: Bloomsbury Publishing, 2012), 65.
13. Angwin, Dragnet Nation, 21-36.
14. Ibid., 32-33.
15. Natasha Singer, “Mapping, and Sharing, the Consumer Genome,” New York Times, June 16, 2012, http://www.nytimes.com/2012/06/17/technology/acxiom-the-quiet-giant-of-consumer-database-marketing.html.
16. Lois Beckett, “Everything We Know About What Data Brokers Know About You,” ProPublica, June 13, 2014, https://www.propublica.org/article/everything-we-know-about-what-data-brokers-know-about-you.
17. “An Interpretation of the Library Bill of Rights,” American Library Association, amended July 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy.
18. Angwin, Dragnet Nation, 41-42.
19. Anne Klinefelter, “Privacy and Library Public Services: Or, I Know What You Read Last Summer,” Legal Reference Services Quarterly 26, no. 1-2 (2007): 258-260, https://doi.org/10.1300/J113v26n01_13.
20. Theresa Chmara, Privacy and Confidentiality Issues: A Guide for Libraries and Their Lawyers (Chicago: ALA Editions, 2009), 27-28.
21. “Code of Ethics of the American Library Association,” American Library Association, amended January 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics.
22. “IFLA Code of Ethics for Librarians and other Information Workers,” International Federation of Library Associations and Institutions, August 12, 2012, http://www.ifla.org/news/ifla-code-of-ethics-for-librarians-and-other-information-workers-full-version.
23. “Privacy & Surveillance,” American Library Association, approved 2015-2016, http://www.ala.org/advocacy/privacyconfidentiality.
24. National Information Standards Organization, NISO Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems (NISO Privacy Principles), published December 10, 2015, http://www.niso.org/apps/group_public/download.php/15863/NISO%20Consensus%20Principles%20on%20Users%92%20Digital%20Privacy.pdf.
25. “Library Privacy Checklists,” Library and Information Technology Association, accessed March 7, 2017, http://www.ala.org/lita/advocacy.
26. Panagiotis Germanakos and Marios Belk, “Personalization in the Digital Era,” in Human-Centred Web Adaptation and Personalization: From Theory to Practice (Switzerland: Springer International Publishing, 2016), 16.
27. Ansgar Koene et al., “Privacy Concerns Arising from Internet Service Personalization Filters,” ACM SIGCAS Computers and Society 45, no. 3 (2015): 167.
28. Ibid., 168.
29. Ibid.
30. James Connor, “Scholar Updates: Making New Connections,” Google Scholar Blog, https://scholar.googleblog.com/2012/08/scholar-updates-making-new-connections.html.
31. Schonfeld, Meeting Researchers Where They Start, 2.
32. Roger C. Schonfeld, Does Discovery Still Happen in the Library?: Roles and Strategies for a Shifting Reality (New York: Ithaka S+R, 2014), 10, https://doi.org/10.18665/sr.24914.
33. Abigail Shelton, “American Philosophical Society Announces Launch of PAL, an Innovative Recommendation Tool for Research Libraries,” American Philosophical Society, April 3, 2017, https://www.amphilsoc.org/press/pal.
34. Trapido, “Library Discovery Products,” 17.
35. Ibid.
36. Michael Schofield, “Does the Best Library Web Design Eliminate Choice?” LibUX, September 11, 2015, http://libux.co/best-library-web-design-eliminate-choice/.
37. Jason A. Clark, “Anticipatory Design: Improving Search UX using Query Analysis and Machine Cues,” Weave: Journal of Library User Experience 1, no. 4 (2016), https://doi.org/10.3998/weave.12535642.0001.402.
38. Rachel Vacek, “Customizing Discovery at Michigan” (presentation, Electronic Resources & Libraries, Austin, TX, April 4, 2017), https://www.slideshare.net/vacekrae/customizing-discovery-at-the-university-of-michigan.
39. Laurie A. Rinehart-Thompson, Beth M. Hjort, and Bonnie S. Cassidy, “Redefining the Health Information Management Privacy and Security Role,” Perspectives in Health Information Management 6 (2009): 4.
40. Marshall Breeding, “Perspectives on Patron Privacy and Security,” Computers in Libraries 35, no. 5 (2015): 13.
41. National Information Standards Organization, NISO Consensus Principles.
42. Joel J. P. C. Rodrigues et al., “Analysis of the Security and Privacy Requirements of Cloud-Based Electronic Health Records Systems,” Journal of Medical Internet Research 15, no. 8 (2013), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3757992/.
43. Office of the National Coordinator for Health Information Technology, Guide to Privacy and Security of Electronic Health Information, April 2015, https://www.healthit.gov/sites/default/files/pdf/privacy/privacy-and-security-guide.pdf.
44. Office of the National Coordinator for Health Information Technology, “Health IT Certification Program Overview,” January 30, 2016, https://www.healthit.gov/sites/default/files/PUBLICHealthITCertificationProgramOverview_v1.1.pdf.
45. Office of the National Coordinator for Health Information Technology, “2015 Edition Health Information Technology (Health IT) Certification Criteria, Base Electronic Health Record (EHR) Definition, and ONC Health IT Certification Program Modifications Final Rule,” October 2015, https://www.healthit.gov/sites/default/files/factsheet_draft_2015-10-06.pdf.
46. Consumer Reports, “Consumer Reports to Begin Evaluating Products, Services for Privacy and Data Security,” March 6, 2017, http://www.consumerreports.org/privacy/consumer-reports-to-begin-evaluating-products-services-for-privacy-and-data-security/.
Current Trends and Goals in the Development of Makerspaces at New England College and Research Libraries

Ann Marie L. Davis

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2018

Ann Marie L. Davis (davis.5257@osu.edu) is Faculty Librarian of Japanese Studies at The Ohio State University.

ABSTRACT

This study investigates why and which types of college and research libraries (CRLs) are currently developing makerspaces (or an equivalent space) for their communities. Based on an online survey and phone interviews with a sample population of CRLs in New England, I found that 26 CRLs had or were in the process of developing a makerspace in this region. In addition, several other CRLs were actively promoting and diffusing the maker ethos. Of these libraries, most were motivated to promote open access to new technologies, literacies, and STEM-related knowledge.

INTRODUCTION AND OVERVIEW

Makerspaces, alternatively known as hackerspaces, tech shops, and fab labs, are trendy new sites where people of all ages and backgrounds gather to experiment and learn. Born of a global community movement, makerspaces bring the do-it-yourself (DIY) approach to communities of tinkerers using technologies including 3D printers, robotics, metal- and woodworking, and arts and crafts.1 Building on this philosophy of shared discovery, public libraries have been creating free programs and open makerspaces since 2011.2 Given their potential for community engagement, college and research libraries (CRLs) have also been joining the movement in growing numbers.3

In recent years, makerspaces in CRLs have generated positive press in popular and academic journals. Despite the optimism, scholarly research that measures their impact is sparse. For example, current library and information science literature overlooks why and how various CRLs choose to create and maintain their respective makerspaces.
Likewise, there is scant data on the institutional objectives, frameworks, and experiences that characterize current CRL makerspace initiatives.4 This study begins to fill this gap by investigating why and which types of CRLs are creating makerspaces (or an equivalent room or space) for their library communities. Specifically, it focuses on libraries at four-year colleges and research universities in New England. Throughout this study, makerspace is used interchangeably with other terms, including maker labs and innovation spaces, to reflect the variation in names and objectives that underlie the current trends. In exploring their motives and experiences, this article provides a snapshot of the current makerspace movement in CRLs.

https://doi.org/10.6017/ital.v37i2.9825

The study finds that the number of CRLs actively involved in the makerspace movement is growing. In addition to more than two dozen that have or are in the process of developing a makerspace, another dozen CRLs have staff who support the diffusion of maker technologies, such as 3D printing and crafting tools that support active learning and discovery, in the campus library and beyond.5 Comprising research and liberal arts schools, public and private, and small and large, the CRLs involved with makerspaces are strikingly diverse. Despite these differences, this population is united by common objectives to promote new literacies, provide open access to new technologies, and foster a cooperative ethos of making.

LITERATURE REVIEW

The body of literature on library makerspaces is brief, descriptive, and often didactic. Given the newness of the maker movement in public and academic libraries, many articles focus on early success stories and defining the movement vis-à-vis the mission of the library.
For instance, Laura Britton, known for having created the first makerspace in a public library (The Fayetteville Free Library’s Fabulous Laboratory), defines a makerspace as “a place where people come together to create and collaborate, to share resources, knowledge, and stuff.”6 This definition, she determines, is strikingly similar to that of the library.

Most literature on makerspaces appears in academic blogs, professional websites, and popular magazines. Among the most frequently cited is TJ McCue’s article, which celebrates Britton’s (née Smedley) FabLab while distilling the intellectual underpinnings of the makerspace ethos.7 Phillip Torrone, editor of Make: magazine, supports Smedley’s project as an example of “rebuilding” or “retooling” our public spaces.8 Within this camp, David Lankes, professor of information studies at Syracuse University, applauds such work as activist and community-oriented librarianship.9

Many authors emphasize the philosophical “fit,” or intersection, of public makerspaces with the principles of librarianship. Building on Torrone’s work, J. L. Balas claims that creating access to resources for learning and making is in keeping with the “library’s historical role of providing access to the ‘tools of knowledge.’”10 Others emphasize the hands-on, participatory, and intergenerational features of the maker movement, which has the potential to bridge the digital divide.11 Still others identify areas of literacy, innovation, and STE(A)M skills where library makerspaces can have a broad impact.

While public libraries often focus on early childhood or adult education, CRLs adopt separate frameworks for information literacy. Like public libraries, they aim to build (meta)literacies and STE(A)M skills. Nevertheless, their programs are often tailored to curricular goals in the arts and sciences or specialized degrees in engineering, education, and business. This is especially true of CRLs situated within large, research-intensive universities.
Considering their specific missions and aims, this study seeks to identify the goals and challenges that reinforce the development of makerspaces in undergraduate and research environments.

RESEARCH DESIGN AND METHOD

Data presented in this study was gathered from library directors (or their designees) through an online survey and oral telephone interviews. After choosing a sampling frame of CRLs in New England, I developed a three-path survey, sent invitations, and collected and analyzed data using the online platform SurveyMonkey. The survey was distributed following review by the institutional review board (IRB) at Southern Connecticut State University, where I completed a Master of Library Science (MLS) degree.

Survey Population

To assess generalized findings for the larger population in North America, I chose a cluster-sampling approach that limited the survey population to the CRLs in New England. In generating the sampling frame, I included four-year and advanced-degree institutions based on the assumption that libraries at these schools supported specialized, research, or field-specific degrees. I omitted for-profit and two-year institutions, based on the assumption that they are driven by separate business models. This process generated a contact list of 182 library directors at the designated CRLs in Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont.

Survey Design

The purpose of the survey was to gather basic data about the size and structure of the respondents’ institutions and to gain insights on their views and practices regarding makerspaces (the survey is reproduced in the appendix). The first page of the survey contained a statement of consent, including my contact information and that of my IRB. After a short set of preliminary questions, the survey branched into one of three paths based on respondents’ answers about makerspaces.
The respondents were thus categorized into one of three groups: Path One (P1) for those with no makerspace and no plans to create one, Path Two (P2) for those with plans to develop a makerspace in the near future, and Path Three (P3) for those already running a makerspace in their libraries. P3 was the longest section of the survey, containing several questions about P3 experiences with makerspaces such as staffing, programming, and objectives.

Data Collection

In summer 2015, brief email invitations and two reminders were sent to the targeted population.12 To increase the participation rate, I sometimes wrote personal emails and made direct phone calls to CRLs known to have a makerspace. For cold-call interviews, I developed a script explaining the nature of the online survey. After obtaining informed consent, I proceeded to ask the questions in the online survey and manually enter the participants’ responses at the time of the interview. On a few occasions, online respondents followed up with personal emails volunteering to discuss their library’s experiences in more detail. I took advantage of these invitations, which often provided unique and welcome insights.

In analyzing the responses, I used tabulated frequencies for quantitative results and sorted qualitative data into two different categories. The first category was identified as “short and objective” and coded and analyzed numerically. The longer, more “subjective and value-driven” data was analyzed for common trends, relationships, and patterns. Within this second category, I also identified outlier responses that suggested possible exceptions to common experiences.

RESULTS

The survey closed after one month of data collection. At this time, 55 of 182 potential respondents had participated, yielding a response rate of 30.2%.
Among these participants, the survey achieved a 100.0% response rate (9 completed surveys of 9 targeted CRLs) among libraries that were currently operating makerspaces. I created a list of all known CRL makerspaces in New England based on an exhaustive website search of all CRLs in this region. Subsequent interviews with the managers of the makerspaces on this list revealed no other hidden or unknown makerspaces in this region. Of the 55 respondents, 29 (52.7%) were in P1, 17 (30.9%) were in P2, and 9 (16.4%) were in P3. (See figure 1.)

Figure 1. Survey participants’ (n = 55) current CRL efforts and plans to develop and operate a makerspace.

Among respondents in P2 and P3, the majority (13 of 23) indicated that they were from libraries that served a student population of 4,999 people or fewer, while only one library served a population of 30,000 or more (see figure 2). In terms of sheer numbers, makerspaces might seem to be gaining traction at smaller CRLs, but proportionally, one cannot say that smaller CRLs are adopting makerspaces at a higher rate because the majority of survey participants had student populations of 19,999 or fewer (51, or 91.1%). The number of institutions with populations over 20,000 was in a clear minority (5, or 8.9%). (See figure 3.)

Figure 2. P2 and P3 CRLs with makerspaces or concrete plans to develop a makerspace.

Figure 3. The majority of CRLs (67.2%) that participated in the survey had a population of 4,999 students or less. Only 1.8% of schools that participated had a population of 30,000 students or more.
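The tabulated frequencies reported above reduce to simple proportions. As an illustration only, the following sketch reproduces the path tabulation and response-rate arithmetic; the per-respondent labels are reconstructed from the published counts (29, 17, and 9), not drawn from the author's raw survey data.

```python
from collections import Counter

# Hypothetical per-respondent path labels, reconstructed from the
# reported totals: 29 in P1, 17 in P2, 9 in P3 (n = 55).
responses = ["P1"] * 29 + ["P2"] * 17 + ["P3"] * 9

counts = Counter(responses)
n = len(responses)
for path in ("P1", "P2", "P3"):
    pct = 100 * counts[path] / n
    print(f"{path}: {counts[path]} ({pct:.1f}%)")  # 52.7%, 30.9%, 16.4%

# Overall response rate: 55 participants out of 182 invitations.
print(f"response rate: {100 * 55 / 182:.1f}%")  # 30.2%
```

Running this reproduces the percentages given in the text (52.7%, 30.9%, 16.4%, and a 30.2% response rate), confirming the reported figures are internally consistent.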
CRLs with No Makerspace (P1 = 29)

In the first part of the survey, the majority of P1 respondents demonstrated positive views toward makerspaces despite having no plans to create one in the near future. Budgetary and space limitations aside, many were relatively open to the possibility of developing a makerspace in a more distant future. In the words of one respondent, “we have several areas within the library that present a heavy demand on our budget. In [the] future, we would love to consider a makerspace, and whether it would be a sensible and appropriate investment that would benefit our students.”

When asked what their reasons were for not having a makerspace, some respondents (8, or 27.6%) said they had not given it much thought, but most (21, or 72.4%) offered specific answers. Among these, the most frequently cited reason (11, or 37.8%) was that a library makerspace would be redundant: such spaces and labs were already offered in other departments within the institution or in the broader community. At one CRL, for example, the respondent said the library did not want to compete with faculty initiatives elsewhere on campus. Other reasons included that makerspaces were expensive and not a priority. Some (5, or 17.2%) libraries preferred to allocate their funds to different types of spaces such as “a very good book arts studio/workshop” or “simulation labs.” Some (6, or 20.6%) shared concerns about a lack of space, staff, or simply “a good culture of collaboration [on campus].” Merging these sentiments, one respondent concluded, “People still need the library to be fairly quiet. . . . Having makerspace equipment in our library would be too distracting.”

While some were skeptical (sharing concerns about potential hazards or that makerspaces were simply “the flavor of the month”), the majority (roughly 60%) were open and enthusiastic.
One respondent, in fact, held a leadership position in a community makerspace beyond campus. According to this librarian, 3D printers, scanners, and laser cutters were sure to become more common, and CRLs would no doubt eventually develop “a formal space for making stuff.”

CRLs with Plans for a Makerspace in the Near Future (P2 = 17)

The second section of the survey (P2) focused primarily on the motivations and means by which this cohort planned to develop a makerspace. When asked why they were creating a makerspace, the most common response was to promote learning and literacy (15 respondents, or 88.2%). In addition, a large majority (12 respondents, or 70.6%) felt that makerspaces helped to promote the library as relevant, particularly in the digital age. Three more reasons that earned top scores (10 respondents each, or 58.8%) were being inspired by the ethos of making, creating a complement to digital repositories and scholarship initiatives, and providing access to expensive machines or tools. Additional reasons included building outreach and responding to community requests.13 (See figure 4.)

Figure 4. Rationale behind P2 respondents’ decision to plan a makerspace (n = 17).

While P2 respondents indicated a clear decision to create a makerspace, their timeframes were noticeably different. I categorized their open responses into one of six timeframes: “within six months,” “within one year,” “within two years,” “within four years,” “within six years,” and “unknown.” The result presented a clear trimodal distribution with three subgroups: six CRLs with plans to open within 18 months, five with plans to open within the next two years, and six with plans to open after three or more years (see figure 5). In addition to their timeframe, P2 respondents were also asked about their plans for financing their future makerspaces.
Based on their open responses, the following six funding sources emerged:

• the library budget, including surplus moneys or capital project funds
• internal funding, including from campus constituents
• donations and gifts
• external grants
• cost recovery plans, including small charges to users
• not sure/in progress

Figure 5. P2 respondents’ timeframe for developing the makerspace (n = 17).

With seven mentions, the most common of the above funding sources was the “library budget.” With two mentions each, the least common sources were “cost recovery” and “not sure/in progress.” Among those who mentioned external grant applications, one respondent mentioned a focus on Women and STEM opportunities, and another specifically discussed attempts at grants from the Institute of Museum and Library Services. (See figure 6.)

Figure 6. P2 respondents’ plans for gathering and financing the makerspace (n = 17).

Regarding target user groups, some respondents focused on opportunities to enhance specific disciplinary knowledge, while others emphasized a general need for creating a free and open environment. One respondent mentioned that at her state-funded library, the space would be “geared to younger [primary and secondary school] ages,” “student teachers,” and “librarians on practicum assignments.” By contrast, another respondent at a large, private, Carnegie R1 university emphasized that the space was earmarked for undergraduate and graduate students. In contrast to the cohort in P1, a notable number in P2 chose to create a makerspace despite the existence of maker-oriented research labs elsewhere on campus.
As one respondent noted, the university was still “lacking a physical space where people could transition between technologies” and an open environment “where students doing projects for faculty” could come, especially later in the evenings. Another respondent at a similarly large, private institution explained that his colleagues recognized that most labs at their university were earmarked for specific professional schools. As a result, his colleagues came up with a strategy to provide self-service 3D printing stations at the media center, located in the library at the heart of campus.

CRLs with Operating Makerspaces (P3 = 9)

The final section of the survey (P3) focused on the motivations and means by which CRLs with makerspaces already in operation chose to develop and maintain their sites. In addition, this section gathered information on P3 CRL funding decisions, service models, and types of users in their makerspaces. Of the nine respondents in this path, all had makerspaces that had opened within the last three years. Among these, roughly a third (4) had been in operation from one to two years; another third (3) had operated for two to three years; and two had opened within the last year. (See table 1.)

Table 1. Length of time the CRL makerspace has been in operation for P3 respondents (n = 9).

Age of CRL Makerspace or Lab—P3
Answer Options        Responses      %
Less than 6 months        1        11.1
6–12 months               1        11.1
1–2 years                 4        44.4
2–3 years                 3        33.3
More than 3 years         0         0.0
Total Responses           9       100.0

Priorities and Rationale

The reasons behind P3 decisions to create a makerspace were slightly different from those of P2. While “promoting literacy and learning” was still a top priority, two other reasons, “promoting the maker culture of making” and “providing access to expensive machinery,” were deemed equally important (6 respondents, or 66.7%, for each).
Other significant priorities included “promoting community outreach” (4 respondents, or 44.4%), “promoting the library as relevant,” and acting in “direct response to community requests” (3 respondents, or 33.3%, for each). (See figure 7.)

Figure 7. Rationale behind P3 respondents’ decision to develop and maintain a makerspace (n = 9).

The answer of “other” was also given top priority (5 respondents, or 55.6%). I conclude that this indicated a strong desire among respondents to express in their own words their library’s unique decisions and circumstances. (Their free responses to this question are discussed below.)

A familiar theme in the responses of the five respondents who elaborated on their choice of “other” was the desire to situate a makerspace in the central and open environment of the campus library. As one participant noted, there were “other access points and labs on campus,” but those labs were “more siloed” or cut off from the general population. By contrast, the campus library aimed to serve a broader population and anticipated a general “student need.” Later, the same respondent added that the makerspace was an opportunity to promote social justice, cultivate student clubs, and encourage engagement at the hub of the campus community.

This type of ecumenical thinking was manifested in a similar remark that the library’s role was to reinforce other learning environments on campus. One respondent saw the makerspace as an additional resource “that complemented the maker opportunities that we have had in our curriculum resource center for decades.” Likewise, the library makerspace was intended to offer opportunities to a range of users on campus and beyond.
Funding, Staffing, and Service Models

When prompted to discuss how they gathered the resources for their makerspaces, the largest group (4 respondents) stated that a significant means for funding was through gifts and donations. Thus, the majority of CRL makerspaces in New England depended primarily on contributions from friends of the library, university/college alumni, and donors. The second most common source (3 respondents) was the library budget, including surplus money at the end of the year. Grant money and cost recovery were each mentioned by two library participants, and internal and constituent support was useful for two libraries. (See figure 8.)

Figure 8. P3 methods for gathering and financing a makerspace (n = 9).

Among these, a particularly noteworthy case was a makerspace that had originated from a new student club focused on 3D printing. Originally based in a student dorm, the club was funded by a campus student union, which allocated grant money to students through a budget derived from the college tuition. As the club quickly grew, it found significant support in the library, which subsequently provided space (on the top floor of the library), staff, and financial support from surplus funds in the library budget. As this example would suggest, the sum of the responses showed that financing the makerspaces depended on a combination of strategies. One participant summarized it best: “We’ve slowly accumulated resources over time, using different funding for different pieces. Some grant funding. Mostly annual budget.”

Regarding service models, more than half of these libraries (five) currently offer a combination of programming and open lab time where users can make appointments or just drop in. By contrast, two of the libraries offered programs only, and did not offer an open lab; another two did the opposite, offering no programming but an open makerspace at designated times.
Of the latter, one is open Monday to Friday from 8 a.m. to 4 p.m., and the other is open during regular hours, with spaces that “can be booked ahead for classes or projects.” Most labs supported drop-in visitors and were open evenings and weekends. At one makerspace, where there was increasingly heavy demand, the staff required students to submit proposals with project goals. (See table 2.)

While some libraries brought in community experts, others held faculty programs, and some scheduled lab time for individual classes. One makerspace prioritized not only the campus but also the broader community, and thus featured programs for local high schools and seniors. Responses from this library emphasized the social justice thread that inspired their work and the community culture that they aimed to foster.

Table 2. Model for services offered in the CRL makerspace or 3D printing lab.

Do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use?
Answer Options                                                                          Responses      %
Yes, we offer the following types of programs.                                              2        22.2
No, we simply leave the makerspace/lab open at the specific times.                          2        22.2
We do both. We offer the programs and leave the makerspace/lab open at specific times.      5        55.6

As this data would suggest, most makerspaces were used by students (undergraduates and graduates) and faculty, in addition to local experts and generational groups. Survey responses showed that undergraduate students were the most common users (9 of 9 respondents checked this group as the most frequent type of user), and faculty and graduate students were the second and third most common user groups in the labs (8 of 9 respondents checked these groups as most frequent). Local entrepreneurs, artists, designers, craftspeople, and campus and library staff also use the makerspaces. (See figure 9.)
When prompted to identify “other” categories, one respondent specifically listed “learners, makers, sharers, studiers, [and] clubs.”

Figure 9. Of the different types of users listed above, P3 respondents ranked them in order of who used the makerspace or equivalent lab most often (n = 9).

The number and type of staff that managed and operated the makerspaces also varied widely at the nine CRLs in P3. Seven of the CRLs employed full-time, dedicated staff, among whom four participants checked off the “dedicated staff”–only options. Of the remaining two CRLs, one reported staffing the makerspace with only one student, and one reported not having any staff working in the makerspace. I assume that the makerspace with no employees is managed by staff and students who are assigned to other, unspecified library departments or work groups. (See figure 10.)

Figure 10. The staffing situations at the P3 respondents (n = 9), where each respondent is assigned a letter from “A” to “I.”

Library programming was also diverse in terms of targeted audiences, speakers, and learning objectives. Instructional workshops varied from 3D scanning and printing to soldering, felt making, sewing, knitting, robotics, and programming (e.g., Raspberry Pi). The type of equipment contained in each lab is likely correlated to the range in programming; however, investigating these links was beyond the scope of this study. Regarding this equipment, the size and activity of the participant CRLs varied considerably. Some responses were more specific than others, and thus the resulting dataset was incomplete. (See table 3.)

Challenges and Philosophies of CRL Makerspaces

The final portion of the survey invited participants to freely offer their thoughts about operating a CRL makerspace. What follows below is a summary of the two most prominent themes that emerged: the challenges of building the lab and the social philosophies that framed these initiatives.
In terms of challenges, the most common hurdle noted was the tremendous learning curve involved in establishing, maintaining, and promoting a makerspace. Setting up some of the 3D printers, for example, required knowledge about electrical networks, computer systems, and safety policies at a federal and local level. Once the hardware was running, lab managers needed to know how the machines interfaced with different and challenging software applications. Communication skills were also critical; as one respondent reported, “Printing anything and everything takes knowledge, experience.” Communicating with stakeholders and users in accessible and proactive ways required strong teaching and customer service skills.

Table 3. The types of tools and equipment used at P3 CRL respondents (n = 8), which are assigned letters from A to H.

Major Equipment Offered by Individual Library Makerspaces or Equivalent Labs—Path 3

A: Die cut machine, 3D printer, 3D pens, raspberry pi, arduino, makey makey, art supplies, sewing supplies, pretty much anything anyone asks for we will try to get.
B: 2 Makerbot replicators, 1 digital scanner, 1 Othermill
C: 3D printing, 3D scanning, and laser cutting.
D: 3D printing, 3D scanning, laser cutting, vinyl cutting, large format printing, cnc machine, media production/postproduction.
E: No response
F: 3 CreatorX, 1 Powerspec, 3 M3D, 2 Replicator 2, 1 Replicator 2X, 1 Makergear, 1 LeapfrogXL, 1 Ultimaker, 1 Type A, 1 Deltaprinter, 1 Delta Maker, 2 Printrbot, 2 Filabots, 2 X-box Kinect for scanning, 2 Oculus Rifts, embedded systems cabinet with soldering stations, solar panels and micro controllers etc., 1 Formlabs SLA, 1 Muve SLA, RoVa 5, a bunch of quadcopters
G: 3D printers (4 printers, 3 models), 3D scanning/digitizing equipment (3 models), Raspberry Pi, Arduino, a laser cutter and engraving system, poster printer, digital drawing tablets, GoPro, a variety of editing and design software, a number of tools (e.g. Dremel, soldering iron, wrenches, pliers, hammers, etc.), and a number of consumable or misc. items (e.g. paint, electrical tape, acetone, safety equipment, LED lights, screws and nails, etc.)
H: 48 printers (all Makerbot brand): 35 Replicator 5th Gen (a moderate size printer), 5 Replicator Z18 printers (larger build size), 5 Replicator Minis, and 3 Replicator 2X; 5 Makerbot digitizers (turntable scanners, 8" by 8"); 1 Cubify Sense hand scanner; 7 still cameras for photogrammetry; 21 iMac computers; 2 Mac Pros; 2 Wacom graphics tablets (thinking about complementing other resources at other labs on campus)

Another challenge that often came up was that of managing resources. As one respondent warned, CRLs should beware the “early adoption of certain technologies,” which can become “quickly outdated by a rapidly growing field.” For others, it was a challenge to recruit the right staff that could run and fix machines in constant need of repair. In addition to hiring people with manufacturing and teaching skills, a successful lab required individuals who were savvy about outreach and community needs. Despite such challenges, many respondents were eager to discuss the aspirations and rewards of CRL makerspaces.
Above all, respondents focused on the pedagogical opportunities on the one hand, and the potential for outreach and social justice on the other. One participant conceded that measuring advances in literacy and education was “intangible,” but he saw great value in “giving students the experience of seeing their ideas come to fruition.” The excitement that this created for one student manifested in a buzz, and subsequently a “fever” or groundswell, in which more users came in to tinker and learn. Meanwhile, the learning that took place among future professionals on campus was “critical,” even when results did not “go viral.” The aspiration to create human connections within and beyond campus was another striking theme. According to one respondent, the makerspace had “enabled some incredibly fruitful collaborations with different departments on campus.” This “fantastic outcome” was becoming more and more visible as the maker community grew. Other CRL makerspaces took pride in fostering a type of learning that was explicitly collaborative, exciting, and even “fun” for users. This in turn meant that some libraries were becoming “very popular,” generating a lot of “good PR,” and becoming central in the lives of new types of library users. Along these lines, some respondents aimed to leverage the power of the makerspace to achieve social justice goals that resonated with core values of librarianship. According to one enthusiastic participant, the ethos of sharing was alive and strong among the staff and the many students who saw their participation in the lab as a lifestyle and culture of collaborating. In another initiative, the respondent looked forward to eventually offering grants to those users who proposed meaningful ways to use the makerspace to create practical value for the community. From this perspective, there was added value in having the 3D printing lab situated specifically on a college or university campus. 
According to this respondent, the unique quality of the CRL makerspace was that by virtue of its location amid numerous and energetic young people, it was ripe for exploitation by those “who had great ideas and time and energy to do good.”

DISCUSSION

The aim of this study was to explore why and which types of CRLs had developed makerspaces (or an equivalent space) for their communities. Of the 56 respondents, roughly half (46%) were P2 and P3 libraries that were currently developing or operating a makerspace, respectively. Data from this survey indicated that none of the P2 or P3 CRLs fit a mold or pattern in terms of their size, educational models, or classifications. Upon analyzing the data, I found that the differentiators between the three groups were less clearly defined than originally anticipated. In one example of blurred lines, at least two respondents in P1 indicated that they were more actively engaged with makerspaces than two respondents in P2. Despite not having physical labs within their libraries, these P1 respondents were in the process of actively supporting or making plans for a makerspace within their CRL community. One P1 respondent, for example, served on the planning board for a local community makerspace and had therefore “thoroughly investigated and used” the makerspace at a neighboring university. Based on his knowledge, he decided to develop a complementary initiative (e.g., a book arts workshop) at his university library. Although his library did not yet have a formal makerspace, he felt confident that the diffusion of 3D printers would come to his library in the near future. Another P1 respondent was responsible for administering faculty teaching and innovation grants.
Among the recent grant recipients were two faculty collaborators who used the library’s funds to build a makerspace at a campus location that was separate from the library. Although the makerspace was not directly developed by the respondent’s library, it was nevertheless a direct product of his library’s programmatic support. The respondent reported that for this reason, his library did not want to compete with its own faculty initiatives. In another example of blurred distinctions, one librarian in P2 was as deeply immersed in providing access and education on makerspaces as his colleagues in P3. Although he was not clear on when or how his library would finance a future makerspace, his library already offered many of the same services and workshops as P3 libraries. As a “Maker in the Library,” he offered non-credit-bearing 3D printing seminars to students and offered trial 3D printing services in the library for graduates of the 3D printing seminar. In addition, he made appearances at relevant campus events. When the university museum ran a 3D printing day, for instance, he participated as an expert panelist and gave public demonstrations on library-owned 3D printers and a Kinect scanner bar. In sum, despite the respondents’ categorization in P1 and P2, they sometimes shared more in common with the cohorts in P2 and P3, respectively. Given their library’s programmatic involvement in creating and endorsing the maker movement, these respondents were more than just “interested” or “open to” the prospect of creating a makerspace. While only 16% of CRLs (P3 = 9) responded as actively operating a makerspace, another 30% (P2 = 17) were involved in developing a makerspace in the near future. Moreover, the number of CRLs formally involved with the diffusion of maker technologies was not limited to just these two groups.
Although some makerspaces were not directly run by the library, they had come to fruition because of library-based funding, grants, and professional support. And although some libraries did not have immediate plans for a makerspace, they were already promoting maker technologies and the maker ethos in other significant ways.

CONCLUSION

This study is one of the first comprehensive and comparative studies on CRL makerspace programs and their respective goals, policies, and outcomes. While the number of current CRL makerspaces is relatively low, the data suggests that the population is increasing; a growing number of CRLs are involved in the makerspace movement. More than two dozen CRLs were planning to develop makerspaces in the near future, helping to diffuse maker technologies through CRL programming, and/or supporting nonlibrary maker initiatives on campus and beyond. In addition, some CRLs were buying equipment, hiring dedicated staff, offering relevant workshops and demonstrations, and supporting community efforts to build labs beyond the library. Although the author aimed to find structural commonalities between CRLs in groups P2 and P3, none were found. Respondents in these groups came from institutions of all sizes, a wide variety of endowment levels, and both public and private funding models, and they ranged in emphasis from the liberal arts to professional certifications and graduate-level research. Although a majority of CRL respondents were not currently making plans to create a makerspace, many respondents were enthusiastic about current trends, and some even promoted the maker movement in unexpected ways. Acknowledging the steady diffusion of 3D printers, many anticipated using such technologies in the future to promote traditional library values and goals. Respondents in P2 and P3 indicated that their primary rationale for developing a makerspace was to promote learning and literacy.
Other prominent reasons included promoting library outreach and the maker culture of learning. Data from CRLs with makerspaces indicated that these benefits were often symbiotic and correlated to strong ideas about universal access to emergent tools and practices in learning. Unexpected challenges for developing and operating makerspaces include staffing them with highly skilled, knowledgeable, and service-oriented employees. Learning the necessary skills— including operating the printers, troubleshooting models, and maintaining a safe environment, to name a few—was time-consuming and labor intensive. The majority of funding for CRLs with or planning maker labs came from internal budgets, gifts and donors, and some grants. While some P1 CRLs indicated that their reason for not developing makerspaces was a lack of community interest, P2 and P3 CRLs were not necessarily motivated by user requests or needs, nor was lack of explicit need or interest a deterrent. On the contrary, a few reported a desire to promote the campus library as ahead of the curve by keeping in front of student and community needs. In a similar contradiction, some P1 respondents reported that their libraries did not want to compete with other labs on campus. Respondents from P2 and P3, however, wanted to offer an alternative to the more siloed or structured model of department- or lab-funded makerspaces. Although makerspaces were sometimes forming in other parts of campus, some P2 and P3 CRLs felt there was a gap in accessibility and therefore aimed to offer more open and flexible spaces. A final salient theme among P2 and P3 respondents was their commitment to equity of access and issues of social justice. Above all, they saw a unique fit for makerspaces in their CRL philosophies to serve the greater good. 
Among other advantages, CRLs were in a unique position to leverage the power of the makerspaces to take advantage of campus communities of “cognitive surplus” and millennial aspirations to share and create spontaneous communities of knowledge. Given the amount of resources that are required to create and maintain a makerspace, this research will be useful for CRLs considering such a space in the future. The present data suggests that no one type of library currently has a monopoly on makerspaces; regardless of size or funding levels, the common thread among P2 and P3 CRLs was simply a commitment to providing access to emergent technologies and supporting new literacies. While annual budgets and grant applications were critical for some libraries, the majority of CRLs funded the bulk of their makerspaces through gifts and donations. Future studies on the characteristics and challenges of P2 and P3 populations beyond those in New England will certainly amplify our understanding of these trends.

APPENDIX: SURVEY QUESTIONS

Informed Consent

CURRENT TRENDS IN THE DEVELOPMENT OF MAKERSPACES AND 3D PRINTING LABS AT NEW ENGLAND COLLEGE AND RESEARCH LIBRARIES
Consent for the Participation in a Research Study
Southern Connecticut State University

Purpose
You are invited to participate in a research project conducted by Ann Marie L. Davis, a master’s student in library and information studies at Southern Connecticut State University. The purpose of this project is to investigate the experiences and goals of college and research libraries (CRLs) that currently have or are making plans to have an open makerspace (or an equivalent room or space). The results from this study will be included in a special project report for the MLS degree and will form the basis for an article to submit for peer review.
Procedures
If you decide to participate, you will volunteer to take a fifteen-minute online survey.

Risks and Inconveniences
There are no known risks associated with this research; other than taking a short amount of time, the survey should not burden you or infringe on your privacy in any way.

Potential Benefits and Incentive
By participating in this research, you will be contributing to our understanding of current trends and practices with regard to community learning labs in CRLs. In addition, you will be providing useful knowledge that can support other libraries in making more informed decisions as they potentially develop their own makerspaces in the future.

Voluntary Participation
Your participation in this research study is voluntary. You may choose not to participate and you may withdraw your consent to participate at any time. You will not be penalized in any way should you decide not to participate or withdraw from this study.

Protection of Confidentiality
The survey is anonymous and does not ask for sensitive or confidential information.

Contact Information
Before you consent, please ask any questions on any aspect of this study that is unclear to you. You may contact me at my student email address at any time: xxx@owls.southernct.edu. If you have questions regarding your rights as a research participant, you may contact the Southern Connecticut State Institutional Review Board at (203) xxx-xxxx.

Consent
By proceeding to the next page, you confirm that you understand the purpose of this research, the nature of this survey, and the possible burdens and risks as well as benefits that you may experience. By proceeding, this indicates that you have read this consent form, understand it, and give your consent to participate and allow your responses to be used in this research.

ACRL Survey on Makerspaces and 3D Printers

Q1. What is the size of your college or university?
• 4,999 students or less
• 5,000–9,999 students
• 10,000–19,999 students
• 20,000–29,999 students
• 30,000 students or more

Q2. How would you categorize your institution? (Please check all that apply)
• Private
• Public
• Doctorate-Granting University (awards 20 or more doctorates)
• Master’s College or University (awards 50 or more master’s degrees, but fewer than 20 doctorates)
• Liberal Arts and Sciences College
• Other

Q3. Do any of the libraries at your institution have a makerspace or equivalent hands-on learning lab (including a 3-D printing station or lab)?
• Yes [if “Yes,” respondents are directed to question 14]
• No [if “No,” respondents are directed to question 4]

Q4. Do any of the libraries at your institution have plans to develop a makerspace or equivalent learning lab in the near future?
• Yes [if “Yes,” respondents are directed to question 8]
• No [if “No,” respondents are directed to question 5]

PATH ONE (CRLs with no makerspace, no plans for makerspace)

Q5. Are there specific reasons why your institution has decided not to pursue developing a makerspace or equivalent lab in the near future?
• No reasons. We have not given much thought to makerspaces for our library.
• Yes

Q6. Thank you for your participation. Would you like a copy of the results when the report is completed? If yes, please enter your email address in the space provided.
• No
• Yes (please enter your email address below)

Q7. You have almost concluded this survey. Before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. If no comments, please click “Next” to end the survey.

PATH TWO [CRLs with plans to build a makerspace]

Q8. What are the main goals that motivated your library’s decision to develop a makerspace or equivalent lab?
(Please check all that apply)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q9. Of these goals, please rank them in order of their level of priority for your library. (Choose “N/A” for goals that you did not select in the previous question)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q10. What is your library’s time frame for developing a makerspace or equivalent lab?

Q11. What are your library’s current plans for gathering and/or financing the resources needed for developing and maintaining the makerspace or equivalent lab?

Q12. Thank you for your participation. Would you like a copy of the results when the report is completed?
• No
• Yes (please enter your email address below)

Q13. You have almost concluded this survey. Before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. If no comments, please click “Next” to end the survey.

PATH THREE [CRLs with a makerspace]

Q14. How long have you had your makerspace or equivalent learning lab?
• less than 6 months
• 6–12 months
• 1–2 years
• 2–3 years
• more than 3 years

Q15. What were the main goals that motivated your library’s decision to develop a makerspace or equivalent lab?
(Please check all that apply)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q16. Of these goals, please rank them in order of their level of priority for your library. (Choose “N/A” for goals that you did not select in the previous question)
• promote community outreach
• promote learning and literacy
• promote the library as relevant
• promote the maker culture of making
• provide access to expensive machines or tools
• complement digital repository or digital scholarship projects
• as a direct response to community requests or needs
• other

Q17. How did your library gather and/or finance the resources needed for developing and maintaining the makerspace or equivalent learning lab?

Q18. Do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use?
• Yes, we offer the following types of programs:
• No, we simply leave the makerspace/lab open at the following times (please note times and/or if a reservation is required):
• We do both. We offer the following types of programs and leave the makerspace/lab open at the following times (please note types of programs, times open, and if a reservation is required):

Q19. What type of community members tend to use your library’s makerspace or equivalent lab most? (Please check all that apply)
• undergraduate researchers
• graduate researchers
• faculty
• staff
• general public
• local artists, designers, or craftspeople
• local entrepreneurs
• other

Q20. Of the cohorts chosen above, please rank them in order of who uses the makerspace or equivalent lab most often.
(Use “N/A” for cohorts that are not relevant to your space or lab)
• undergraduate researchers
• graduate researchers
• faculty
• staff
• general public
• local artists, designers, or craftspeople
• local entrepreneurs
• other

Q21. How many dedicated staff does your library currently employ for the makerspace or equivalent?
• 0
• 1
• 2
• 3
• other

Q22. Where is your makerspace or equivalent lab located?

Q23. What is the title or name of your makerspace or equivalent lab, and if known, what were the reasons behind this particular name?

Q24. What major equipment and services does your library makerspace or equivalent lab provide?

Q25. What unexpected considerations, challenges, or failures has your library faced in developing and maintaining the makerspace or equivalent lab?

Q26. How would you assess the benefits or “return on investment” of having a makerspace or equivalent lab?

Q27. Thank you for your participation. Would you like a copy of the final results when the report is completed? If yes, please enter your email address in the space provided.
• No
• Yes (please enter your email address below)

Q28. You have almost concluded this survey. Before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. If no comments, please click “Next” to end the survey.
Digitization of Text Documents Using PDF/A
Yan Han and Xueheng Wan
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2018 52
Yan Han (yhan@email.arizona.edu) is Full Librarian, the University of Arizona Libraries, and Xueheng Wan (wanxueheng@email.arizona.edu) is a student, Department of Computer Science, University of Arizona.

ABSTRACT

The purpose of this article is to demonstrate a practical use case of PDF/A for digitization of text documents following FADGI’s recommendation of using PDF/A as a preferred digitization file format. The authors demonstrate how to convert and combine TIFFs with associated metadata into a single PDF/A-2b file for a document. Using real-life examples and open source software, the authors show readers how to convert TIFF images, extract associated metadata and International Color Consortium (ICC) profiles, and validate against the newly released PDF/A validator. The generated PDF/A file is a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and ICC profiles. Providing theoretical analysis and empirical examples, the authors show that PDF/A has many advantages over the traditionally preferred file format, TIFF/JPEG2000, for digitization of text documents.

BACKGROUND

PDF has been primarily used as a file delivery format across many platforms in almost every device since its initial release in 1993. PDF/A was designed to address concerns about long-term preservation of PDF files, but there has been little research and few implementations of this file format. Since the first standard (ISO 19005 PDF/A-1) was published in 2005, some articles have discussed the PDF/A family of standards, relevant information, and how to implement PDF/A for born-digital documents.1 There is growing interest in the PDF and PDF/A standards after both the US Library of Congress and the National Archives and Records Administration (NARA) joined the PDF Association in 2017.
NARA joined the PDF Association because PDF files are used as electronic documents in every government and business agency. As explained in a blog post, the Library of Congress joined the PDF Association because of the benefits to libraries, including participating in developing PDF standards, promoting best-practice use of PDF, and access to the global expertise in PDF technology.2 Few articles, if any, have been published about using this file format for preservation of digitized content.

DIGITIZATION OF TEXT DOCUMENTS USING PDF/A | HAN AND WAN 53 HTTPS://DOI.ORG/10.6017/ITAL.V37I1.9878

Yan Han published a related article in 2015 about theoretical research on using PDF/A for text documents.3 In this article, Han discussed the shortcomings of the widely used TIFF and JPEG2000 as master preservation file formats and proposed using the then-emerging PDF/A as the preferred file format for digitization of text documents. Han further analyzed the requirements of digitization of text documents and discussed the advantages of PDF/A over TIFF and JPEG2000. These benefits include platform independence, smaller file size, better compression algorithms, and metadata encoding. In addition, the file format reduces workload and simplifies post-digitization processing such as quality control, adding and updating missing pages, and creating new metadata and OCR data for discovery and digital preservation. As a result, PDF/A can be used in every phase of a digital object in an Open Archival Information System (OAIS)—for example, a Submission Information Package (SIP), Archive Information Package (AIP), and Dissemination Information Package (DIP). In summary, a PDF/A file can be a structured, self-contained, and self-described container allowing a simpler one-to-one relationship between an original physical document and its digital surrogate.
In September 2016, the Federal Agencies Digital Guidelines Initiative (FADGI) released its latest guidelines for digitization related to raster images: Technical Guidelines for Digitizing Heritage Materials.4 The de facto best practices for digitization, these guidelines provide federal agencies guidance and have been used in many cultural heritage institutions. Both the PDF Association and the authors welcomed the recognition of PDF/A as the preferred master file format for digitization of text documents such as unbound documents, bound volumes, and newspapers.5

GOALS AND TASKS

Since Han has previously provided theoretical methods of coding raster images, metadata, and related information in PDF/A, the goals of this article are threefold:
1. present real-life experience of converting TIFFs/JPEG2000s to PDF/A and back, along with image metadata
2. test open source libraries to create and manipulate images, image metadata, and PDF/A
3. validate the generated PDF/A files with the first legitimate PDF/A validator

The tasks included the following:
● Convert all the master files in TIFF/JPEG2000 from digitization of text documents into single PDF/A files losslessly: one document, one PDF/A file.
● Evaluate and extract metadata from each TIFF/JPEG2000 image and encode it along with its image when creating the corresponding PDF/A file.
● Demonstrate the runtimes of the above tasks for feasibility evaluation.
● Validate the PDF/A files against the newly released open source PDF/A validator veraPDF.
● Extract each digital image from the PDF/A file back to its original master image files along with associated metadata.
● Verify the extracted image files in the back-and-forth conversion process against the original master image files.

Choices of PDF/A Standards and Conformance Level

This article demonstrates using PDF/A-2b as a self-contained, self-describing file format. Currently, there are three related PDF/A standards (PDF/A-1, PDF/A-2, and PDF/A-3), each with
Currently, there are three related PDF/A standards (PDF/A-1, PDF/A-2, and PDF/A-3), each with INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2018 54 three conformance levels (a, b, and u). The reasons for choosing PDF/A-2 (instead of PDF/A-1 or PDF/A-3) are the following: ● PDF/A-1 is based on PDF 1.4. In this standard, images coded in PDF/A-1 cannot use JPEG2000 compression (named in PDF/A as JPXDecode). One can still convert TIFFs to PDF/A-1 using other lossless compression methods such as LZW. However, the space- saving benefits of JPEG2000 compression over other methods would not be utilized. ● PDF/A-2 and PDF/A-3 are based on PDF 1.7. One significant feature of PDF 1.7 is that it supports JPEG2000 compression, which saves 40–60 percent of space for raster images compared to uncompressed TIFFs. ● PDF/A-3 has one major feature that PDF/A-2 does not have, which is to allow arbitrary files to be embedded within the PDF file. In this case, there is no file to be embedded. The authors chose conformance level b for simplicity. ● b is basic conformance, which requires only necessary components (e.g., all fonts embedded in the PDF) for reproduction of a document’s visual appearance. ● a is accessible conformance, which means b conformance level plus additional accessibility (structural and semantic features such as document structure). One can add tags to convert PDF/2b to PDF/2a. ● u represents a conformance level with the additional requirement that all text in the document have Unicode equivalents. This article does not cover any post-processing of additional manual or computational features such as adding OCR text to the generated PDF/A files. These features do not help faithfully capture the look and feel of original pages in digitization, and they can be added or updated later without any loss of information. In addition, OCR results rely on the availability of OCR engines for the document’s language, and results can vary between different OCR engines over time. 
OCR technology is improving and will produce better results in the future. For example, current OCR technology for English gives very reliable (more than 90 percent) accuracy; in comparison, traditional Chinese manuscripts and Pashto/Persian documents yield unacceptably low accuracy (less than 60 percent). Cutting-edge OCR engines have begun to utilize artificial intelligence techniques, and the authors believe that a breakthrough will happen soon.

Data Source

The University of Arizona Libraries (UAL) and the Afghanistan Center at Kabul University (ACKU) have been partnering to digitize and preserve ACKU’s permanent collection held in Kabul. This collaborative project created the largest Afghan digital repository in the world. Currently the Afghan digital repository (http://www.afghandata.org) contains more than fifteen thousand titles and 1.6 million pages of documents. Digitization of these text documents followed the previous version of the FADGI guidelines, which recommended scanning each page of a text document into a separate TIFF file as the master file. These TIFFs were organized into directories in a file system, where each directory represents a document and contains all the scanned pages of that title. An example of the directory structure can be found in Han’s article.

DIGITIZATION OF TEXT DOCUMENTS USING PDF/A | HAN AND WAN 55 HTTPS://DOI.ORG/10.6017/ITAL.V37I1.9878

PDF/A and Image Manipulation Tools

There are a few open source and proprietary PDF software development kits (SDKs). Adobe PDF Library and Foxit SDK are the most well-known commercial tools for manipulating PDFs. To show readers that they can manipulate and generate PDF/A documents themselves, open source software, rather than commercial tools, was used. Currently, only a very limited number of open source PDF SDKs are available, including iText and PDFBox.
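As a brief aside on the data source's layout, the following hedged sketch (not the authors' code; directory and file names such as page_0001.tif are hypothetical) shows how one document directory's TIFF pages can be collected in their original scan order, assuming zero-padded page numbering so that lexicographic sorting reproduces page order:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SortTiffPages {
    // Collect one document directory's TIFF pages in their original scan order.
    // Assumes zero-padded page numbers (page_0001.tif, page_0002.tif, ...),
    // so plain lexicographic sorting reproduces the page order.
    static List<Path> sortedTiffs(Path docDir) throws IOException {
        try (Stream<Path> entries = Files.list(docDir)) {
            return entries
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".tif"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: one directory per digitized document.
        Path doc = Files.createTempDirectory("document_title");
        Files.createFile(doc.resolve("page_0002.tif"));
        Files.createFile(doc.resolve("page_0001.tif"));
        Files.createFile(doc.resolve("checksums.txt")); // non-TIFF entries are ignored
        for (Path p : sortedTiffs(doc)) {
            System.out.println(p.getFileName());
        }
    }
}
```

A list produced this way is one plausible source for the pre-sorted TIFF queue used later when combining pages into a single PDF/A file.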
iText was chosen because it has good documentation and provides a well-built set of APIs that support almost all PDF and PDF/A features. iText was initially written by Bruno Lowagie (who was in the ISO PDF standard working group) in 1998 as an in-house project; Lowagie later started his own company, iText, and published iText in Action with many code examples.6 Moreover, iText has Java and C# coding options with good code documentation. It is worth mentioning that iText has different versions. The authors used iText 5.5.10 and 5.4.4. Using an older version in our implementation generated a non-compatible PDF/A file because it was not aligned with the PDF/A standard.7

For image processing, there were a few popular open source options, including ImageMagick and GIMP. ImageMagick was chosen because of its popularity, stability, and cross-platform implementation. Our implementation identified one issue with ImageMagick: the current version (7.0.4) could not retrieve all the metadata from TIFF files, as it did not extract certain information such as the Image File Directory and color profile. These metadata are critical because they are part of the original data from digitization. Unfortunately, the authors observed that some image editors were unable to preserve all the metadata from image files during the conversion process. Hart and de Vries used case studies to show the vulnerability of metadata, demonstrating that metadata elements in a digital object can be lost or corrupted by use or by conversion of a file to another format. They suggested that action is needed to ensure proper metadata creation and preservation, so that all types of metadata are captured and preserved to achieve the most authentic, consistent, and complete digital preservation for future use.8

Metadata Extraction Tools and Color Profiles

As we digitize physical documents and manipulate images, color management is important.
The goal of color management is to obtain a controlled conversion between the color representations of various devices such as image scanners, digital cameras, and monitors. A color profile is a set of data that characterizes the color space of an input or output device. The International Color Consortium (ICC) standards and profiles were created to bring various manufacturers together; embedding color profiles into images is one of the most important color management solutions. Image formats such as TIFF and JPEG2000 and document formats such as PDF may contain embedded color profiles.

The authors identified a few open source tools to extract TIFF metadata, including ExifTool, Exiv2, and tiffInfo. ExifTool is an open source tool for reading, writing, and manipulating metadata of media files. Exiv2 is another free metadata tool supporting different image formats. The tiffInfo program is widely used on the Linux platform, but it has not been updated for at least ten years. Our implementations showed that ExifTool was the tool that most easily extracted the full ICC profiles and other metadata from TIFF and JPEG2000 files. ImageMagick and other image processing software were examined in Van der Knijff’s article discussing JPEG2000 for long-term preservation.9 He found that ICC profiles were lost in ImageMagick. Our implementation showed that the current version of ImageMagick has fixed this issue. A metadata sample can be found in appendix A.

IMPLEMENTATION

Converting and Ordering TIFFs into a Single PDF/A-2 File

When ordering and combining all individual TIFFs of a document into a single PDF/A-2b file, the authors intended to preserve all information from the TIFFs, including raster image data streams and metadata stored in each TIFF’s header.
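The ExifTool extraction step mentioned above can be sketched as a command-line invocation. This is a minimal, hedged sketch rather than the authors' code: the file name is hypothetical, and the program only builds and prints the command (ExifTool's -X option emits metadata as RDF/XML) instead of assuming the tool is installed:

```java
import java.util.List;

public class ExifToolCommand {
    // Build the ExifTool invocation that dumps a TIFF's metadata,
    // including the embedded ICC profile, as RDF/XML ("-X").
    static List<String> command(String imagePath) {
        return List.of("exiftool", "-X", imagePath);
    }

    public static void main(String[] args) {
        // "page_0001.tif" is a hypothetical page file; in practice the command
        // would be run once per page, e.g., via java.lang.ProcessBuilder.
        System.out.println(String.join(" ", command("page_0001.tif")));
    }
}
```

The resulting XML file per page is what the combination step later converts to XMP and attaches to the corresponding PDF page.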
The raster image data streams are the main images reflecting the original look and feel of the pages, while the metadata (including technical and administrative metadata such as BitsPerSample, DateTime, and Make/Model/Software) records important digitization and provenance information. Both are critical for delivery and digital preservation. The TIFF images were first converted to JPEG2000 with lossless compression using the open source ImageMagick software. Our tests of ImageMagick demonstrated that it can handle different color profiles and will convert images correctly if the original TIFF comes with a color profile. This gave us confidence that past concerns about JPEG2000 and ImageMagick had been resolved. These images were then sorted into their original order and combined into a single PDF/A-2 file. An alternative is to code the TIFF’s image data stream directly into a PDF/A file, but this approach would miss one benefit of PDF/A-2: tremendous file size reduction with JPEG2000. The following is the pseudocode for ordering and combining all the TIFFs of a text document into a single PDF/A-2 file.
CreatePDFA2(queue TiffList) {
    Create an empty queue XMLQ;
    Create an empty queue JP2Q;
    /* TiffList is a pre-sorted queue based on the original page order */
    /* Convert each TIFF to JPEG2000 losslessly, then add each JPEG2000 and its metadata to a queue */
    while (TiffList is NOT empty) {
        String tiffFilePath = TiffList.dequeue();
        String xmlFilePath = TIFF metadata extracted using ExifTool;
        XMLQ.enqueue(xmlFilePath);
        String jp2FilePath = JPEG2000 file location from TIFF converted by ImageMagick;
        JP2Q.enqueue(jp2FilePath);
    }
    /* Convert each image's metadata to XMP; add each JPEG2000 and its metadata into the PDF/A-2 file in its original order */
    Document pdf2b = new Document();
    /* create PDF/A-2b conformance level */
    PdfAWriter writer = PdfAWriter.getInstance(pdf2b, new FileOutputStream(PdfAFilePath),
            PdfAConformanceLevel.PDF_A_2B);
    writer.createXmpMetadata(); // create root XMP
    pdf2b.open();
    while (JP2Q is NOT empty) {
        Image jp2 = Image.getInstance(JP2Q.dequeue());
        Rectangle size = new Rectangle(jp2.getWidth(), jp2.getHeight()); // PDF page size setting
        pdf2b.setPageSize(size);
        pdf2b.newPage(); // create a new page for a new image
        byte[] bytearr = XmpManipulation(XMLQ.dequeue()); // convert original metadata based on the XMP standard
        writer.setPageXmpMetadata(bytearr);
        pdf2b.add(jp2);
    }
    pdf2b.close();
}

Converting PDF/A-2 Files back to TIFFs and JPEG2000s

To ensure that raster images can be extracted from the newly created PDF/A-2 file, the authors also wrote code to convert a PDF/A-2 file back to the original TIFF or JPEG2000 format. This implementation was the reverse of the above operation. Once the reverse conversion was completed, the authors verified that the image files created from the PDF/A-2 file were the same as those before the conversion to PDF/A-2. Note that we generated MD5 checksums to verify image data streams.
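The checksum comparison can be illustrated with Java's standard MessageDigest API. This is a hedged sketch rather than the authors' code; the byte arrays stand in for an image data stream before embedding and after extraction:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class VerifyImageStream {
    // Hex-encoded MD5 of a raw data stream.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-ins for the image bytes before embedding and after extraction;
        // "abc" is the RFC 1321 test vector, used here so the digest is known.
        byte[] original = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] extracted = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(original));
        System.out.println(md5Hex(original).equals(md5Hex(extracted)) ? "match" : "MISMATCH");
    }
}
```

In the round-trip test, equal digests for the pre-embedding and post-extraction streams confirm that the image data survived the conversion losslessly.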
The image data streams are identical, but metadata locations can vary because of the inconsistent TIFF tags used over the years; when converting one TIFF to another, ImageMagick applies its own implementation of metadata tags. The code can be found in appendix B.

PDF/A Validation

PDF/A is one of the most recognized digital preservation formats, specially designed for long-term preservation and access. However, no commonly accepted PDF/A validator was available in the past, although several commercial and open source PDF preflight and validation engines (e.g., Acrobat) were available. Validating a PDF/A file against the PDF/A standards is a challenging task for a few reasons, including the complexity of the PDF and PDF/A formats. The PDF Association and the Open Preservation Foundation recognized the need and started a project to develop an open source PDF/A validator and build a maintenance community. Their result, veraPDF, is an open source validator designed for all PDF/A parts and conformance levels. Released in January 2017, veraPDF aims to become the commonly accepted PDF/A validator.10 Our generated PDF/As have been validated with veraPDF 1.4 and Adobe Acrobat Pro DC Preflight. Both products validated the PDF/A-2b files as fully conformant. Our implementations showed that veraPDF 1.4 verified more cases than Acrobat DC Preflight. Figure 1 shows a PDF file structure and its metadata.

Figure 1. A PDF object tree with root-level metadata.

RUNTIME AND CONCLUSION

The time complexity of our code is O(n log n) for n page images because of the sorting algorithms used. TIFFs were first converted to JPEG2000. When JPEG2000 images are added to a PDF/A-2 file, no further image manipulation is required because the generated PDF/A-2 uses JPEG2000 directly (in other words, it uses the JPXDecode filter).
Tables 1 and 2 show the performance on our hardware and software environment (Intel Core i7-2600 CPU @ 3.4 GHz, 8 GB DDR3 RAM, 3 TB 7,200-RPM 64 MB-cache hard disk, running Ubuntu 16.10).

Table 1. Runtimes of converting grayscale TIFFs to JPEG2000s and to PDF/A-2b

No. of Files | Total File Size (MB) | Image Conversion Runtime (TIFFs to JP2s, seconds) | Total Runtime (TIFFs to JP2s to a single PDF/A-2b, seconds)
  1 |   9.1 |   3.61 |   3.98
 10 |  91.1 |  35.63 |  36.71
 20 | 182.2 |  71.83 |  73.98
 50 | 455.5 | 179.06 | 184.63
100 | 910.9 | 358.30 | 370.91

Table 2. Runtimes of converting color TIFFs to JPEG2000s and to PDF/A-2b

No. of Files | Total File Size (MB) | Image Conversion Runtime (TIFFs to JP2s, seconds) | Total Runtime (TIFFs to JP2s to a single PDF/A-2b, seconds)
  1 |    27.3 |   14.80 |   14.94
 10 |   273   |  150.51 |  151.55
 20 |   546   |  289.95 |  293.21
 50 | 1,415   |  741.89 |  749.75
100 | 2,730   | 1490.49 | 1509.23

The results show that (a) the majority of the runtime (more than 95 percent) is spent converting TIFFs to JPEG2000s using ImageMagick (see figure 2); (b) the runtime of converting a TIFF scales linearly with the file’s size (see figure 2); (c) the runtime of converting a color TIFF is significantly higher than that of converting a greyscale TIFF (see figure 2); and (d) it is feasible in terms of time and resources to convert existing master images of digital document collections to PDF/A-2b. For example, converting 1 TB of color TIFFs would take 552,831 seconds (153.5 hours; 6.4 days) on the above hardware. The authors have already processed more than 600,000 TIFFs using this method. The authors conclude that using PDF/A gives institutions the advantages of the newly preferred master file format for digitization of text documents over TIFF/JPEG2000.
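The 1 TB extrapolation above follows directly from Table 2. The following sketch reproduces the estimate (within rounding) from the 100-file color run, assuming, as the article's figure implies, that 1 TB is taken as 1,000,000 MB:

```java
public class RuntimeEstimate {
    public static void main(String[] args) {
        // From Table 2: 100 color TIFFs, 2,730 MB total, 1509.23 s total runtime.
        double secondsPerMB = 1509.23 / 2730.0;   // ≈ 0.553 s per MB
        double oneTB = 1_000_000.0;               // MB, matching the article's estimate
        long seconds = Math.round(secondsPerMB * oneTB);
        long hours = Math.round(seconds / 3600.0);
        System.out.println(seconds + " seconds, about " + hours + " hours");
    }
}
```

The computed figure agrees with the article's 552,831 seconds to within rounding of the last digit.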
The above implementation demonstrates the ease, the reasonable runtime, and the availability of open source software to perform such conversions. From both the theoretical analysis and the empirical evidence, the authors show that PDF/A has advantages over the traditionally preferred file format, TIFF, for digitization of text documents. Following best practice, a PDF/A file can be a self-contained, self-describing container that accommodates all the data from digitization of textual materials, including page-level metadata and ICC profiles.

SUMMARY

The goal of this article is to demonstrate empirical evidence of using PDF/A for digitization of text documents. The authors evaluated and used multiple open source software programs for processing raster images, extracting image metadata, and generating PDF/A files. These PDF/A files were validated using the up-to-date PDF/A validators veraPDF and Acrobat Preflight. The authors also calculated the time complexity of the program and measured the total runtime in multiple testing cases. Most of the runtime was spent on image conversion from TIFF to JPEG2000; the creation of the PDF/A-2b file with associated page-level metadata accounted for less than 5 percent of the total runtime. The runtime for converting a color TIFF was much higher than that for a greyscale one. Our theoretical analysis and empirical examples show that using PDF/A-2 presents many advantages over the traditionally preferred file formats (TIFF/JPEG2000) for digitization of text documents.

Figure 2. File size, greyscale and color TIFFs and runtime ratio.
APPENDIX A: SAMPLE TIFF METADATA WITH ICC HEADER

[The appendix reproduces ExifTool output for a sample TIFF; the tag names were lost in text extraction. The recoverable values describe a 3400 × 4680 pixel, 8-bits-per-sample RGB image, uncompressed, chunky planar configuration, at 400 × 400 dpi, with an embedded EPSON sRGB display device ICC profile (Apple ColorSync 2.2.0 header, dated 2006; copyright SEIKO EPSON CORPORATION 2000–2006). Several binary blocks are marked “use -b option to extract.”]

APPENDIX B: SAMPLE CODE TO CONVERT PDF/A-2 BACK TO JPEG2000S

/* Assumption: the PDF/A-2b file was generated from image objects converted
   from TIFF images with JPXDecode, along with page-level metadata */
public static void parse(String src, String dest) throws IOException {
    PdfReader reader = new PdfReader(src);
    PdfObject obj;
    int counter = 0;
    for (int i = 1; i <= reader.getXrefSize(); i++) {
        obj = reader.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            PRStream stream = (PRStream) obj;
            byte[] b;
            try {
                b = PdfReader.getStreamBytes(stream);
            } catch (UnsupportedPdfException e) {
                b = PdfReader.getStreamBytesRaw(stream);
            }
            PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
            FileOutputStream fos = null;
            if (pdfsubtype != null && pdfsubtype.toString().equals(PdfName.XML.toString())) {
                // page-level XMP metadata stream
                fos = new FileOutputStream(String.format("%s_xml/%d.xml", dest, counter));
                System.out.println("Page Metadata Extracted!");
            }
            if (pdfsubtype != null && pdfsubtype.toString().equals(PdfName.IMAGE.toString())) {
                // JPEG2000 image stream
                counter++;
                fos = new FileOutputStream(String.format("%s_jp2/%d.jp2", dest, counter));
            }
            if (fos != null) {
                fos.write(b);
                fos.flush();
                fos.close();
                System.out.println("JPEG2000 Conversion from PDF completed!");
            }
        }
    }
}
/* Then use the ImageMagick library to convert the JPEG2000s to TIFFs */

REFERENCES

1 PDF-Tools.com and PDF Association, “PDF/A—The Standard for Long-Term Archiving,” version 2.4, white paper, May 20, 2009, http://www.pdf-tools.com/public/downloads/whitepapers/whitepaper-pdfa.pdf; Duff Johnson, “White Paper: How to Implement PDF/A,” Talking PDF, August 24, 2010, https://talkingpdf.org/white-paper-how-to-implement-pdfa/; Alexandra Oettler, “PDF/A in a Nutshell 2.0: PDF for Long-Term Archiving,” Association for Digital Standards, 2013, https://www.pdfa.org/wp-content/until2016_uploads/2013/05/PDFA_in_a_Nutshell_211.pdf; Library of Congress, “PDF/A, PDF for Long-Term Preservation,” last modified July 27, 2017, https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml.

2 Library of Congress, “The Time and Place for PDF: An Interview with Duff Johnson of the PDF Association,” The Signal (blog), December 12, 2017, https://blogs.loc.gov/thesignal/2017/12/the-time-and-place-for-pdf-an-interview-with-duff-johnson-of-the-pdf-association/.

3 Yan Han, “Beyond TIFF and JPEG2000: PDF/A as an OAIS Submission Information Package Container,” Library Hi Tech 33, no. 3 (2015): 409–23, https://doi.org/10.1108/LHT-06-2015-0068.

4 Federal Agencies Digital Guidelines Initiative, Technical Guidelines for Digitizing Cultural Heritage Materials (Washington, DC: Federal Agencies Digital Guidelines Initiative, 2016), http://www.digitizationguidelines.gov/guidelines/FADGI%20Federal%20%20Agencies%20Digital%20Guidelines%20Initiative-2016%20Final_rev1.pdf.

5 Duff Johnson, “US Federal Agencies Approve PDF/A,” PDF Association, September 2, 2016, http://www.pdfa.org/new/us-federal-agencies-approve-pdfa/.

6 Bruno Lowagie, iText in Action, 2nd ed. (Stamford, CT: Manning, 2010).
7 “iText 5.4.4,” iText, last modified September 16, 2013, http://itextpdf.com/changelog/544.

8 Timothy Robert Hart and Denise de Vries, “Metadata Provenance and Vulnerability,” Information Technology and Libraries 36, no. 4 (2017), https://doi.org/10.6017/ital.v36i4.10146.

9 Johan van der Knijff, “JPEG 2000 for Long-Term Preservation: JP2 as a Preservation Format,” D-Lib 17, no. 5/6 (2011), https://doi.org/10.1045/may2011-vanderknijff.

10 PDF Association, “How veraPDF Does PDF/A Validation,” 2016, https://www.pdfa.org/how-verapdf-does-pdfa-validation/.
Mobile Website Use and Advanced Researchers: Understanding Library Users at a University Marine Sciences Branch Campus

Mary J. Markland, Hannah Gascho Rempel, and Laurie Bridges

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017 7

ABSTRACT

This exploratory study examined the use of the Oregon State University Libraries website via mobile devices by advanced researchers at an off-campus branch location. Branch campus–affiliated faculty, staff, and graduate students were invited to participate in a survey to determine what their research behaviors are via mobile devices, including the frequency of their mobile library website use and the tasks they were attempting to complete. Findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. Mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. Results of this survey will be used to address knowledge gaps around library resources and research tools and to generate more ways to study advanced researchers’ use of library services via mobile devices.

INTRODUCTION

As use of mobile devices has expanded in the academic environment, so has the practice of gathering data from multiple sources about which mobile resources are and are not being used. This data informs the design decisions and resource investments libraries make in mobile tools. Web analytics is one tool that allows researchers to discover which devices patrons use to access library webpages. But web analytics data do not show what patrons want to do and what hurdles they face when using the library website via a mobile device.
Web analytics also lacks nuance in that it cannot distinguish user characteristics, such as whether users are novice or advanced researchers, which may affect how these users interact with a mobile device. User surveys are another tool for gathering data on mobile behaviors. User surveys help overcome some of the limitations of web analytics data by directly asking users about their perceived research skills and the resources they use on a mobile device.

As is the case at most libraries, Oregon State University Libraries serves a diverse range of users. We were interested in learning whether advanced researchers, particularly advanced researchers who work at a branch campus, use the library’s resources differently than main campus users. We were chiefly interested in these advanced researchers because of the mobile nature of their work. They are graduate students and faculty in the field of marine science who work in a variety of locations, including their offices, labs, and the field (which can include rivers, lakes, and the ocean). We focused on the use of the library website via mobile devices as one way to determine whether specific library services should be adapted to best meet the needs of this targeted user community.

Oregon State University (OSU) is Oregon’s land-grant university; its home campus is in Corvallis, Oregon.

Mary J. Markland (mary.markland@oregonstate.edu) is Head, Guin Library; Hannah Gascho Rempel (hannah.rempel@oregonstate.edu) is Science Librarian and Coordinator of Graduate Student Success Services; and Laurie Bridges (laurie.bridges@oregonstate.edu) is Instruction and Outreach Librarian, Oregon State University Libraries and Press.

MOBILE WEBSITE USE AND ADVANCED RESEARCHERS | MARKLAND, REMPEL, AND BRIDGES doi:10.6017/ital.v36i4.9953 8
Hatfield Marine Science Center (HMSC) in Newport is a branch campus that includes a branch library. Guin Library at HMSC serves OSU students and faculty from across the OSU colleges, along with the co-located federal and state agencies of the National Oceanic and Atmospheric Administration (NOAA), US Fish and Wildlife Service, Environmental Protection Agency (EPA), United States Geological Survey (USGS), United States Department of Agriculture (USDA), and the Oregon Department of Fish and Wildlife. The Guin Library is in Newport, forty-five miles from the main campus. Like many other branch libraries, Guin Library was established at a time when providing a print collection close to where researchers and students work was paramount, but today it must adapt its services to meet the changing information needs of its user base.

Branch libraries are typically designed to serve a particular clientele or subject area, which can create an institutional culture different from that of the main library. Guin Library serves advanced undergraduates, graduate students, and scientific researchers. HMSC’s distance from Corvallis, the small size of the researcher community, and the shared focus on a research area (marine sciences) create a distinct culture. While Guin Library is often referred to as the “heart of HMSC,” the number of in-person library users is decreasing. This decline is not unexpected, as numerous studies have shown that faculty and graduate students have fewer needs that require an in-person trip to the library.1 Studies have also shown that faculty and graduate students can be unaware of the services and resources that libraries provide, thereby continuing the cycle of underuse.2

To learn more about the needs of HMSC’s advanced researchers, this exploratory study examined their research behaviors via mobile devices.
The goals of this study were to
• determine if and with what frequency advanced researchers at HMSC use the OSU Libraries website via mobile devices;
• gather a list of tasks advanced users attempt to accomplish when they visit the OSU Libraries website on a mobile device; and
• determine whether the mobile behaviors of these advanced researchers differ from those of researchers at the main OSU campus (including undergraduate students), and if so, whether these differences warrant alternative modes of design or service delivery.

LITERATURE REVIEW

The conversation about how best to design mobile library websites has shifted over the past decade. Early in the mobile-adoption process, some libraries focused on creating special websites or apps that worked with mobile devices.3 While libraries globally might still be creating mobile-specific websites and apps,4 US libraries are trending toward responsively designed websites as a more user-friendly option and a simpler solution for most libraries with limited staff and budgets.5

Most of the literature on mobile-device use in higher education focuses on undergraduates across a wide range of majors who use a standard academic library.6 To help provide context for how libraries have designed their websites for mobile users, some of those specific findings will be shared later. But because our study focused on graduate students and faculty in a science-focused branch library, we will begin with a discussion of what is known about more advanced researchers’ use of library services and their mobile-device habits.

Several themes emerged from the literature on graduate students’ relationships with libraries.
In an ironic twist, faculty think graduate students are being assisted by the library, while librarians think faculty are providing graduate students with the help they need to be successful.7 As a result, many graduate students end up using their library’s resources in an entirely disintermediated way. Graduate students, especially those in the sciences, visit the physical library less often and use online resources more than undergraduate students.8 Most graduate students start their research process with assistance from academic staff, such as advisors and committee members,9 and are unaware of many library services and resources.10 As frequent virtual-library users who receive little guidance on how to use the library’s tools, graduate students need a library website that is clear in scope and purpose, offers help, and has targeted services.11

Compared to reports on undergraduate use of mobile devices to access their library’s website, relatively few studies have focused on graduate-student or faculty mobile behaviors. A recent survey of Japanese Library and Information Science (LIS) students compared undergraduate and graduate students’ usage of mobile devices to access library services and found slight differences.
However, both groups reported accessing libraries as last on their list of preferred smartphone uses.12 Aharony examined the mobile use behaviors of Israeli LIS graduate students and found that approximately half of these graduate students used smartphones, perceived them to be useful and easy tools in their everyday life, and could transfer those habits to library searching behaviors.13 When looking specifically at how patrons use library services via a mobile device, Rempel and Bridges found the top reason graduate students at their main campus used the OSU Libraries website via mobile devices was to find information on library hours, followed by finding a book and researching a topic.14 Barnett-Ellis and Vann surveyed their small university and found that both undergraduate and graduate students were more than twice as likely to use mobile devices as their faculty and staff; a majority of students also indicated they were likely to use mobile devices to conduct research.15 Finally, survey results showed graduate students in Hofstra University’s College of Education reported accessing library materials via a mobile device twice as often as other student groups. In addition, these graduate students reported being comfortable
Graduate students were also more likely to be at home when using their mobile device to access the library, a finding the authors attributed to education graduate students frequently being employed as full-time teachers.16 Research on how faculty members use library resources characterizes a population that is confident in their literature-searching skills, prefers to search on their own, and has little direct contact with the library.17 Faculty researchers highly value convenience;18 they rely primarily on electronic access to journal articles but prefer print access to monographs.19 Faculty tend to be self-trained at using search tools, such as PubMed or other online databases, and therefore are not always aware of the more in-depth functionality of these tools.20 In contrast to graduate students, Rempel and Bridges found that faculty using the library website via mobile devices were less interested in information about the physical library, such as library hours, and were more likely to be researching a topic.21 Medical faculty are one of the few faculty groups whose mobile-research behaviors have been specifically examined. A survey administered by Bushhousen et al. at a medical university revealed that a third of respondents used mobile apps for research-related activities.22 Findings by Boruff and Storie indicate that one of the biggest barriers to mobile use in health-related academic settings was wireless access.23 Thus apps that did not require the user to be connected to the internet were highly desired. Faculty and graduate students in health-related academic settings saw a role for the library in advocating for better wireless infrastructure, providing access to a targeted set of heavily used resources, and providing online guides or in-person tutorials on mobile apps or procedures specific to their institution.24

According to the literature, most design decisions for library mobile sites have been made on the basis of information collected about undergraduate students’ behavior at main-branch campuses. To help inform our understanding of how recent decisions have been made, the remainder of the literature review focuses on what is known about undergraduate students’ mobile behavior. Undergraduate students are very comfortable using mobile technologies and perceive themselves to be skilled with these devices. According to the 2015 EDUCAUSE Center for Analysis and Research (ECAR) study of undergraduate students and information technology, most undergraduate students consider themselves sophisticated technology users who are engaged with information technologies.25 Undergraduate students mainly use their smartphones for non-class activities. But students indicate they could be more effective technology users if they were more skilled at tools such as the learning management system, online collaboration tools, e-books, or laptops and smartphones in class. Of interest to libraries is the ECAR participants’ top area of reported interest, “search tools to find reference or other information online for class work.”26 However, when a mobile library site is in place, usage rates have been found to be lower than anticipated. In a study of undergraduate science students, Salisbury et al. found only 2 percent of respondents reported using their cell phones to access library databases or the library’s catalog every hour or daily, despite 66 percent of the students browsing the internet using their mobile phone hourly or daily. Salisbury et al. speculated that users need to be told about mobile-optimized library resources if libraries want to increase usage.27

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017

Rempel and Bridges used a pop-up interrupt survey while users were accessing the OSU Libraries mobile site.28 This approach allowed a larger cross-section of library users to be surveyed. It also reduced memory errors by capturing their activities in real time. Activities that had been included in the mobile site because of their perceived usefulness in a mobile environment, such as directions, asking a librarian a question, and the coffee shop webcam, were rarely cited as a reason for visiting the mobile site. The OSU Libraries branch at HMSC is entering a new era. A Marine Studies Initiative will result in the building of a new multidisciplinary research campus at HMSC that aims to serve five hundred undergraduate students. The change in demographics and the increase in students who will need to be served have prompted Guin Library staff to explore how the current population of advanced researchers interacts with library resources. In addition, examining the ways undergraduate students at the main campus use these tools will help with planning for the upcoming changes in the user community.

METHODS
This study used an online Qualtrics survey to gather information about how frequently advanced researchers (graduate students, faculty, and affiliated scientists at a branch library for marine science) use the OSU Libraries website via mobile devices, what they search for, and other ways they use mobile devices to support their research behaviors. A recruitment email with a link to the survey was sent to three discussion lists used by the HMSC community in Spring 2016. The survey was available for four weeks, and a reminder email was sent one week before the survey closed. The invitation email included a link to an informed-consent document. Once the consent document had been reviewed, users were taken to the survey via a second link.
Respondents could provide an email address to receive a three-dollar coffee card for participating in the study, but their email address was recorded in a separate survey location to preserve their anonymity. The invitation email indicated that this survey was about using the website via a mobile device, and the first survey question asked users if they had ever accessed the library website on a mobile device. If they answered “no,” they were immediately taken to the end of the survey and were not recorded as a participant in the study. A similar survey was conducted with users from OSU’s main campus in 2012–13 and again in 2015. The results from 2012–13 have been published previously,29 but the results from 2015 have not. While the focus of the present study is on the mobile behaviors of advanced researchers in the HMSC community, data from the 2015 main-campus study is used to provide a comparison to the broader OSU community. OSU main-campus respondents in 2015 and HMSC participants in 2016 both answered closed- and open-ended questions that explored participants’ general mobile-device behaviors and behaviors specific to using the OSU Libraries website via mobile devices. However, the HMSC survey also asked questions about behaviors related to using the OSU (nonlibrary) website via a mobile device and participants’ mobile scholarly reading and writing behaviors. The survey concluded with several demographic questions. The survey data was analyzed using Qualtrics’ cross-tab functionality and Microsoft Excel to observe trends and potential differences between user groups. Open-ended responses were examined for common themes. Twenty-three members of the HMSC community completed the survey, whereas one hundred participants responded to the 2015 main-campus survey.
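The cross-tab step described above can be sketched in plain Python. This is only an illustration of the technique, not the authors' actual workflow: the records below are invented placeholders rather than the study's data, and the variable names are ours, not anything from Qualtrics or Excel.

```python
# Cross-tabulating survey answers by respondent group, the kind of
# breakdown Qualtrics' cross-tab feature produces. Records are invented.
from collections import Counter

records = [
    ("graduate student", "at least once a week"),
    ("graduate student", "less than once a month"),
    ("faculty", "less than once a month"),
    ("faculty", "at least once a month"),
    ("graduate student", "at least once a month"),
]

# Count each (group, answer) pair once per record.
crosstab = Counter(records)

groups = sorted({g for g, _ in records})
answers = sorted({a for _, a in records})

# Print one row per group, one cell per answer category.
for group in groups:
    row = {a: crosstab[(group, a)] for a in answers}
    print(group, row)
```

Eyeballing rows like these is enough to spot the group-level trends the methods section describes; no statistical testing is implied.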
Participation in the 2015 survey was capped at one hundred respondents because limited incentives were available. The participation difference between the two surveys reflects several differences between the two sampled communities. The most obvious difference is size. The OSU community comprises more than thirty-six thousand students, faculty, and staff; the HMSC community is approximately five hundred students, researchers, and faculty—some of whom are also included as part of the larger OSU community. The second factor influencing response rates relates to the difference in size between the two communities, but is more striking in the HMSC community: the survey relied on a self-selected group of users who indicated they had a history of using the library website via a mobile device. Therefore, it is not possible to estimate the population size of mobile-device library-website users specific to the branch library or the main campus library. This limitation means that the results from this study cannot be used to generalize findings to all users who visit a library website via mobile devices; instead the results are intended to present a case that other libraries may compare with behaviors observed on their own campuses. Sharing the behaviors of advanced researchers at a branch campus is particularly valuable as this population has historically been understudied.

RESULTS AND DISCUSSION
Participant Demographics and Devices Used
Of the twenty-three respondents to the HMSC mobile behaviors survey, 13 (62 percent) were graduate students, 7 (34 percent) were faculty (this category includes faculty researchers and courtesy faculty), and one respondent was an NOAA employee. Two participants declined to declare their affiliation. Of the 97 respondents to the 2015 OSU main-campus survey who shared their affiliation, 16 (16 percent) were graduate students, 5 (5 percent) were faculty members, and 69 (71 percent) were undergraduates.
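As a small arithmetic sketch of the demographics above: the reported percentages are taken over the 21 respondents who declared an affiliation (23 completed surveys minus the 2 who declined), not over all 23. The counts come from the text; the function name is ours, not the authors'.

```python
# Recomputing the reported share of graduate students among HMSC
# respondents who declared an affiliation. Counts are from the article.

def pct_of_declared(count: int, declared_total: int) -> int:
    """Rounded percentage among respondents with a declared affiliation."""
    return round(100 * count / declared_total)

completed = 23
declined = 2
declared = completed - declined   # 21 respondents declared an affiliation

grad_students = 13
print(f"graduate students: {grad_students} of {declared} = "
      f"{pct_of_declared(grad_students, declared)}%")   # 13/21 rounds to 62%
```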
Respondents varied in the types of mobile devices they used when doing library research. Smartphones were used by 78 percent (18 respondents), and tablets by 22 percent (5 respondents). Apple (15 respondents) was the most common device brand used, although six of the respondents used an Android phone or tablet. Compared to the general population’s device ownership, these respondents are more likely to own Apple devices, but the two major device types owned (Apple and Android) match market trends.30

Frequency of Library Site Use on Mobile Devices
Most of the HMSC respondents are infrequent users of the library website via mobile devices: 50 percent (11 respondents) did so less than once a month; 41 percent (9 respondents) did so at least once a month; and 9 percent (2 respondents) did so at least once a week. The low level of library website usage via mobile devices was especially notable as this population reports being heavy users of the library website via laptops or desktop computers, with 82 percent (18 respondents) visiting the library website via those tools at least once a week. Researchers at HMSC used the library website via mobile devices much less often than the 2015 main-campus respondents (undergraduates, graduate students, and faculty). No HMSC respondents visited the mobile site daily compared to 10 percent of main-campus users, and only 9 percent of HMSC respondents visited weekly compared to 28 percent of main-campus users (see Figure 1).

Figure 1. 2016 HMSC participants vs. 2015 OSU main-campus participants reported frequency of library website visits via a mobile device by percent of responses.

While HMSC advanced researchers share some mobile behaviors with main-campus students, this exploratory study demonstrates they do not use the library website via mobile devices as frequently.
One possible reason is that researchers rarely spend time coming and going from classes and therefore do not have small gaps of time to fill throughout their day. Instead, their daily schedule involves being in the field or in the lab collecting and analyzing data. Alternatively, they are frequently involved in writing-intensive projects such as drafting journal articles or grant proposals. They carve out specific periods to do research and do not appear to be filling time with short bursts of literature searching. They can work on laptops and do not need to multitask on a phone or tablet between classes or in other situations. Mobile-device ownership among HMSC graduate students might also be limited because of personal budgets that do not allow for owning multiple mobile devices or for having the most recent model. In addition, this group of scientists may not be on the front edge of personal technologies, especially compared to medical researchers, because few mobile apps are designed specifically for the research needs of marine scientists.

Where Researchers Are When Using Mobile Devices for Library Tasks
Because mobile devices facilitate connecting to resources from many locations, and because advanced researchers conduct research in a range of settings—including the field, the office, and home—we asked respondents where they were most likely to use the library website via a mobile device. Thirty-two percent were most likely to be at home; 27 percent in transit; 18 percent at work; and 9 percent in the field.
The popularity of using the library website via mobile devices while in transit was somewhat unexpected, but perhaps should not have been because many people try to maximize their travel time by multitasking on mobile devices. The distance from the main campus might explain this finding because a local bus service provides an easy way to travel to and from the main campus, and the hour-long trip would provide opportunities for multitasking via a mobile device. Relatively few respondents used mobile devices to access the library website while at work. Previous studies show that a lack of reliable campus wireless internet access can affect students’ ability to use mobile technology.31 HMSC also struggles to provide consistent wireless access, and signals are spotty in many areas of our campus. Despite signal boosters in Guin Library, wireless access is still limited at times. In addition, cell phone service is equally spotty both at HMSC and up and down the coast of Oregon. It is much less frustrating to work on a device that has a wired connection to the internet while at HMSC. These respondents did use mobile devices while at home, which might indicate they had a better wireless signal there. Alternatively, working from home on a mobile device might indicate that they compartmentalize their library-research time as an activity to do at home instead of in the office. Researchers used their mobile devices to access the library while in the field less than originally expected, but upon further reflection, it made sense that researchers would be less likely to use library resources during periods of data collection for oceanic or other water-based research projects because of their focused involvement during that stage. The water-based research also increases the risk of losing mobile devices. 
Library Resources Accessed via Mobile Devices
To learn more about how these respondents used the library website, we asked them to choose what they were searching for from a list of options. Respondents could choose as many options as applied to their searching behaviors. HMSC respondents’ primary reason for visiting the library’s site via a mobile device was to find a specific source: 68 percent looked for an article, 45 percent for a journal, 36 percent for a book, and 14 percent for a thesis. Many of the HMSC respondents also looked for procedural or library-specific information: 36 percent looked for hours, 32 percent for My Account information, 18 percent for interlibrary loan, 14 percent for contact information, 9 percent for how to borrow and request books, 9 percent for workshop information, and 9 percent for Oregon estuaries bibliographies—a unique resource provided by the HMSC library. Fifty-five percent of searches were for a specific source and 43 percent were for procedural or library-specific information. Notably missing from this list were respondents who reported searching via their mobile device for directions to the library. Compared to the 2015 OSU Libraries main-campus survey respondents, HMSC respondents were much more likely to visit the library website via a mobile device to look for an article (68 percent vs. 37 percent), find a journal (45 percent vs. 23 percent), access My Account information (32 percent vs. 7 percent), use interlibrary loan (18 percent vs. 5 percent), or find contact information (14 percent vs. 1 percent). However, unlike HMSC participants, who do not have access to course reserves at the branch library, 7 percent of OSU main-campus respondents used their mobile devices to find course reserves on the library website. See Figure 2.
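Because respondents could select every option that applied, the percentages above are computed per respondent rather than per selection, so a column of percentages can legitimately sum past 100. A minimal sketch of that tally follows; the four response sets are invented for illustration only.

```python
# Tallying a "choose all that apply" survey question: each option's
# percentage is the share of respondents who picked it, not the share
# of total selections. Response sets below are invented examples.
from collections import Counter

responses = [
    {"article", "library hours"},
    {"article", "journal", "My Account"},
    {"book", "article"},
    {"journal", "interlibrary loan"},
]

tally = Counter(option for chosen in responses for option in chosen)
n = len(responses)   # divide by respondents, not by selections

for option, count in tally.most_common():
    print(f"{option}: {round(100 * count / n)}%")
```

With this convention, "article" registers 3 of 4 respondents even though it is only 3 of 9 total selections, which matches how the survey results above are reported.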
Figure 2. 2016 HMSC vs. 2015 OSU main-campus participants reported searches while visiting the library website via a mobile device by percent of responses.

It is possible that HMSC users with different affiliations might use the library site via a mobile device differently. These exploratory findings show that graduate students used the greatest variety of content via mobile devices. Graduate students as a group reported using 11 of the 14 provided content choices via a mobile device while faculty reported using 8 of the 14. Graduate students were the largest group (62 percent of respondents), which might explain why as a group they searched for more types of content via mobile devices. Interestingly, faculty members and faculty researchers reported looking for a thesis via a mobile device, but no graduate students did. Perhaps these graduate students had not yet learned about the usefulness of referencing past theses as a starting point for their own thesis writing. Or perhaps they were only familiar with searching for journal articles on a topic. In contrast, faculty members might have been searching for specific theses for which they had provided advising or mentoring support. To help us make decisions about how to best direct users to library content via mobile devices, we asked respondents to indicate their searching behaviors and preferences. Of the 16 HMSC respondents who answered this question, 12 (75 percent) used our web-scale discovery search box via mobile devices; 4 (25 percent) reported that they did not. Presumably these latter searchers were navigating to another database to find their sources.
Of 16 respondents, only 6 (38 percent) indicated that they looked for a specific library database (as opposed to the discovery tool) when using a mobile device. Those respondents who were looking for a database tended to be looking for the Web of Science database, which makes sense for their field of study. When conducting searches for sources on their mobile devices, HMSC respondents employed a variety of search strategies: the 12 respondents who replied used a combination of author (75 percent), journal title (67 percent), keyword (67 percent), and book title (50 percent) searches when starting at the mobile version of the discovery tool. When asked about their preferred way to find sources, a majority of HMSC respondents reported that they tended to prefer a combination of searching and menu navigation while using the library website from mobile devices, while the remainder were evenly divided between preferring menu-driven and search-driven discovery. While OSU Libraries does not currently provide links to any specific apps for source discovery, such as PubMed Mobile or JSTOR Browser, 13 (62 percent) of the HMSC respondents indicated they would be somewhat or very likely to use an app to access and use library services. This finding connects to the issue of reliable wireless access. Medical graduate students had a wider array of apps available to them, but the primary reason they wanted to use these apps was because they provided a better searching experience in hospitals that had intermittent wireless access—an experience to which researchers at HMSC could relate.32

University Website Use Behaviors on Mobile Devices
To help situate respondents’ library use behaviors on mobile devices in comparison to the way they use other academic resources on mobile devices, we asked HMSC respondents to describe their visits to resources on the OSU (nonlibrary) website via mobile devices.
Compared to their use of the library site on a mobile device, respondents’ use of university services was higher: 43 percent (9 respondents) visited the university’s website via a mobile device at least once a week compared to only 9 percent (2 respondents) who visited the library site with that frequency. This makes sense because of the integral function many of these university services play in most university employees’ regular workflow. Respondents indicated visiting key university sites including MyOSU (a portal webpage, visited by 60 percent of respondents), the HMSC webpage (55 percent), Canvas (the university’s learning management system, visited by 50 percent of respondents), and webmail (45 percent). See Figure 3.

Figure 3. University webpages HMSC respondents access on a mobile device by percent of responses.

University resources such as campus maps, parking locations, and the graduate school website were frequently used by this population. The use of the first two makes sense as HMSC users are located off-site and need to use maps and parking guidance when they visit the main campus. The use of the graduate school website makes sense because the respondents were primarily graduate students and graduate school guidelines are a necessary source of information. Interestingly, our advanced users are similar to undergraduates in that they primarily read email, information from social networking sites, and news on their mobile devices.33

Other Research Behaviors on Mobile Devices
We wanted to know what other research-related behaviors the HMSC respondents are engaged in via mobile devices to determine if there might be additional ways to support researchers’ workflows. We specifically asked about respondents’ reading, writing, and note-taking behaviors to learn how well these respondents have integrated them with their mobile usage behaviors.
All respondents reported reading on their mobile device (see Figure 4). Email represented the most common reading activity (95 percent), followed by “quick reading” activities, such as reading social networking posts (81 percent), current news (81 percent), and blog posts (62 percent). Smaller numbers used their mobile devices for academic or long-form reading, such as reading scholarly articles (33 percent) or books (19 percent). Of those respondents who read articles and books on their mobile devices, only some highlighted or took notes using their mobile device. Seven respondents used a citation manager on their mobile device: three used EndNote, one used Mendeley, one used Pages, and one used Zotero. One respondent used Evernote on their mobile device, and one advanced user reported using specific data and database management software, websites, and apps related to their projects. More advanced and interactive mobile-reading features, such as online spatial landmarks, might be needed before reading scholarly articles on mobile devices becomes more common.34

Figure 4. What HMSC respondents reported reading on a mobile device by percent of responses.

LIMITATIONS
This exploratory study had several limitations, most of which reflect the nature of doing research with a small population at a branch campus. This study had a small sample size, which limited observations of this population; however, future studies could use research techniques such as interviews or ethnographic studies to gather deep qualitative information about mobile-use behaviors in this population.
A second limitation was that previous studies of the OSU Libraries mobile website used Google Analytics to compare survey results with what users were actually doing on the library website. Unfortunately, this was not possible for this study. Because of how HMSC’s network was set up, anyone at HMSC using the OSU internet connections is assigned an IP address that shows a Corvallis, Oregon, location rather than a Newport, Oregon, location, which rendered parsing HMSC-specific users in Google Analytics impossible. The research behaviors of advanced researchers at a branch campus have not been well examined; despite its limitations, this study provides beneficial insights into the behaviors of this user population.

CONCLUSION
Focusing on how advanced researchers at a branch campus use mobile devices while accessing library and other campus information provides a snapshot of key trends among this user group. These exploratory findings show that these advanced researchers are infrequent users of library resources via mobile devices and, contrary to our initial expectations, are not using mobile devices as a research resource while conducting field-based research. Findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. Mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. The results of this survey will be used to address the HMSC knowledge gaps around use of library resources and research tools via mobile devices. Both graduate students and faculty lack awareness of library resources and services and have unsophisticated library research skills.35 While the OSU main campus has library workshops for graduate students and faculty, these workshops have been inconsistently duplicated at the Guin Library.
Because the people working at HMSC come from such a wide variety of departments across OSU that focus on marine sciences, HMSC has never had a library orientation. The results indicate possible value in devising ways to promote Guin Library’s resources and services locally, which could include highlighting the availability of mobile library access. While several participants mentioned using research tools like Evernote, Pages, or Zotero on their mobile devices, most participants did not report enhancing their mobile research experience with these mobile-friendly tools. Workshops specifically modeling how to use mobile-friendly tools and apps such as Dropbox, Evernote, GoodReader, or Browzine could help introduce the benefits of these tools to these advanced researchers. Because wireless access is even more of a concern for researchers at this branch location than for researchers at the main campus, database-specific apps will be explored to determine if the use of searching apps could help alleviate inconsistent wireless access. If database apps that are appropriate for marine science researchers are available, these will be promoted to this user population. Future research might involve follow-up interviews, focus groups, or ethnographic studies, which could expand the knowledge of these researchers’ mobile-device behaviors and their perceptions of mobile devices. Exploring the technology usage by these advanced researchers in their labs, including electronic lab notebooks or other tools, might be an interesting contrast to their use of mobile devices. In addition, as the HMSC campus grows with the expansion of the Marine Studies Initiative, increasing numbers of undergraduates will use Guin Library.
The ECAR 2015 statistics show that current undergraduates own multiple internet-capable devices.36 Presumably, these HMSC undergraduates will be likely to follow the trends seen in the ECAR data. Certainly, the plans to expand HMSC’s internet and wireless infrastructure will affect all its users. Our mobile survey gave us insights into how a sample of the HMSC population uses the library’s resources and services. These observations will allow Guin Library to expand its services for the HMSC campus. We encourage other librarians to explore their unique user populations when evaluating services and resources.

REFERENCES

1 Maria Anna Jankowska, “Identifying University Professors’ Information Needs in the Challenging Environment of Information and Communication Technologies,” Journal of Academic Librarianship 30, no. 1 (2004): 51–66, https://doi.org/10.1016/j.jal.2003.11.007; Pali U. Kuruppu and Anne Marie Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences,” Journal of Academic Librarianship 32, no. 6 (2006): 609–23; Lotta Haglund and Per Olsson, “The Impact on University Libraries of Changes in Information Behavior among Academic Researchers: A Multiple Case Study,” Journal of Academic Librarianship 34, no. 1 (2008): 52–59, https://doi.org/10.1016/j.acalib.2007.11.010; Nirmala Gunapala, “Meeting the Needs of the ‘Invisible University’: Identifying Information Needs of Postdoctoral Scholars in the Sciences,” Issues in Science and Technology Librarianship, no. 77 (Summer 2014), https://doi.org/10.5062/F4B8563P.

2 Tina Chrzastowski and Lura Joseph, “Surveying Graduate and Professional Students’ Perspectives on Library Services, Facilities and Collections at the University of Illinois at Urbana-Champaign: Does Subject Discipline Continue to Influence Library Use?,” Issues in Science and Technology Librarianship no.
45 (Winter 2006), https://doi.org/10.5062/F4DZ068J; Kuruppu and Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences”; Haglund and Olsson, “The Impact on University Libraries of Changes in Information Behavior Among Academic Researchers.”

3 Ellyssa Kroski, “On the Move with the Mobile Web: Libraries and Mobile Technologies,” Library Technology Reports 44, no. 5 (2008): 1–48, https://doi.org/10.5860/ltr.44n5.

4 Paula Torres-Pérez, Eva Méndez-Rodríguez, and Enrique Orduna-Malea, “Mobile Web Adoption in Top Ranked University Libraries: A Preliminary Study,” Journal of Academic Librarianship 42, no. 4 (2016): 329–39, https://doi.org/10.1016/j.acalib.2016.05.011.

5 David J. Comeaux, “Web Design Trends in Academic Libraries—A Longitudinal Study,” Journal of Web Librarianship 11, no. 1 (2017): 1–15, https://doi.org/10.1080/19322909.2016.1230031; Zebulin Evelhoch, “Mobile Web Site Ease of Use: An Analysis of Orbis Cascade Alliance Member Web Sites,” Journal of Web Librarianship 10, no. 2 (2016): 101–23, https://doi.org/10.1080/19322909.2016.1167649.

6 Barbara Blummer and Jeffrey M. Kenton, “Academic Libraries’ Mobile Initiatives and Research from 2010 to the Present: Identifying Themes in the Literature,” in Handbook of Research on Mobile Devices and Applications in Higher Education Settings, ed. Laura Briz-Ponce, Juan Juanes-Méndez, and José Francisco García-Peñalvo (Hershey, PA: IGI Global, 2016), 118–39.
7 Jankowska, “Identifying University Professors’ Information Needs in the Challenging Environment of Information and Communication Technologies.”

8 Chrzastowski and Joseph, “Surveying Graduate and Professional Students’ Perspectives on Library Services, Facilities and Collections at the University of Illinois at Urbana-Champaign.”

9 Carole A. George et al., “Scholarly Use of Information: Graduate Students’ Information Seeking Behaviour,” Information Research 11, no. 4 (2006), http://www.informationr.net/ir/11-4/paper272.html.

10 Kristin Hoffman et al., “Library Research Skills: A Needs Assessment for Graduate Student Workshops,” Issues in Science and Technology Librarianship 53 (Winter-Spring 2008), https://doi.org/10.5062/F48P5XFC; Hannah Gascho Rempel and Jeanne Davidson, “Providing Information Literacy Instruction to Graduate Students through Literature Review Workshops,” Issues in Science and Technology Librarianship 53 (Winter-Spring 2008), https://doi.org/10.5062/F44X55RG.

11 Jankowska, “Identifying University Professors’ Information Needs in the Challenging Environment of Information and Communication Technologies.”

12 Ka Po Lau et al., “Educational Usage of Mobile Devices: Differences Between Postgraduate and Undergraduate Students,” Journal of Academic Librarianship 43, no. 3 (May 2017): 201–8, https://doi.org/10.1016/j.acalib.2017.03.004.

13 Noa Aharony, “Mobile Libraries: Librarians’ and Students’ Perspectives,” College & Research Libraries 75, no. 2 (2014): 202–17, https://doi.org/10.5860/crl12-415.

14 Hannah Gascho Rempel and Laurie M. Bridges, “That Was Then, This Is Now: Replacing the Mobile-Optimized Site with Responsive Design,” Information Technology and Libraries 32, no. 4 (2013): 8–24, https://doi.org/10.6017/ital.v32i4.4636.

15 Paula Barnett-Ellis and Charlcie Pettway Vann, “The Library Right There in My Hand: Determining User Needs for Mobile Services at a Medium-Sized Regional University,” Southeastern Librarian 62, no. 2 (2014): 10–15.
https://doi.org/10.1080/19322909.2016.1167649 http://www.informationr.net/ir/11-4/paper272.html http://www.informationr.net/ir/11-4/paper272.html https://doi.org/10.5062/F48P5XFC https://doi.org/10.5062/F44X55RG https://doi.org/10.1016/j.acalib.2017.03.004 https://doi.org/10.5860/crl12-415 https://doi.org/10.6017/ital.v32i4.4636 MOBILE WEBSITE USE AND ADVANCED RESEARCHERS | MARKLAND, REMPEL, AND BRIDGES doi:10.6017/ital.v36i4.9953 22 16 William T. Caniano and Amy Catalano, “Academic Libraries and Mobile Devices: User and Reader Preferences,” Reference Librarian 55, no. 4 (2014), 298–317, https://doi.org/10.1080/02763877.2014.929910. 17 Haglund and Olsson, “The Impact on University Libraries of Changes in Information Behavior Among Academic Researchers.” 18 Kuruppu and Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences.” 19 Christine Wolff, Alisa B. Rod, and Roger C. Schonfeld, “Ithaka S+R US Faculty Survey 2015,” Ithaka S+R, April 4, 2016, http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey- 2015/. 20 M. Macedo-Rouet et al., “How Do Scientists Select Articles in the PubMed Database? An Empirical Study of Criteria and Strategies,” Revue Européenne de Psychologie Appliquée/European Review of Applied Psychology 62, no. 2 (2012): 63–72. 21 Rempel and Bridges, “That Was Then, This Is Now.” 22 Ellie Bushhousen et al., “Smartphone Use at a University Health Science Center,” Medical Reference Services Quarterly 32, no. 1 (2013): 52–72, https://doi.org/10.1080/02763869.2013.749134. 23 Jill T. Boruff and Dale Storie, “Mobile Devices in Medicine: A Survey of How Medical Students, Residents, and Faculty Use Smartphones and Other Mobile Devices to Find Information,” Journal of the Medical Library Association 102, no. 1 (2014): 22–30, https://doi.org/10.3163/1536- 5050.102.1.006. 
24 Bushhousen et al., “Smartphone Use at a University Health Science Center”; Boruff and Storie, “Mobile Devices in Medicine.” 25 Eden Dahlstrom et al., “ECAR Study of Students and Information Technology, 2015 ," research report, EDUCAUSE Center for Analysis and Research, 2015, https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en. 26 Ibid., 24. 27 Lutishoor Salisbury, Jozef Laincz, and Jeremy J. Smith, “Science and Technology Undergraduate Students’ Use of the Internet, Cell Phones and Social Networking Sites to Access Library Information,” Issues in Science and Technology Librarianship 69 (Spring 2012), https://doi.org/10.5062/F4SB43PD. 28 Rempel and Bridges, “That Was Then, This Is Now.” 29 Ibid. https://doi.org/10.1080/02763877.2014.929910 http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ https://doi.org/10.1080/02763869.2013.749134 https://doi.org/10.3163/1536-5050.102.1.006 https://doi.org/10.3163/1536-5050.102.1.006 https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en https://doi.org/10.5062/F4SB43PD INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017 23 30 “Mobile/Tablet Operating System Market Share,” NetMarketShare, March 2017, https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1. 31 Boruff and Storie, “Mobile Devices in Medicine”; Patrick Lo et al., “Use of Smartphones by Art and Design Students for Accessing Library Services and Learning,” Library Hi Tech 34, no. 2 (2016): 224–38, https://doi.org/10.1108/LHT-02-2016-0015. 32 Boruff and Storie, “Mobile Devices in Medicine.” 33 Dahlstrom et al., “ECAR Study of Students and Information Technology, 2015.” 34 Caroline Myrberg and Ninna Wiberg, “Screen vs. Paper: What Is the Difference for Reading and Learning?” Insights 28, no. 2 (2015): 49–54, https://doi.org/10.1629/uksg.236. 
35 Barnett-Ellis and Vann, “The Library Right There in My Hand”; Haglund and Olsson, “The Impact on University Libraries of Changes in Information Behavior Among Academic Researchers”; Hoffman et al., “Library Research Skills”; Kuruppu and Gruber, “Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences”; Lau et al., “Educational Usage of Mobile Devices”; Macedo-Rouet et al., “How Do Scientists Select Articles in the PubMed Database?” 36 Dahlstrom et al., “ECAR Study of Students and Information Technology, 2015.” https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1 https://doi.org/10.1108/LHT-02-2016-0015 https://doi.org/10.1629/uksg.236 ABSTRACT INTRODUCTION LITERATURE REVIEW METHODS RESULTS AND DISCUSSION Participant Demographics and Devices Used Frequency of Library Site Use on Mobile Devices Where Researchers Are When Using Mobile Devices for Library Tasks Library Resources Accessed via Mobile Devices University Website Use Behaviors on Mobile Devices Other Research Behaviors on Mobile Devices LIMITATIONS CONCLUSION REFERENCES
Everyone's Invited: A Website Usability Study Involving Multiple Library Stakeholders

Elena Azadbakht, John Blair, and Lisa Jones

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2017 34

Elena Azadbakht (elena.azadbakht@usm.edu) is Health and Nursing Librarian and Assistant Professor, John Blair (john.blair@usm.edu) is Web Services Coordinator, and Lisa Jones (lisa.r.jones@usm.edu) is Head of Finance and Information Technology, University of Southern Mississippi, Hattiesburg, Mississippi.

ABSTRACT

This article describes a usability study of the University of Southern Mississippi Libraries website conducted in early 2016. The study involved six participants from each of four key user groups—undergraduate students, graduate students, faculty, and library employees—and consisted of six typical library search tasks, such as finding a book and an article on a topic, locating a journal by title, and looking up hours of operation. Library employees and graduate students completed the study's tasks most successfully, whereas undergraduate students performed relatively simple searches and relied on the Libraries' discovery tool, Primo. The study's results revealed several problematic features that affected each user group, including library employees. These results increased internal buy-in for usability-related changes to the library website in a later redesign.

INTRODUCTION

Within the last decade, usability testing has become a common way for libraries to assess their websites. Eager to gain a better understanding of how users experience our website, we assembled a two-person team and conducted the first usability study of the University of Southern Mississippi Libraries website in February 2016.
The Web Advisory Committee—which is tasked with developing, maintaining, and enhancing the Libraries' online presence—wanted to determine if the content on the website was organized in a way that made sense to users and facilitated the efficient use of the Libraries' online resources. Our usability study involved six participants from each of the following library user groups: undergraduate students, graduate students, faculty, and library employees. Student and faculty participants represented several academic disciplines and departments. All of the library employees involved in the study work in public-facing roles.

The Web Advisory Committee and Libraries' administration wanted to know how each of these groups differs in its website use and whether they have difficulty with the same architecture or features. Usability testing helped illuminate which aspects of the website's design might be hindering users from accomplishing key tasks, thereby identifying where and how improvements needed to be made. We included library employees in this study to compare their approach to the website to that of other users in the hope of increasing internal stakeholders' buy-in for recommendations resulting from this study. This article will discuss the usability study's design, results, and recommendations as well as the implications of the study's findings for similarly situated academic libraries. We will give special consideration to how the behavior of library employees compared to that of other groups.

EVERYONE'S INVITED | AZADBAKHT, BLAIR, AND JONES 35 https://doi.org/10.6017/ital.v36i4.9959

LITERATURE REVIEW

The literature on library-website user experience and usability is extensive. In 2007, Blummer conducted a literature review of research related to academic-library websites, including usability studies.
Her article provides an overview of the goals and outcomes of early library-website usability studies.1 More recent articles focus on a portion or aspect of a library's website, such as the homepage, federated search or discovery tool, or subject guides. Fagan published an article in 2010 that reviews user studies of faceted browsing and outlines several best practices for designing studies that focus on next-generation catalogs or discovery tools.2

Other library-website studies have reported on the habits of user groups, with undergraduates being the most commonly studied constituent group. Emde, Morris, and Claassen-Wilson observed University of Kansas faculty and graduate students' use of the library website, which had been recently redesigned, including a new federated search tool.3 Many of the study's participants gravitated toward the subject-specific resources they were familiar with and either missed or avoided using the website's new features. When asked for their opinions on the federated search tool, several participants said that while it was not a tool they saw themselves using, they did see how it might be helpful for undergraduate students who were still new to research.

The researchers also provided the participants with an article citation and asked them to locate it using the library's website or online resources. While half the participants did use the website's "E-Journals" link, others were less successful. Some who had the most difficulty "search[ed] for the journal title in a search box that was set up to search database titles."4 This led Emde, Morris, and Claassen-Wilson to observe that "locating journal articles from known citations is a difficult concept even for some advanced researchers."

Turner's 2011 article describes the results of a usability study at Syracuse University Library that included both students and library staff.
Participants were asked to start at the library's homepage and complete five tasks designed to emulate the types of searches a typical library user might perform, such as finding a specific book, a multimedia item, an article in the journal Nature, and primary sources pertaining to a historic event.5 When asked to find Toni Morrison's Beloved, most staff members used the library's traditional online catalog, whereas students almost always began their searches with the federated search tool located on the homepage. Participants of both types were less successful at locating a primary source, although this task highlighted key differences in each group's approach to searching the library website. Since library staff were more familiar than students with the library's collections and online search tools, they relied more on facets and limiters to narrow their searches, and some even began their searches by navigating to the library's webpage for special collections.

Library staff tended to be more persistent; draw upon their greater knowledge of the library's collections, website, and search tools; and use special syntax in their searches, like inverting an author's first and last names. "Library staff took more time, on average, to locate materials," writes Turner, because of their "interest in trying alternative strategies."6 Students, on the other hand, usually included more detail than necessary in their search queries (such as adding a word related to the format they were searching for after their keywords) and could not always differentiate various types of catalog records, for example, the record for a book review and the record for the book itself.
Turner concludes that the students' mental models for searching online and their experiences with other web-search environments influence their expectations of how library search tools work, and that library-website design should take these mental models into consideration.

Research on the search behaviors of students versus more experienced researchers or subject experts also has implications for library-website design. Two recent articles explore the different mental models or mindsets students bring to a search. The students in Asher and Duke's 2012 study "generally treated all search boxes as the equivalent of a Google search box" and used very simple keyword searches.7 This tracked with Holman's 2010 study, which likewise found that the students she observed relied on simple search strategies and did not understand how search interfaces and systems are structured.8

METHODS

Our research team consisted of the Libraries' health and nursing librarian and the web services coordinator. We worked closely with the head of finance and information technology in designing and running the usability study. A two-week period in mid-February 2016 was chosen for usability testing to avoid losing potential participants to midterms or spring break.

We posted a call for participants to two university discussion lists, on the Libraries' website, and on social media (Facebook and Twitter). We also reached out directly to faculty in academic departments we regularly work with and emailed library employees directly. We directed nonlibrary participants to a web form on the Libraries' website to provide their name, contact information, university affiliation/class standing, and availability. The health and nursing librarian followed up with and scheduled participants on the basis of their availability. Each student participant received a ten-dollar print card and each faculty participant received a ten-dollar Starbucks gift card.
To record the testing sessions, we needed a free or low-cost software option. The Libraries already had a subscription to Screencast-O-Matic for developing video tutorials, and the tool allows for simultaneous screen, audio, and video capture, so we decided to use it to record all testing sessions. We also used a spare laptop with an embedded camera and microphone.

The health and nursing librarian served as both facilitator and note-taker for most usability testing sessions. Participants were given six tasks to complete. We encouraged participants to narrate as they completed each task. The sessions began with simple, secondary navigational questions like the following:

• How late is our main library open on a typical Monday night?
• How could you contact a librarian for help?
• Where would you find more information about services offered by the library?

Next, we asked the participants to complete tasks designed to assess their ability to search for specific library resources and to illuminate any difficulty users might have navigating the website in the process. Each of the three tasks focused on a particular library-resource type, including books, articles, and journals:

• Find a book about rabbits.
• Find an article about rabbits.
• Check to see if we have a subscription/access to a journal called Nature.

After the usability testing was complete, we reviewed the recordings and notes and coded them. For each task, we calculated time to completion and documented the various paths participants took to answer each question, noting any issues they encountered. We also compared the four user groups in our analysis.

Limitations

Although we controlled for user type (undergraduate, graduate, faculty, or library employee) in the recruitment of study participants, we did not screen by academic discipline.
Doing so would have hindered our team's ability to include enough graduate students and faculty members in the study, as nearly all the volunteers from these two groups were from humanities or social science fields. The results might have differed slightly had the study successfully managed to include more faculty from the so-called hard sciences and allied health fields. Additionally, the order in which we asked participants to attempt the tasks might have affected how they approached some of the later tasks. If a participant chose to search for a book using the Primo discovery tool, for example, they might be more inclined to use it to complete the next task (find an article) rather than navigate to a different online resource or tool. Despite these limitations, usability testing has helped improve the website in key ways. We plan to correct for these limitations in future studies.

RESULTS

Every group included a participant who failed to complete at least one of the six tasks. An adequate answer to each of the study's six tasks can be found within one or two pages/clicks from the Libraries' homepage (Figure 1). The average distance to a solution remained at about two page loads across all of the study's participants, despite a few individual "website safaris."

Figure 1. University of Southern Mississippi Libraries' homepage.

Graduate students tended to complete tasks the quickest and were generally as successful as library employees. They preferred to use Primo for finding books but tended to favor the list of scholarly databases on the "Articles & Databases" page to find articles and journals. Undergraduates were the second-fastest group, but many struggled to complete one or more of the six tasks. They had the most trouble finding books and locating the journal by title. Undergraduates generally performed simple searches and had trouble recovering from missteps.
They were heavy users of Primo, relying on the discovery tool more than any other group. The other two user groups, faculty and library employees, were slower at completing tasks. Of the two, faculty took the longest to complete any task and failed to complete tasks at a similar rate as undergraduates. Likewise, this group favored Primo nearly as often. In contrast, library employees took almost as long as faculty to complete tasks but were much more successful. As a group, library employees demonstrated the different paths users could take to complete each task but favored those paths they identified as the "preferred" method for finding an item or resource over the fastest route.

The majority of study participants across all user groups had little trouble with the first three tasks. Although most participants favored the less direct path to the Libraries' hours—missing the direct link at the top of the homepage (Figure 2)—they spent relatively little time on this task. Likewise, virtually all participants took note of the links to our "Ask-A-Librarian" and "Services" pages located in our homepage's main navigation menu. This portion of the usability study alerted us to the need for a more prominent display of our opening hours on the homepage.

Figure 2. Link to "Hours" from the homepage.

Of the second set of tasks—find a book, find an article, and determine if we have access to Nature—the first and last proved the most challenging for participants. One undergraduate was unable to complete the book task, and one faculty member took nearly eight minutes to do so—the longest time to completion of any task by any user in the study. Primo was the most preferred method for finding a book.
Although an option for searching our Classic Catalog (which uses Innovative Interfaces' Millennium integrated library system) is contained within a search widget on the homepage, Primo is the default search option and therefore users' default choice. Interestingly, even after statements from some faculty such as "I don't love Primo," "Primo isn't the best," and "the [Classic Catalog] is better," these participants proceeded to use Primo to find a book. Library employees were evenly split between Primo and the Classic Catalog.

One undergraduate student, graduate student, and library employee were unable to determine whether we have access to Nature. This task was the most time consuming for library employees because there are multiple ways to approach this question, and library employees tended to favor the most consistently successful yet most time-consuming options (e.g., searching within the Classic Catalog). Lacking a clear option in the main navigation bar, the most popular path started with our "Articles & Databases" page, but the answer was most often successfully found using Primo. Several participants tried using the "Search for Databases" search box on the "Articles & Databases" page, which yielded no results because it searches only our database list. The search widget on the homepage that includes Primo has an option for searching e-journals by title, as shown in Figure 3. However, nearly all nonlibrary employees missed this feature.

Participants from both the undergraduate and graduate student user groups had trouble with this task, including those who were ultimately successful. Unfortunately, many of the undergraduates could not differentiate a journal from an article, and while graduate students were aware of the distinction, a few indicated that they were not used to the idea of finding articles from a specific journal.

Figure 3. E-journals search tab.
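The group-level comparisons reported here (time to completion, task success) come from the coded session recordings described in the Methods section. The article does not publish its coding scheme, so the record format below is entirely hypothetical, with invented group names, tasks, and timings; this is only a minimal sketch of how such coded records might be tallied per user group:

```python
from collections import defaultdict

# Hypothetical coded records: (user_group, task, seconds_to_complete, completed).
# The study's actual coding format is not described; these values are invented.
sessions = [
    ("undergraduate", "find_book", 95, True),
    ("undergraduate", "find_journal", 240, False),
    ("graduate", "find_book", 60, True),
    ("faculty", "find_book", 470, True),
    ("library_employee", "find_journal", 180, True),
]

stats = defaultdict(lambda: {"times": [], "completed": 0, "attempts": 0})
for group, task, seconds, completed in sessions:
    s = stats[group]
    s["attempts"] += 1
    if completed:
        s["completed"] += 1
        s["times"].append(seconds)  # average time over completed attempts only

for group, s in sorted(stats.items()):
    avg = sum(s["times"]) / len(s["times"]) if s["times"] else float("nan")
    rate = s["completed"] / s["attempts"]
    print(f"{group}: avg {avg:.0f}s to complete, success rate {rate:.0%}")
```

Averaging only completed attempts, as sketched here, is one defensible choice; failed attempts could instead be capped at a timeout and included, which would penalize groups that abandon tasks.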
When it came to finding articles, undergraduates, as well as several faculty and a few library employees, gravitated toward Primo. Others, particularly graduate students and library employees, opted to search a specific database—most often Academic Search Premier or JSTOR. However, those who used Primo to answer this question arrived at an answer two to three times faster because of the discovery tool's accessibility from a search widget on the homepage. Regardless of the tool or resource they used, most participants found a sufficient result or two.

Common Breakdowns

Despite the clear label "Search for Databases," at least one participant from each user group, including library employees, attempted to enter a book title, journal name, or keyword into the LibGuides' database search tool on our "Articles & Databases" page (Figure 4). Some participants attempted this repeatedly despite getting no results. Others did not try a search but stated, with confidence, that entering a journal, book, or article title into the "Search for Databases" field would yield a relevant result. A few participants also attempted this with the search box on our Research Guides (LibGuides) page, which searches only within the content of the LibGuides themselves.

Across all groups, when not starting at the homepage, many participants had difficulty finding books because no clear menu option exists for finding books like it does for articles (our "Articles & Databases" page). This difficulty was compounded by many participants struggling to return to the Libraries' homepage from within the website's subpages. Those participants who were able to navigate back to the homepage were reminded of the Primo search box located there and used it to search for books.

Figure 4. "Search for Databases" box on the "Articles & Databases" page.

Another breakdown was the "Help & FAQ" page (Figure 5).
Participants who turned there for help at any point in the study spent a relatively long time trying to find a usable answer and often ended up more confused than before. In fact, only one in three participants managed to use "Help & FAQ" successfully because the FAQ consists of many questions with answers on many different pages and subpages. This portion of the website had not been updated in several years, and therefore the questions were not listed in order of frequency.

Figure 5. The answer to the "How do I find books?" FAQ item leads to several subpages.

DISCUSSION

Using the results of the study, we made several recommendations to the Libraries' Web Advisory Committee and administration: (1) display our hours of operation on the homepage; (2) remove the search boxes from the "Articles & Databases" and "Research Guides" pages; (3) condense the "Help & FAQ" pages; and (4) create a "Find Books" option on the homepage. All of these recommendations were taken into account during a recent redesign of the website. We also considered each user group's performance and its implications for website design as well as instruction and outreach efforts.

First, our team suggested that the current day's hours of operation be featured prominently on the website's front page. Despite "How late is our main library open on a typical Monday night?" being one of two tasks that had a 100 percent completion rate, this change is easy to make, adds convenience, and addresses a long-voiced complaint. Several participants expressed a desire to see this change implemented. Moreover, this is something many of our peer libraries provide on their websites.

The team's next recommendation was to remove the "Find Databases by Title" search box from the "Articles & Databases" page. During the study, participants who had a particular database in mind opted to navigate directly to that database rather than search for it.
Another such search box exists on the "Research Guides" page. Although most of the participants did not encounter this search box during the study, those who did also mistook it for a general search tool. Participants from all groups, especially undergraduate students, assumed that any search box on the Libraries' website was designed to search for and within resources like article databases and the online catalog, regardless of how the search box was labeled. Given our findings, libraries with similar search boxes might also consider removing them from their websites.

Another recommended change was to condense the "Help & FAQ" section of the website considerably. The "Help & FAQ" section was too large and unwieldy for participants to use successfully without becoming visibly frustrated, defeating its purpose. Moreover, Google Analytics showed that only nine of the more than one hundred "Help & FAQ" pages were used with any regularity. Going forward, we will work to identify the roughly ten most important questions to feature in this section.

The final major recommendation was to consider adding a top-level menu item called "Find Books" that would provide users with a means to escape the depths of the site and direct them to Primo or the Classic Catalog. When participants got stuck on the book-finding task, they looked for a parallel to the "Articles & Databases" menu option. A "Getting Started" page or LibGuide could take this idea a step further by also including brief, straightforward instructions on finding articles and journals by title. In effect, this option would be another way to condense and reinvent some of the topics originally addressed in the "Help & FAQ" pages.

Comparing each user group's average performance helped illuminate the strengths and weaknesses of the website's design.
We suspect that graduate students were the fastest and nearly the most successful group because they are early in their academic careers and doing a great deal of their own research (as compared to faculty). Many of them are also responsible for teaching introductory courses and are working closely with first-year students who are just learning how to do research. Faculty, because their research tends to be on narrower topics, were familiar with the specific resources and tools they use in their work but were less able to efficiently navigate the parts of the website with which they have less experience. Moreover, individual faculty varied widely in their comfort level with technology, and this affected their ability to complete certain tasks.

CONCLUSION

The results of our website usability study echo those found elsewhere in the literature. Students approach library search interfaces as if they were Google and generally conduct very simple searches. Without knowledge of the Libraries' digital environment and without the research skills library employees possess, undergraduates in our study tended to favor the most direct route to the answer—if they could identify it. This group had the most trouble with library and academic terminology or concepts like the difference between an article and a journal. Though not as quick as the graduate students, undergraduates completed tasks swiftly, mainly because of their reliance on the Primo discovery tool. However, undergraduate students were less able to recover from missteps; more of them confused the "Find Databases by Title" search tool for an article search tool than participants from any other group. Since undergraduates compose the bulk of our user base and are the least experienced researchers, we decided to focus our redesign on solutions that will help them use the website more easily.
Although all of the library employees in our study work in public-facing roles, not all of them provide regular research help or teach information literacy. Since most of them are very familiar with our website and online resources, they approached the tasks more methodically and thoroughly than other participants. Library employees tended to choose the search strategy or path to discovery that would yield the highest-quality result, or they would demonstrate multiple ways of completing a given task, including any necessary workarounds.

The inclusion of library employees yielded the most powerful tool in our research team's arsenal. Holding this group's "correct" methods side by side with equally valid methods of discovery helped shake loose rigid thinking, and the fact that some library employees were unable to complete certain tasks shocked all parties in attendance when we presented our findings to stakeholders. Any potential argument that student, faculty, and staff missteps were the result of improper instruction and not of a usability issue was countered by evidence that the same missteps were sometimes made by library staff. Not only was this an eye-opening revelation to our entire staff, it served as the evidence our team needed to break through entrenched resistance to making any changes. We were met with almost instant, even enthusiastic, buy-in to our redesign recommendations from the Libraries' administration. Therefore, we highly recommend that other academic libraries consider including library staff as participants in their website usability studies.

REFERENCES

1 Barbara A. Blummer, "A Literature Review of Academic Library Web Page Studies," Journal of Web Librarianship 1, no. 1 (2007): 45–64, https://doi.org/10.1300/J502v01n01_04.

2 Jody Condit Fagan, "Usability Studies of Faceted Browsing: A Literature Review," Information Technology and Libraries 29, no. 2 (2010): 58–66, https://ejournals.bc.edu/ojs/index.php/ital/article/view/3144/2758.

3 Judith Z. Emde, Sara E. Morris, and Monica Claassen-Wilson, "Testing an Academic Library Website for Usability with Faculty and Graduate Students," Evidence Based Library and Information Practice 4, no. 4 (2009): 24–36, https://doi.org/10.18438/B8TK7Q.

4 Ibid., 30.

5 Nancy B. Turner, "Librarians Do It Differently: Comparative Usability Testing with Students and Library Staff," Journal of Web Librarianship 5, no. 4 (2011): 286–98, https://doi.org/10.1080/19322909.2011.624428.

6 Ibid., 295.

7 Andrew D. Asher and Lynda M. Duke, "Searching for Answers: Student Behavior at Illinois Wesleyan University," in College Libraries and Student Culture: What We Now Know (Chicago: American Library Association, 2012), 77–78.

8 Lucy Holman, "Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers," Journal of Academic Librarianship 37, no. 1 (2011): 21–23, https://doi.org/10.1016/j.acalib.2010.10.003.
A Case Study on the Path to Resource Discovery

Beth Guay

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017

ABSTRACT

A meeting in April 2015 explored the potential withdrawal of valuable collections of microfilm held by the University of Maryland, College Park Libraries. This resulted in a project to identify OCLC record numbers (OCN) for addition to OCLC’s Chadwyck-Healey Early English Books Online (EEBO) KBART file.1 Initially, the project was an attempt to adapt cataloging workflows to a new environment in which the copy cataloging of e-resources takes place within discovery system tools rather than traditional cataloging utilities and MARC record set or individual record downloads into online catalogs. In the course of the project, it was discovered that the microfilm and e-version bibliographic records contained metadata which had not been utilized by OCLC to improve its link resolution and discovery services for digitized versions of the microfilm resources. This metadata may be advantageous to OCLC and to others in their work to transition from MARC to linked data on the Semantic Web. With MARC record field indexing and linked data implementations, this collection and others could better support scholarly research.

Collections, Discovery Tools, and Metadata Services

The University of Maryland, College Park Libraries’ (the Libraries; UM Libraries) collections include 3.45 million print books and 1.2 million eBooks, 17,000 electronic journals, and 352 electronic databases.2 In late 2011, the Libraries implemented WorldCat Local, OCLC’s single-search-box interface to the WorldCat database of cataloged resources and a central index of metadata provided by publishers, Abstracting and Indexing Services, institutional repositories, and so on.
With WorldCat Local, and later, WorldCat Discovery, OCLC utilizes a knowledge base in managing e-resources discovery and access.3 Knowledge bases are “associated with link resolvers and electronic resource management systems” and “contain title-level metadata, linking syntax rules, publication ranges and other data.”4 KBART files are so named to represent files compliant with the NISO recommended practice, “Knowledge Bases and Related Tools (KBART).”5 KBART files, created and supplied by content providers, are used to transmit this title-level metadata to knowledge base vendors and discovery service providers.6 Since OCLC enhances these files with OCLC numbers (OCN) in order to provide automated holdings maintenance on WorldCat bibliographic records, the Libraries’ Metadata Services Department (MSD) adopted a policy in 2012 to provide access to e-resources only via WorldCat when such files are available.

Beth Guay (baguay@umd.edu) is Continuing Resources Librarian, University of Maryland Libraries, University of Maryland, College Park.

A CASE STUDY ON THE PATH TO RESOURCE DISCOVERY | GUAY | doi:10.6017/ital.v36i3.9966

Space Planning

Early on, the Libraries’ collection policies targeted duplicate copies of print monographs and print journals held electronically in trusted repositories, e.g., JSTOR, for deselection. By March 2014, the Libraries’ Collection Development Council discussed moving microfilm collections to the yet-to-be-opened Severn Library, slated to “house lesser used materials … in order to free up much needed space for users and the development of new collaborative learning spaces.”7, 8 A year later, in April 2015, a meeting was called by the Assistant Head, Collection Development, to investigate microfilm collection retention decisions. This time the Libraries were considering the withdrawal of microfilm resources for which equivalent versions were held online.
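As a rough picture of what such a title-level file carries, here is a minimal KBART-style row. This is a sketch under stated assumptions: the three column names are drawn from the KBART recommended practice (plus OCLC’s OCN column), the values are invented for illustration, and production files carry many more fields.

```python
import csv
import io

# A minimal KBART-style file: tab-separated, one row per title. The columns
# shown (publication_title, title_url, oclc_number) are assumptions based on
# the KBART recommended practice and OCLC's OCN extension; the row values
# (URL and OCN) are hypothetical placeholders.
kbart_tsv = (
    "publication_title\ttitle_url\toclc_number\n"
    "The Examiner\thttp://example.org/title/examiner\t12345678\n"
)

for row in csv.DictReader(io.StringIO(kbart_tsv), delimiter="\t"):
    # Each row supplies the knowledge base with one title-level access point.
    print(row["publication_title"], row["oclc_number"])
```

Because the format is plain tab-separated text, a library can derive such a file from its own MARC record sets, which is how the Libraries later populated the WCKB sandbox.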
The collection managers placed a caveat on the withdrawal of the microfilm: before the microfilm could be withdrawn and the Libraries’ holdings deleted from the WorldCat bibliographic records, the equivalent e-version resources had to be made discoverable in WorldCat UMD (the Libraries’ WorldCat Discovery implementation) by adding the Libraries’ holdings to the e-version bibliographic records corresponding to the microfilm version records. Following the meeting, the Librarian for English, Latin American, & Latina/o Studies and Second Language Acquisition provided the Continuing and Electronic Resources Cataloger (C-ER Cataloger) with a list of eight valuable microfilm collections and, for each, the name of the comparable online collection (or e-collection) to which the Libraries subscribed. It was agreed that the C-ER Cataloger would investigate to determine whether any of those microfilm collections could be withdrawn in compliance with the collection managers’ caveat. In other words, the C-ER Cataloger’s mission was to ensure a one-to-one correspondence of electronic and microfilm version bibliographic records for the equivalent versions of the resources. One of the e-collections added to the WorldCat Knowledge Base (WCKB) by the Libraries was Gale’s The Making of the Modern World, 1450-1850: Part I collection (MOMW). This collection comprises digitized versions of Gale's microfilm resources in the series The Goldsmiths'-Kress library of economic literature.9 A KBART file was derived from the Libraries’ MOMW MARC record set and uploaded to the WCKB sandbox, where it supports the Libraries’ access to the e-version resources. The MOMW MARC record set had been reviewed and vetted by the Libraries prior to its purchase, and upon its purchase, Gale had set the Libraries’ holdings on the WorldCat bibliographic records representing the resources.
With this information in mind, the C-ER Cataloger determined that the MOMW e-resource bibliographic records were comparable to those representing the Libraries’ corresponding Goldsmiths'-Kress library of economic literature microfilm collection, thus meeting the collection managers’ criteria for deselection. The 3,380 reels that could be withdrawn occupied a small but not insignificant allotment of physical space in the library.

Providing discoverability of equivalent e-versions of resources held in other collections proved difficult. For example, the corresponding microfilm collections represented in the WCKB’s British Periodicals Collections I and II were held in the series Early British periodicals and English literary periodicals.10 The Libraries had cataloged 186 individual serial titles in the microfilm series Early British periodicals in 2002, but none in the series English literary periodicals. Thus the objective would have been to ensure discoverability for the equivalent electronic versions of the Libraries’ 186 cataloged microfilm versions in the Early British periodicals series. At the time of this investigation, there were 580 British Periodicals I and II KBART file title entries, 390 of which had OCN. Whereas the OCN of The Making of the Modern World, 1450-1850: Part I WCKB collection were known entities, the OCN of the remaining e-collections had yet to be vetted. Thus the British Periodicals Collections I and II records were spot-checked for evaluation. The quality of the 390 OCLC records ranged from excellent, e.g., OCLC record #297425799, to poor, e.g., #818401694 (see Figures 1-4). MARC record images in Figures 1-4 are sourced from OCLC’s Connexion cataloging client interface to the WorldCat bibliographic database.
Figures 1 and 2 represent a microfilm version record and a comparable “excellent” quality record given for the resource in the WorldCat Knowledge Base, while Figures 3 and 4 represent a microfilm version and a comparable “poor” quality record given for the resource in the WCKB. Note that the C-ER Cataloger’s definition of an excellent quality e-version record was one which provided metadata comparable to those of its equivalent microfilm version record; likewise, a poor quality record lacked comparable metadata. In other words, an excellent quality record was viewed as a guarantor of a discoverable resource, while a poor quality record was viewed as an obstacle to discovery. For this WCKB collection, the C-ER Cataloger determined that staff expertise with serial bibliographic records was required, and due to MSD staffing limitations, moved ahead to examine the other collections.

Figure 1. Microfilm version record

Figure 2. Excellent quality e-version record — OCN in the KB file

Figure 3. Microfilm version record

Figure 4. Poor quality e-version record — OCN in the KB file

In an investigation into OCLC’s Chadwyck-Healey Early English Books Online (EEBO) KBART file, for which equivalent e-versions of microfilm resources in the series Early English books, 1475-1640 and Early English books, 1641-1700 are held, it was found that the availability of comparable e-version bibliographic records was optimal.11 In consultation with the MSD department head, a project to ensure the discoverability of equivalent e-versions of the Libraries’ 5,062 cataloged microfilm resources in the series Early English books, 1475-1640 was initiated.
The C-ER Cataloger had hoped to follow with a similar effort for the Libraries’ resources in the series Early English books, 1641-1700 (represented by 41,306 records in the Libraries’ Integrated Library System).

Background: EEBO, Related Resources and Bibliographic Records

Much has been written on EEBO’s inception and continuing development as a collection of digital reproductions of microfilm reproductions of pre-1700 print resources, and on its scholarly value (Kichuk, 2007; Martin, 2007; Gadd, 2009; Mak, 2014; Folger Shakespeare Library, 2015).12 Alfred Pollard and Gilbert Redgrave’s A short-title catalogue of books printed in England, Scotland, & Ireland and of English books printed abroad, 1475-1640 (“STC”), and the “companion” volume, Donald Wing's Short-title catalogue of books printed in England, Scotland, Ireland, Wales, and British America, and of English books printed in other countries, 1641-1700 (“Wing”), respectively, were used in selecting the print resources for filming.13 Gadd (2009, 683) pinpointed the STC as “a catalogue of editions (or more accurately, editions and issues) not copies although, of course, the information about any edition is derived primarily from the surviving copies … Each entry gives the location of known copies …”14 The “successor” to STC and Wing, the English Short Title Catalog (ESTC), “includes records for every item listed in STC, every item in Wing, every item in the Eighteenth Century Short Title Catalogue … and newspapers and other serials which began publication before 1801” and is freely available online from the British Library.15, 16 Gadd (2009, 685-686) offered this critique concerning EEBO’s bibliographic data and relationship to the ESTC:

EEBO’s relationship with the original STC and Wing is straightforward and clear; EEBO’s relationship with electronic ESTC, on the other hand, is less well-known.
A series of agreements made between ESTC and University Microfilms/ProQuest between 1989 and 1997 allowed EEBO to draw directly on ESTC’s existing bibliographical data … EEBO heavily edited ESTC’s data for its own purposes; certain categories of data were removed (e.g. collations, Stationer’s Register entrances), some information was amended (e.g., subject headings), and some was added (e.g. microfilm specific details). Second, there is no formal mechanism for synchronizing the data between the two resources. Occasionally, snapshots of data are sent by EEBO to ESTC but there is no guarantee that a correction or revision made to an ESTC entry will be replicated in the corresponding EEBO or vice-versa: neither ESTC nor EEBO will necessarily know when the other made a correction.17

Gadd posited that “as both resources continue to amend and expand their bibliographical data for their own purposes, there is an increasing likelihood of significant discrepancy between the two resources.”18 He did not further address the quality of the bibliographic records describing the EEBO versions of the resources; perhaps he was unaware of the sources of the EEBO bibliographic data. Microfilm version bibliographic records serve as the basis of the metadata describing the EEBO version resources. According to ProQuest, “MARC records (from which EEBO Bibliographic records derive) are produced for the microfilm collection Early English Books (EEB) after they are filmed.”19 OCLC’s cataloging database has served as one source of microfilm version records for titles in the series since the 1980s. In 1984, the Association of Research Libraries (1984, p.
J-3) reported that one library had “input an indeterminate amount [of bibliographic records] into OCLC” for Early English books, 1475-1640, and that one had “input records for an indeterminate percentage of the set into OCLC” for resources in the series, Early English books 1641-1700.20 The cataloging sources of these microfilm resources have varied over time, from cooperative projects to UMI/ProQuest staff to individual libraries; however, adherence to standards has characterized the totality of the efforts invested. Joachim (1993, p. 111) described the cooperative effort begun in 1984 by the Indiana University Libraries, University of California, Riverside, University of Delaware, and the University of Utah to catalog microfilm version resources cataloged by Wing:

In order to maintain standards and consistency among the five libraries, the project director prepared a “Wing STC Project manual.” The manual includes general information, information on authority work, a bibliography, a discussion of special cataloging problems and procedures, sample records, and database input guidelines.21

OCLC’s MARC records for the microfilm and EEBO version resources contain note fields identifying the locations of the print copies filmed and subsequently reproduced digitally by UMI/ProQuest. Gadd (2009, p. 686) emphasized the importance of this information to scholars in stating that “different copies from the same edition might vary, sometimes markedly.”22 As to Gadd’s (2009) critique concerning the lack of a formal synchronization mechanism and increasing likelihood of discrepancies between EEBO and ESTC, further examination of EEBO and ESTC bibliographic record displays such as those shown in Figures 5 and 6 suggest that the British Library is working with ProQuest to align their data.
It appears a focus of the British Library may be to inform the scholar of the availability of the microfilm and electronic versions of the print resources. In its ESTC overview, the British Library states that “the existence of selected … printed and digital surrogates within products such as Early English Books Online … is … noted” in its records and that its records “act as an index to several major research microform series … including Early English Books, 1475-1640 … [and] Early English books, 1641-1700.”23

Figure 5. EEBO bibliographic record for the resource cited by STC 2nd edition entry 9164 and reproduced from the copy held at the Society of Antiquaries, London.

Figure 6. ESTC catalog record for STC 2nd edition, entry 9164 (http://estc.bl.uk/S3614). The code “Lsa,” given as “Loc. of filmed copy,” is the British Library’s MARC code for the Society of Antiquaries Library.24

Finally, to add to this mix of print, microfilm, and EEBO digitized images, XML/SGML versions of the resources are being created by the Text Creation Partnership (TCP), formed in 1999 by the university libraries of Michigan and Oxford, ProQuest, and the Council on Library and Information Resources, to provide full text search capability.25 Catalog records describing TCP versions are available in WorldCat.
According to the TCP, “the TCP does not have the resources to create new catalog records for each text we produce (though you are welcome to do so, and if you are willing to share them we would be very glad to know about it).”26

The UM Libraries’ EEBO Project

The OCLC EEBO KBART file, which contained 129,544 title entries when downloaded, 58,518 of which lacked OCN, was combined with a file extracted from the 5,062 MARC records that represented the microfilm resources. The merged file was to be used as a tool in identifying the OCN of the equivalent e-versions of the microfilm resources held. The plan was to add the e-version OCN to the EEBO KBART file via OCLC’s OCN correction form.27 Significant time was spent developing and documenting procedures by which staff could perform the work of identifying OCN for addition to the EEBO file. The basic procedures are as follows:

1. Via the OCLC Connexion cataloging client, search and retrieve the e-version record using the microfilm version record data.
2. Use titles and/or OCN of the microfilm version record to identify the comparable EEBO resource in the KBART file.
3. View the EEBO resource record using the URL in the file.
4. Record the OCN of the matching e-version record in the appropriate row/column of the file.28

Subsequently, two MSD staff members were recruited to assist in the effort. In early November and mid-December 2015, training sessions were held with both staff, followed by an individual session with each. Before the year’s end, each staff member had successfully completed an assigned number of “titles” for review. Importantly, from the initial investigative work, a KBART file with 50 OCN was compiled and submitted to OCLC. OCLC Customer Support confirmed that the file would be loaded. Due to the ongoing developmental status of OCLC’s services, the OCN were not loaded into the WCKB until June 2016; a second file sent in April 2016 was loaded in June as well.
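The clerical core of these steps can be pictured in code. This is a minimal sketch under stated assumptions: the KBART column names follow the recommended practice plus OCLC’s OCN extension, and the title-keyed dictionary stands in for the Connexion searching of step 1 (the title and OCN values below are taken from records discussed later in this article, used here only as sample data).

```python
import csv
import io

# Stand-in for the OCLC EEBO KBART file (tab-separated); this title's
# oclc_number column is empty, i.e., the entry still lacks an OCN.
kbart_tsv = (
    "publication_title\toclc_number\n"
    "A most exact catalogue of the Lords spirituall and temporall\t\n"
)

# Hypothetical outcome of step 1: the e-version OCN a cataloger found in
# Connexion by searching with the microfilm version record's data.
eversion_ocn_by_title = {
    "a most exact catalogue of the lords spirituall and temporall": "606541404",
}

rows = list(csv.DictReader(io.StringIO(kbart_tsv), delimiter="\t"))
for row in rows:
    key = row["publication_title"].strip().lower()  # step 2: title as match point
    if not row["oclc_number"] and key in eversion_ocn_by_title:
        # Step 4: record the matching e-version OCN in the entry's OCN column.
        row["oclc_number"] = eversion_ocn_by_title[key]

print(rows[0]["oclc_number"])
```

Step 3, the human judgment call of comparing the EEBO resource record against the microfilm record, is exactly the part that cannot be automated this way, which is why each title entry averaged several minutes of staff time.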
The number of OCN added to the WorldCat Knowledge Base from the project’s inception through 2016 was small due to staffing issues. The average staff time to complete a microfilm/equivalent e-version title entry in the KBART file was 13 minutes.29 As the project progressed, staff following the procedures confirmed that some OCN in the EEBO KBART file were incorrect. Most often, the “errors” stemmed from the attribution of TCP or German language of cataloging record OCN to the EEBO version resources. These TCP and German language of cataloging records correctly corresponded to matching EEBO version resources; however, TCP version records refer to XML/SGML encoded text editions, and OCLC attempts to prefer English language of cataloging records over others in its knowledge base.30

Other OCN errors seriously detract from the value of the WCKB’s EEBO file. For example, WorldCat record number 606541404 describes the “fourth edition very much enlarged” of “A Most exact catalogue of the Lords spirituall and temporall, as peers of the realme, in the higher House of Parliament, according to their dignities, offices, and degrees: some other called thither for their assistance, & officers of their attendances …” yet this OCN in the WorldCat Knowledge Base’s EEBO KBART file links to an EEBO record describing the “third edition much enlarged.” See Figure 8, illustrating the WorldCat UMD record which links to an EEBO resource record describing the “third edition much enlarged.” Note that the OCLC record (as seen in the Connexion client view of the record in Figure 9) is cited by STC (2nd ed.) 7746.3 while the EEBO version record linked to is cited by STC (2nd ed.) 7746.2. To make matters worse, the author determined that the image associated with the EEBO catalog record cited by STC 7746.2 and displayed at the site corresponded to neither resource cited as STC 7746.2 nor STC 7746.3.
These were both printed in 1628, but the image provided at the EEBO site was of a resource printed in 1640 (see Figure 10).

Figure 8. WorldCat UMD record OCN 606541404 linking to the wrong version of a resource in EEBO.

Figure 9. Connexion client view of OCN 606541404

Figure 10. Digital image linked to from EEBO record describing the “third edition much enlarged” of a resource printed in 1628. http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:23639

Further investigation identified errors of misappropriation of OCN in the KBART file to EEBO version records describing copies of editions filmed at locations other than those noted in the corresponding OCLC records. For example, the EEBO resource, “By the King. A proclamation for the adiournement of part of Trinitie terme,” identified in the WCKB as associated with OCN 71492075, links the scholar to a resource described by the EEBO version record as the copy filmed at the British Library. OCLC record 71492075, however, indicates that the copy it describes was the copy filmed at the Henry E. Huntington Library and Art Gallery. See Figures 11-13.

Figure 11. The WCKB associates OCN 71492075 with the EEBO resource, “By the King. A proclamation for the adiournement of part of Trinitie terme,” described by the EEBO website as the copy filmed at the British Library.

Figure 12. The EEBO resource record linked from OCN 71492075 by the OCLC EEBO KBART file indicates the copy filmed was held by the British Library.

Figure 13.
OCN 71492075 indicates it describes a copy of the resource, “By the King : a proclamation for the adiournement of part of Trinitie terme,” filmed at the Henry E. Huntington Library and Art Gallery.

Evaluation

The UM Libraries’ EEBO project procedures revealed that match points of equivalent microfilm and e-version records were the names of the institutions holding the filmed copies and the STC citations to the resources.31 STC citations are carried in the MARC 510 fields of the bibliographic records in two subfields:

1. in subfield “a,” the names of citing works, given in a brief form, e.g., “STC” to represent Pollard and Redgrave’s Short-title catalogue; and
2. in subfield “c,” the location (e.g., page number or volume) within the citing works, e.g., “8626.”32

Figure 14 displays a Connexion client view of OCN 33150534, cited as STC 9170, and Figure 15 shows the same record in the WorldCat display view. Unfortunately, the MARC 510 fields are neither indexed by OCLC nor displayed in WorldCat.33 OCLC could enable the identification and collocation of records for equivalent print, microfilm and electronic versions by indexing the MARC 510 fields and subfields.34

Figure 14. Microfilm version record OCN 33150534, cited as STC 9170.

Figure 15. WorldCat.org view of OCN 33150534, STC 9170 (http://www.worldcat.org/oclc/33150534). The underlying MARC 510 field metadata is not displayed.

Investigation by the author revealed that TCP version records supply these metadata elements in duplicate in different MARC fields: one a free text note field, the other a number/code field, 024.
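The $a/$c match point described above can be sketched programmatically. This is illustrative only: the “$”-delimited strings below are a common human-readable rendering of MARC subfields, not OCLC’s storage format, and the sample 510 values are taken from the STC citation discussed in this section.

```python
def parse_subfields(field_body):
    """Split a '$'-delimited MARC field body, e.g. '$aSTC (2nd ed.)$c9164',
    into (subfield code, value) pairs."""
    return [(part[0], part[1:].strip()) for part in field_body.split("$") if part]

def citation(field_body):
    """Return the (citing work, location) pair carried in subfields a and c."""
    subfields = dict(parse_subfields(field_body))
    return (subfields.get("a"), subfields.get("c"))

# Hypothetical 510 bodies from a microfilm record and its EEBO e-version record.
microfilm_510 = "$aSTC (2nd ed.)$c9164"
eversion_510 = "$aSTC (2nd ed.)$c9164"

# Identical (citing work, location) pairs signal equivalent versions.
print(citation(microfilm_510))  # ('STC (2nd ed.)', '9164')
print(citation(microfilm_510) == citation(eversion_510))  # True
```

An index built on these two subfields, rather than on whole fields or display strings, is essentially what the collocation proposal above asks of OCLC.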
The 024 field is defined to carry a “standard number or code published on an item which cannot be accommodated in another field (e.g., field 020 (International Standard Book Number)).”35 It should be noted that use of the 024 field to carry a number that is not published on the item is not in accordance with the field’s definition. The TCP records use the 024 field with a first indicator value “8,” conveying that the number is an unspecified type of standard number or code.36 Subfield “a” of the 024 field, which carries the STC numbers in the TCP version records, is indexed by OCLC. In the TCP version records, however, these elements are ensconced within strings of text, e.g., “(stc) STC (2nd ed.) 9170.”37 A search on standard number “9170” in WorldCat will therefore fail to retrieve the appropriate record. See Figure 16 for an example of a TCP version record of a resource cited as STC 9170.

With respect to the MARC field definitions, should there be a need to retrieve bibliographic records representing TCP versions of resources via STC citations, these numbers should be entered in “a” subfields, and the brief abbreviated names of the citing source, e.g., “STC (2nd ed.),” “Wing,” etc., in the “2” subfield, which is defined to carry the “Source of number or code.”38 Should OCLC choose to index the MARC 510 fields as described above, the Text Creation Partnership records would be missed.

Figure 16. Text Creation Partnership version OCN 832931179, STC 9170

Indexing of the MARC 510 fields/subfields by OCLC, combined with use of other MARC field/subfield values, such as language of cataloging, to limit results to desired OCN, could support elimination of EEBO KBART file OCN errors and identification of thousands of new OCN for addition to this and perhaps other similar files.
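To see why the embedded form defeats a standard-number search, and how it might be normalized toward the field’s definition (number in subfield “a,” source name in subfield “2”), consider this sketch. The regular expression is illustrative only, written against the single string form quoted above, not a tested rule for all TCP records.

```python
import re

# The form reported in TCP records: the whole citation sits as one text
# string in 024 subfield a.
raw_024a = "(stc) STC (2nd ed.) 9170"

# An exact standard-number lookup on the bare number cannot match the string:
print(raw_024a == "9170")  # False -- the number is ensconced in text

# Illustrative normalization: pull the trailing number into subfield a and
# the citing-source name into subfield 2, per the MARC 024 definition.
match = re.match(r"\(\w+\)\s*(?P<source>.*?)\s*(?P<number>[\d.]+)$", raw_024a)
normalized = {"a": match.group("number"), "2": match.group("source")}
print(normalized)  # {'a': '9170', '2': 'STC (2nd ed.)'}
```

With the number isolated in subfield “a,” the existing 024 index would retrieve the record; the source name in subfield “2” would preserve the information now buried in the prefix.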
As a point of reference, according to OCLC’s “MARC Usage in WorldCat” webpages, as of January 1, 2016, there were 6,382,317 instances of MARC 510 “a” subfields and 4,082,280 instances of the “c” subfields.40 It should be noted, however, that there are five first indicator values available for use in MARC 510 fields, and only one of them is used to convey that the location in the source data is given in the field. Also worth noting, the 024 data at the “MARC Usage in WorldCat” webpages show that there were 4,633,776 occurrences of subfield “2” of the 024 field and 43,711,819 occurrences of subfield “a.”41

510 field indexing to support identification of OCN for addition to the EEBO KBART file may require the participation of the content provider, ProQuest. The 510 field elements are indexed in its Early English Books Online collection. ProQuest could add these data to its EEBO KBART file in support of OCN matching. The KBART Recommended Practice allows content providers “to include any extra data fields after the last KBART utilized position.”42

Finally, it should be noted that reconciliation of errors in the WCKB EEBO file pertaining to the locations of the filmed copies, as noted in OCLC records but found to be different at the EEBO site, would require more complex steps than 510 field matching. Furthermore, catalogers working on the EEBO project were not instructed to check the images at the EEBO website but only to confirm the STC citation match points in the EEBO version records. A closer examination of EEBO in light of this paper’s finding of an EEBO record linked to a resource printed 12 years later is an area calling for further study. With respect to the needs of scholars as eloquently described by Gadd (2009), the accuracy of the WorldCat Knowledge Base OCN must improve in terms of access provision via WorldCat Discovery.
MARC 510 Elements: Opportunities for Linked Data Applications?

OCLC is actively engaged in research and collaboration with the greater library community to transition its metadata to linked data; however, MARC 510 metadata is lacking in its linked data record display views (compare the Connexion client view of a record in Figure 14 with the WorldCat linked data display view in Figure 17).43, 44 On the other hand, in its work to transfer its English Short Title Catalog, a “MARC based … vendor-supplied ILS,” to “ESTC21,” a “native linked data resource,” it appears the British Library combines the MARC 510 subfield values, e.g., “Bristol, B7384,” as a resource property value (Figures 18 and 19).45, 46 “Bristol, B7384” represents entry number 7384 in Roger P. Bristol’s Supplement to Charles Evans' American bibliography (see Figure 20, WorldCat OCLC record number 88701).47 As presented in Figure 19 (Stahmer, 2014), “Bristol, B7384” may be comprehensible to a well-versed scholar, librarian, or archivist, but not to a computer. Hillmann, Dunsire, and Phipps (2013) posited that “it would be useful if all managers of schemas and other standards were to develop element sets and value vocabulary representations that match the source semantics at the finest granularity and make them available along with maps of the internal ontologies.”48 Could a Semantic Web implementation of MARC 510 metadata at the finest granularity, with resource identifiers representing citing works such as “Bristol” and with property values such as “7384” representing locations within citing works, offer benefits to scholarship? It has been demonstrated in this paper that the consistent match points across bibliographic records representing equivalent versions of these resources have been the metadata contained in MARC 510 fields.
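One way to picture “finest granularity” is to decompose the combined display string into separate machine-actionable statements. The sketch below uses plain Python tuples as stand-in triples, and every identifier is a hypothetical example.org placeholder, not a real vocabulary.

```python
# Hypothetical identifiers -- placeholders, not real URIs or vocabularies.
RESOURCE = "http://example.org/resource/estc-example-item"
BRISTOL = "http://example.org/citing-work/bristol-supplement"

# The combined value "Bristol, B7384" becomes two statements: one linking the
# resource to an identifier for the citing work, one carrying the location
# (entry number) within that work.
triples = [
    (RESOURCE, "http://example.org/prop/citedIn", BRISTOL),
    (RESOURCE, "http://example.org/prop/citationLocation", "B7384"),
]

# A machine can now collect everything cited in Bristol's Supplement by
# filtering on the citing-work identifier instead of parsing display strings.
cited_in_bristol = [s for (s, p, o) in triples
                    if p == "http://example.org/prop/citedIn" and o == BRISTOL]
print(cited_in_bristol == [RESOURCE])  # True
```

Under such a model, a query across citing works could collocate the print, microfilm, EEBO, and TCP versions of a resource through their shared citation, which is the collocation the string form cannot support.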
Ultimately, a linked data implementation of the MARC bibliographic 510 field should lead the scholar to every known print copy comprising every edition, according to Gadd’s definition of an edition, above, and to the institutional holdings of equivalent microform, digitized image, or digitized full-text versions, giving the scholar the path to the resources of interest.49 OCLC, the British Library, members of the TCP, and other stakeholders may want to consider further exploration of use case scenarios to determine or rule out additional benefits of transforming MARC 510 field metadata to linked data.

Figure 17. Linked data view of OCLC #33150534, http://www.worldcat.org/oclc/33150534

Figure 18. MARC 510 field data in ESTC

Figure 19. MARC 510 metadata in structured data view in ESTC21

Figure 20. Print version of OCN 88701, Supplement to Charles Evans' American bibliography by Roger P. Bristol, http://www.worldcat.org/oclc/88701.

CONCLUSION

At the current pace, given available staffing and the number of EEBO resources lacking OCN, the time and effort spent by the Libraries’ Metadata Services Department staff toward the goal of adding OCN to the OCLC EEBO KBART file, though well spent, will continue for years to come. A collective effort in this endeavor by the WCKB community of users is welcomed by this author.50 A combined effort by OCLC and ProQuest to improve discovery and link resolution services for these valuable scholarly resources could increase their discoverability substantially, allowing MSD staff to spend more time creating and enhancing the metadata that will lead researchers to the uncatalogued EEBO resources they seek.
As to the transition of MARC 510 field metadata to linked data, OCLC, the British Library, members of the TCP, and other stakeholders should consider their options before moving forward without it.

ACKNOWLEDGEMENT

The author wishes to thank Karen Coyle for reading and advising on earlier versions of this paper; Becky Culbertson, Nathan Putnam, and Patricia Herron for supporting the project; and Joshua Westgard for converting the data to get the project underway. Special thanks are due to staff members of the UM Libraries, Donna King, Roselin Becker, Erica Hemsley, Yeo-Hee Koh, and Tanisha Lee, and to Freeda Brook, Luther College, for their work on the project.

REFERENCES

1. A KBART file is a file compliant with the NISO recommended practice, Knowledge Bases and Related Tools (KBART). See KBART Phase II Working Group, Knowledge Bases and Related Tools (KBART): Recommended Practice: NISO RP-9-2014 (Baltimore, MD: National Information Standards Organization (NISO), 2014), accessed March 14, 2017, http://www.niso.org/publications/rp/rp-9-2014/.

2. University of Maryland Libraries, “About,” last updated July 28, 2016, http://www.lib.umd.edu/about.

3. In 2015, the Libraries implemented WorldCat Discovery, intended to be a replacement for WorldCat Local.

4. Marshall Breeding, The Future of Library Resource Discovery (Baltimore, MD: National Information Standards Organization (NISO), 2015): 17, accessed February 18, 2017, http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_discovery.pdf.

5. KBART Phase II Working Group, Knowledge Bases and Related Tools (KBART): Recommended Practice: NISO RP-9-2014 (Baltimore, MD: National Information Standards Organization (NISO), 2014), accessed April 13, 2017, http://www.niso.org/publications/rp/rp-9-2014/.

6.
Open Discovery Initiative Working Group, Open Discovery Initiative: Promoting Transparency in Discovery: NISO RP-19-2014, (Baltimore, MD: NISO, 2014): 13, accessed March 14, 2017, http://www.niso.org/publications/rp/rp-9-2014/ 7. University of Maryland Libraries Collection Development Council. “Meeting Notes,” March 4, 2014. 8. “University of Maryland Libraries Master Space Plan,” Nov. 2015, June 2016 update. 9. See Gale’s web page, “The Making of the Modern World (MOMW) FAQ,” at http://find.galegroup.com/mome/component/researchtools/xml/FAQ.xml, accessed February 18, 2017, for a details about the collection. WorldCat Knowledge Base collections may be created by libraries and uploaded to the Knowledge Base. Details on the process are available at http://www.oclc.org/support/services/collection- manager/documentation.en.html#knowledgebase, accessed February 18, 2017. A CASE STUDY ON THE PATH TO RESOURCE DISCOVERY | GUAY | doi:10.6017/ital.v36i3.9966 45 10. ProQuest’s British Periodicals collection “offers facsimile page images and searchable full text for nearly 500 British periodicals published from the 17th century through to the early 21st” and “is available in four separate collections, British Periodicals Collections I, II, III, and IV, each of which can be purchased separately.” ProQuest British Periodicals product description page, http://search.proquest.com/britishperiodicals/productfulldescdetail?accountid=14696, accessed Jan. 29, 2017 11. Details about resources available in EEBO are provided by ProQuest at its website, “EEBO: About EEBO,” accessed January 29, 2017. http://eebo.chadwyck.com/marketing/about.htm 12. 
Diana Kichuk, “Metamorphosis: Remediation in Early English Books Online (EEBO),” Literary and Linguistic Computing, 22:3 (2007): 291-303; Shawn Martin, “EEBO, Microfilm, and Umberto Eco: Historical Lessons and Future Directions for Building Electronic Collections,” Microform & Imaging Review, 36:4 (2007): 159-164; Ian Gadd, “The Use and Misuse of Early English Books Online, Literature Compass, 6:3 (2009): 680-692; Bonnie Mak, “Archaeology of a Digitization,” Journal of the Association for Information Science and Technology, 65:8 (2014): 1515-1526; Folger Shakespeare Library, “History of Early English Books Online,” http://folgerpedia.folger.edu/History_of_Early_English_Books_Online, last modified on 26 August 2015. 13. A.W. Pollard and G. R. Redgrave. A short-title catalogue of books printed in England, Scotland, & Ireland and of English books printed abroad, 1475-1640, Rev. ed. (London: The Bibliographical Society, 1976–1991); Donald Wing, Short-title catalogue of books printed in England, Scotland, Ireland, Wales, and British America, and of English books printed in other countries, 1641-1700, 2d ed., newly rev. and enl. (New York : Modern Language Association of America, 1972- <1994>) 14. Gadd, “The Use and Misuse of Early English Books Online,” 683. 15. “About EEBO.” 16. Details on the ESTC are provided by the British Library at http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html, viewed March 12, 2017 17. Gadd, “The Use and Misuse of Early English Books Online,” 685-686. 18. Gadd, “The Use and Misuse of Early English Books Online,” 686. 19. EEBO, “Frequently Asked Questions,” accessed February 18, 2017. http://eebo.chadwyck.com/help/faqs.htm 20. Association of Research Libraries, Microform Sets in U.S. and Canadian Libraries, (Washington, D.C.: Association of Research Libraries, 1984), J-3. 21. Martin D. 
Joachim, “Cooperative Cataloging of Microform Sets,” in Cooperative Cataloging: Past, Present, and Future (New York: The Haworth Press, 1993), 111. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 46 22. Gadd, “The Use and Misuse of Early English Books Online,” 686. 23. British Library, “Catalogs of British Library Holdings: English Short Title Catalogue - content,” accessed February 18, 2017. http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html 24. The British Libraries ESTC codes for filmed copy locations are difficult to translate. See Meaghan J. Brown’s finding aid, “STC Location Code Transcription” wherein she offers details on STC and ESTC location codes and the problem her finding aid addresses. Brown explains, “… it is currently possible to search the ESTC for items using MARC codes, but not the location codes familiar from the STC,” accessed February 18, 2017. http://www.meaghan- brown.com/stc-location-codes/ 25. Text Creation Partnership, accessed January 25, 2017. http://www.textcreationpartnership.org/home/ 26. Text Creation Partnership, accessed January 25, 2017. http://www.textcreationpartnership.org/catalog-records/ 27. OCLC’s form is available at https://www.oclc.org/content/dam/support/knowledge- base/ocn_report.xlsx, accessed October 18, 2016. 28. See Appendix 1 for the Procedures 29. With streamlined KBART search features introduced by a Metadata Services Department colleague, it’s expected this time may be reduced moving forward. 30. A June 9, 2015 email from an OCLC staff member to the KB-L@oclc.org listserv reported on OCLC’s efforts to match OCN in its KBART files to English language of cataloging records, when available. 31. UM Libraries’ staff use this metadata in the equivalent OCLC microfilm and e-version and EEBO resource records as match points. Staff do not verify that the images linked to the EEBO version records correspond to those in the aforementioned bibliographic records. 
It is hoped that ProQuest will investigate the case described in this paper in which the EEBO resource differs from its corresponding record. 32. “510 Citation/Reference Note,” OCLC, Bibliographic Formats and Standards. 4th Edition, last revised August 22, 2016. https://www.oclc.org/bibformats/en/5xx/510.html 33. As of January 29, 2017, the MARC 510 field has not been indexed by OCLC. See http://www.oclc.org/support/help/SearchingWorldCatIndexes/#05_FieldsAndSubfields/5xx _fields.htm 34. E.g., OCLC indexes “internet resources” using a combination of MARC data elements. These are laid out in “Searching WorldCat Indexes” at http://www.oclc.org/support/help/SearchingWorldCatIndexes/#06_Format_Document_Typ e_Codes/Format_Document_type_codes.htm. MARC 21 Bibliographic at A CASE STUDY ON THE PATH TO RESOURCE DISCOVERY | GUAY | doi:10.6017/ital.v36i3.9966 47 https://www.loc.gov/marc/bibliographic/bdleader.html provides the Leader position 06 code for “Language material.” MARC Code List for Languages (http://www.loc.gov/marc/languages/) contains the language codes contained in the language of cataloging field/subfield (MARC 040 field, subfield “b”). 35. “024 Other Standard Identifier,” in OCLC, Bibliographic Formats and Standards, 4th edition, accessed January 25, 2017. https://www.oclc.org/bibformats/en/0xx/024.html 36. Ibid. 37. OCLC. Searching WorldCat Indexes, accessed February 18, 2017. http://www.oclc.org/support/help/SearchingWorldCatIndexes/#05_FieldsAndSubfields/0xx _fields.htm%3FTocPath%3DFields%2520and%2520subfields%7C_____2 38. See OCLC Bibliographic Formats and Standards, Fourth edition. 024 Other Standard Identifier https://www.oclc.org/bibformats/en/0xx/024.html, viewed January 25, 2017 39. An Oct. 18, 2016 review of OCLC’s all-collections-list, available at https://www.oclc.org/content/dam/support/knowledge-base/all-collections-list.xlsx indicates that 38.5% percent of the 129,498 resources on the EEBO KBART file have OCLC number coverage. 40. 
http://experimental.worldcat.org/marcusage/510.html 41. http://experimental.worldcat.org/marcusage/024.html 42. KBART Phase II Working Group, Knowledge Bases and Related Tools (KBART): Recommended Practice: NISO RP-9-2014 (Baltimore, MD: NISO 2014), 18. http://www.niso.org/workrooms/kbart 43. https://www.oclc.org/worldcat/data-strategy.en.html, viewed Jan. 26, 2017 44. The image of the linked data view of Figure 14 was captured on February 18, 2017. 45. Carl Stahmer, “Making MARC Agnostic: Transforming the English Short Title Catalogue for the Linked Data Universe,” in Linked Data for Cultural Heritage, (Chicago: ALA Editions), p. 23-25. 46. The assertion that the ESTC transformation of MARC 510 field metadata is solely based on Carl Stahmer, “The ESTC as a 21st Century Research Tool,” Presentation given at the 2014 conference of the Text Encoding Initiative, viewed February 19, 2017. https://figshare.com/articles/ESTC21_at_TEI_2014/1558057 47. Roger P. Bristol, Supplement to Charles Evans' American Bibliography (Charlottesville: University Press of Virginia, 1970). 48. Dianne Hillmann, Gordon Dunsire, and Jon Phipps, “Maps and Gaps: Strategies for Vocabulary Design and Development,” in DCMI International Conference on Dublin Core and Metadata Applications, 2013: 88, accessed February 18, 2017. http://dcpapers.dublincore.org/pubs/article/view/3673/1896 INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2017 48 49. See Reference 14 above. 50. A discussion and invitation to collaborate on this work took place in late 2016 on the OCLC WorldCat KB listserv (see http://listserv.oclc.org/scripts/wa.exe?SUBED1=kb-l&A=1). To date, the Preus Library, Luther College, will be working with the Libraries on this project.
It is Our Flagship: Surveying the Landscape of Digital Interactive Displays in Learning Environments

Lydia Zvyagintseva

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2018

Lydia Zvyagintseva (lzvyagintseva@epl.ca) is the Digital Exhibits Librarian at the Edmonton Public Library in Edmonton, Alberta.

ABSTRACT

This paper presents the findings of an environmental scan conducted as part of a Digital Exhibits Intern Librarian Project at the Edmonton Public Library in 2016. As part of the Library's 2016-2018 Business Plan objective to define the vision for a digital exhibits service, this research project aimed to understand the current landscape of digital displays in learning institutions globally. The resulting study consisted of 39 structured interviews with libraries, museums, galleries, schools, and creative design studios. The environmental scan explored the technical infrastructure of digital displays, their user groups, the various uses for the technologies within organizational contexts, their content sources, scheduling models, and the resourcing needs for this emergent service. Broader themes surrounding challenges and successes were also included in the study. Despite the variety of approaches taken among learning institutions in supporting digital displays, the majority of organizations expressed a high degree of satisfaction with these technologies.

INTRODUCTION

In 2020, the Stanley A. Milner Library, the central branch of the Edmonton (Alberta) Public Library (EPL), will reopen after extensive renovations to both the interior and exterior of the building. As part of the interior renovations, EPL will have installed a large digital interactive display wall modeled after The Cube at Queensland University of Technology (QUT) in Brisbane, Australia.
To prepare for the launch of this new technology service, EPL hired a digital exhibits intern librarian in 2016, whose role consisted of conducting research to inform the library in defining the vision for a digital display wall serving as a shared community platform for all manner of digitally accessible and interactive exhibits. As a result, the author carried out an environmental scan and a literature review related to digital displays, as well as their consequent service contexts. For the purposes of this paper, "digital displays" refers to the technology and hardware used to showcase information, whereas "digital exhibits" refers to the content and software used on those displays. Wherever the service of running, managing, or using this technology is discussed, it is framed as "digital display service" and concerns both the technical and organizational aspects of using this technology in a learning institution.

IT IS OUR FLAGSHIP | ZVYAGINTSEVA | https://doi.org/10.6017/ital.v37i2.9987

METHOD

The data were collected between May 30 and August 20, 2016, through a series of structured interviews conducted by Skype, phone, and email. The study population was identified by searching Google and Google News for keywords such as "digital interactive AND library," "interactive display," "public display," or "visualization wall" to find organizations that had installed digital displays. The list was expanded by reviewing the websites of creative studios specializing in interactive experiences and through a snowball effect once the interviews had begun. A small number of vendors, consisting primarily of creative agencies specializing in digital interactive services, were also included in the study population. Participants were then recruited by email. The goal of this project was to gain a broad understanding of the emergent technology, content, and service model landscape related to digital displays.
As a result, structured interviews were deemed the most appropriate method of data collection because of their capacity to generate a large amount of qualitative and quantitative data. In total, 39 interviews were conducted. The list of interview questions prepared for the interviews is included in appendix A, and a complete list of the study population can be found in appendix B. Organizations from Canada, the United States, Australia, and New Zealand predominate in this study.

LITERATURE REVIEW

Definitions

• Public displays, a term used in the literature to refer to a particular type of digital display, can refer to "small or large sized screens that are placed indoor . . . or outdoor for public viewing and usage" and which may be interactive to support "information browsing and searching activities."1 In public displays, a large proportion of users are passers-by and thus first-time users.2 In academic environments, these technologies may be referred to as "video walls" and have been characterized as display technologies with little interactivity and input from users, often located in high-traffic, public areas, with content prepared ahead of time and scheduled for display according to particular priorities.3

• Semi-public displays, on the other hand, can be understood as systems intended to be used by "members of a small, co-located group within a confined physical space, and not general passers-by."4 In academic environments, they have been referred to as "visualization spaces" or "visualization studios" and can be defined as workspaces with real-time content displayed for analysis or interpretation, often placed in libraries or research department units.5 For the purposes of this paper, "digital displays" refers to both public and semi-public displays, as organizations interviewed in this study had both types of displays, occasionally simultaneously.
• Honeypot effect describes how people interacting with an information system, such as a public display, stimulate other users to observe, approach, and engage in interaction with that system.6 This phenomenon extends beyond digital displays to tourism, art, and retail environments, where a site of interest attracts the attention of passers-by and draws them to participate in that site.

Interactivity

The area of interactivity with public displays has been studied by many researchers, and three commonly used modes of interaction are clearly identified: touch, gesture, and remote.

• Touch (or multi-touch): This is the most common way users interact with personal mobile devices such as smartphones and tablets. Multi-touch interaction on public displays should support many individuals interacting with the digital screen simultaneously, since many users expect immediate access and will not take turns. For example, some technologies studied in this report support up to 30 touch points at any given time, while others, like QUT's The Cube, allow for a near infinite number of touch points. Though studies show that this technique is fast and natural, it also requires additional physical effort from the user.7 While touch interaction using infrared sensors has a high touch recognition rate, its shortcomings are that it is expensive and is affected by light interference around the touch screen.8

• Gesture: This is interaction through movement of the user's hands, arms, or entire body, recognized by sensors such as the Microsoft Kinect or Leap Motion systems.
Although studies show that this type of interaction is quick and intuitive, it also brings "a cognitive load to the users together with the increased concern of performing gestures in public spaces."9 Specifically, body gestures were found not to be well suited to passing-by interaction, unlike hand gestures, which can be performed while walking and have an acceptable mental, physical, and temporal workload.10 Research into gesture-based interaction shows that "more movement can negatively influence recall" and is therefore not suited for informational exhibits.11 Similarly, people consider gestures to be too much work "when they require two hands and large movements" to execute.12 Not surprisingly, research suggests that the gestures deemed socially acceptable for public spaces are small, unobtrusive ones that mimic everyday actions; they are also more likely to be adopted by users.

• Remote: These are interactions using another device, such as a mobile phone, tablet, virtual-reality headset, game controller, or other special device. Connection protocols may include Bluetooth, SMS messaging, near-field communication, radio-frequency identification, wireless-network connectivity, and other methods. Mobile-based interaction with public displays has received a lot of attention in research, media, and commercial environments because this mode allows users to interact from a variable distance with minimal physical effort.
However, users often find mobile interaction with a public display "too technical and inconvenient" because it requires sophisticated levels of digital literacy in addition to access to a suitable device.13 Some suggest that using personal devices for input also helps "avoid occlusion and offers interaction at a distance" without requiring multi-touch or gesture-based interactions.14 As well, subjects in studies on mobile interaction often indicate a preference for this mode because of its low mental effort and low physical demand. However, it is possible that these studies focused on users with high degrees of digital literacy rather than the general public, whose access to and comfort with mobile technologies vary.

User Engagement

Attracting user attention is not guaranteed by virtue of having a public display. According to research, the most significant factors that influence user engagement with public digital displays are age, display content, and social context.

Age

Hinrichs found that children were the first to engage in interaction with public displays and would often recruit accompanying adults toward the installation.15 Adults, on the other hand, were more hesitant in approaching the installation: "they would often look at it from a distance before deciding to explore it further."16 These findings suggest that designing for children first is an effective strategy for enticing interaction from users of all ages.

Display Content

Studies on engagement in public digital display environments indicate that both passive and active types of engagement exist with digital displays. The role of emotion in the content displayed also cannot be overlooked. Specifically, Clinch et al.
state that people typically pay attention to displays "only when they expected the content to be of interest to them" and that they are "more likely to expect interesting content in a university context rather than within commercial premises."17 In other words, the context in which a display is situated affects user expectations and primes them for interaction. The dominant communication pattern in existing display and signage systems has been narrowcast, a model in which displays are essentially seen as distribution points for centrally created content, without much consideration for users. This model of messaging exists in commercial spaces, such as malls, but also in public areas like transit centers, university campuses, and other spaces where crowds of people may gather or pass by. Observational studies indicate that people tend to perceive this type of content as not relevant to them and ignore it.18 For public displays to be engaging to end users, in other words, "there needs to be some kind of reciprocal interaction."19 In public spaces, interactive displays may be more successful than non-interactive displays in engaging viewers and making city centers livelier and more attractive.20 In terms of precise measures of attention to such displays, studies of average attention time correlate age with responsiveness to digital signage: children (1-14 years) are more receptive than adults, and men spend more time observing digital signage than women.21 Studies also indicate significantly higher average attention times for dynamic content as compared to static content.22 Scholars like Buerger suggest that designers of applications for public digital displays should assume that viewers are not willing "to spend more than a few seconds to determine whether a display is of interest."23 Instead, they recommend presenting informational content with minimal text and in such a way that the most important information can be grasped in two to three seconds.
In a museum context, the average interaction time with the digital display was between two and five minutes, which was also the average time people spent exploring analog exhibits.24 Dynamic, game-like exhibits at The Cube incorporate all of the above findings to make interaction interesting and short and to draw the attention of children first.

Social Context

Social context is another aspect that has been studied extensively in the field of human-computer interaction, and it provides many valuable lessons for applying evidence-based practices to technology service planning in libraries. Many scholars have observed the honeypot effect in interaction with digital displays in public settings. This effect describes how users who are actively engaged with the display perform two important functions: they entice passers-by to become actively engaged users themselves, and they demonstrate how to interact with the technology without formal instruction. Many argue that a conducive social context can "overcome a poor physical space, but an inappropriate social context can inhibit interaction" even in physical spaces where engagement with the technology is encouraged.25 This finding relates to the use of gestures on public displays. Researchers also found that contextual social factors, such as age and being around others in a public setting, do in fact influence the choice of multi-touch gestures.
Hinrichs suggests enabling a variety of gestures for each action (accommodating different hand postures and a large number of touch points, for example) to support fluid gesture sequences and social interactions.26 A major deterrent to users' interaction with large public displays is the potential for social embarrassment.27 As an implication, the authors suggest positioning the display along thoroughfares of traffic and improving how the interaction principles of the display are communicated implicitly to bystanders, thus continually instructing new users in techniques of interaction.28

FINDINGS

Technical and Hardware Landscape

The average age of public displays was around three years, indicating an early stage of development for this type of service among learning institutions. Such technologies first appeared in Europe more than ten years ago (the most widely cited early example of a public display is the CityWall in Helsinki in 2007),29 but adoption in North America did not start until around 2013. The median year of installation among the organizations studied in this report is 2014. Among the public institutions represented in the study population, such as public libraries and museums, digital displays were most frequently installed in 2015. While most organizations have only one display space, it was not unusual to find several within a single organization. For the purposes of this study, the researcher counted The Cube as three display spaces, as documentation and promotional literature on the technology cites "3 separate display zones." As a result, the average number of display spaces in the study population is 1.75. The following modes of interaction beyond displaying video content were observed in the study population, in descending order of frequency:

• Sound (79%).
While research on human-computer interaction is inconclusive about best practices for incorporating sound into digital interactive displays, it is clear among the organizations interviewed in the environmental scan that sound is a major component of digital exhibits and should not be overlooked.

• Touch or multi-touch (46%). This finding highlights that screens capable of supporting multi-user interaction are not consistently available across the study population.

• Gesture (25%). These include tools such as Microsoft Kinect, Leap Motion, or other systems for detecting movement for interaction.

• Mobile (14%). While some researchers in the human-computer interaction field suggest mobile is the most effective way to bridge the divide between large public displays, personalization of content, and user engagement, mobile interactivity is not used frequently to engage with digital displays in the study population. One outlier is North Carolina State University Library, which takes a holistic, "massively responsive design" approach in which responsive web design principles are applied to content that can be displayed effectively at once online, on digital display walls, and on mobile devices, while optimizing the institutional resources dedicated to supporting visualization services.

Further, as in the broader personal computing environment, the Microsoft Windows operating system dominates display systems, with 61% of the organizations choosing a Windows machine to power their digital display. About a fifth (21%) of all organizations have some form of networked computing infrastructure, such as The Cube with its capacity to process exhibit content using 30 servers; the majority (79%) of organizations interviewed have a single computer powering the display. This finding is perhaps not surprising, given that few institutions have dedicated IT teams to support a single technology service like The Cube.
Users and Use Cases

Understanding primary audiences was also important for this study, as the organizational user base defines the context for digital exhibits. The breakdown of these audiences is summarized in figure 1: 44% of displays serve academic audiences, 33% public audiences, and 22% both. For example, the University of Oregon Ford Alumni Center's digital interactive display focuses primarily on showcasing the success of its alumni, with a goal of recruiting new students to the university, although the interactive exhibits also serve the general public through tours and events on the University of Oregon campus. Other organizations with digital displays, such as All Saints Anglican School and the Philadelphia Museum of Art, also target specific audiences, so planning for exhibits may be easier in those contexts than in organizations like the University of Waterloo Stratford Campus, whose display wall at the downtown campus receives visitor traffic from students, faculty, and the public.
Communication (47%), which can be considered a form of digital signage to promote library or institutional services and marketing content. Displays can also deliver presentations and communicate scholarly work. 4. Teaching (42%), including formal and semi-formal instruction, workshops, student presentations, and student course-work showcases. 5. Events (31%), such as public tours, conferences, guest speakers, special events, galas, and other social activities near or using the display. 6. Community engagement (28%), including participation from community members through content contribution, showing local content, using the display technology as an outreach tool, and other strategies to build relationships with user communities. 7. Research (22%), where the display functions as a tool that facilitates scholarly activities like data collection, analysis, and peer review. Many study participants acknowledged challenges in using digital displays for this purpose and have identified other services that might support this use more effectively. Content Types and Management In the words of Deakin University librarians, “Content is critical, but the message is king,” so it was particularly important for the author to understand the current digital display landscape as it relates to content.30 Specifically, the research project encompassed the variety of content used on digital displays as well as how it is created, managed, shared, and received by the audiences of various organizations interviewed in this study. As can be observed in figure 2, all organizations supported 2D content, such as images, video, audio, presentation slides, and other visual and textual material. However, dynamic forms of content, such as social media feeds, interactive maps, and websites were less prevalent. IT IS OUR FLAGSHIP | ZVYAGINTSEVA 57 https://doi.org/10.6017/ital.v37i2.9987 Figure 2. Types of content supported by digital displays in the study population. 
Discussions around interest in emergent, immersive, and dynamic 3D content such as games and virtual and augmented reality also came up frequently in the study interviews, and the researcher found that these types of content were supported in only 16 (57%) of the 28 total cases. This number is lower than the total number of interviewees because not all organizations interviewed had content to manage or display. In addition, many organizations recognized that they would likely be exploring ways to present 3D games or immersive environments through their digital display in the near future. Not surprisingly, the creative agencies included in this study revealed an awareness and active development of content of this nature, noting “rising demand and interest in 3D and game-like environments.” Furthermore, projects involving motion detection, the Internet of Things, and other sensor-based interactions are also seeing rise in demand, according to study participants. 100 % 61 % 57 % 0 10 20 30 40 50 60 70 80 90 100 Content types Supported Content Types Static 2D Dynamic web Dynamic 3D INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2018 58 Figure 3. Content management systems for digital displays. In terms of managing various types of content, 20 (71%) of the organizations interviewed had used some form of content management system (CMS), while the rest did not use any tool to manage or organize content. Of those organizations that used a CMS, 15 (75%) relied on a vendor- supplied system, such as tools by FourWinds Interactive, Visix, or NEC Live. The remaining 5 (18%) CMS users created a custom solution without going to a vendor. This finding suggests that since the majority of content supported by organizations with digital displays is 2D, current vendor solutions for managing that content are sufficient for the study population at this point. It is unclear how the rise in demand for dynamic, game-like content will be supported by vendors in the coming years. 
Table 1 reflects the distribution of approaches to managing content observed in the study population.

Table 1. Content management in study population

Content Management         Responses    %
Vendor-supplied system        15       54
In-house created system        5       18
No system                      5       18
Unknown                        3       10

Middleware, Automation, and Exhibit Management

Middleware can be described as the layer of software between the operating system and the applications running on the display, especially in a networked computing environment. For example, most organizations studied in the environmental scan supported a Windows environment with a range of exhibit applications, like slideshows, web browsers, and executable files, such as games. Middleware can simplify and automate the process of starting up, switching between, and shutting off display applications on a set schedule. As figure 4 demonstrates, the majority of the organizations in the study population (17, or 61%) did not have a middleware solution. However, this group was heterogeneous: 14 organizations (50%) did not require a middleware solution because they ran content semi-permanently or relied on user-supplied content, in which case the display functioned as a teaching tool. The remaining three organizations (11%) manually managed scheduling and switching between exhibit content. In such cases, a middleware solution would be valuable for managing content, especially as the number of applications grows, but it was not present in these organizations. Comparatively, 10 organizations (36%) used a custom solution, such as a combination of Windows or Linux scripts, to manage automation and scheduling of content on the display. One organization (3%) did not specify its approach to managing content.
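The starting, switching, and shutting-off behaviour that middleware automates can be illustrated with a small script of the kind some study organizations built in-house. The following is a minimal, hypothetical sketch: the application commands, hours, and polling interval are invented for illustration and are not drawn from any organization in the study.

```python
import datetime
import subprocess
import time

# Hypothetical daily schedule of exhibit applications:
# (start_hour, end_hour, command) tuples. Commands are placeholders.
SCHEDULE = [
    (8, 12, ["slideshow", "--playlist", "morning"]),
    (12, 18, ["browser", "--kiosk", "https://example.org/exhibit"]),
    (18, 22, ["exhibit-game", "--fullscreen"]),
]

def active_exhibit(schedule, now):
    """Return the command that should be running at `now`,
    or None outside display hours (overnight shutdown)."""
    for start, end, command in schedule:
        if start <= now.hour < end:
            return command
    return None

def run_loop(schedule, poll_seconds=60):
    """Start, switch between, and shut off display applications on a
    set schedule -- the core middleware behaviour described above."""
    current, process = None, None
    while True:
        command = active_exhibit(schedule, datetime.datetime.now())
        if command != current:
            if process is not None:
                process.terminate()  # shut off the outgoing exhibit
            # launch the incoming exhibit, or nothing overnight
            process = subprocess.Popen(command) if command else None
            current = command
        time.sleep(poll_seconds)
```

A real deployment would also need crash recovery and logging; the point here is only that the "no middleware" organizations managing this switching by hand are doing manually what a loop of this shape automates.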
These findings suggest that no formalized solution to automating and managing software currently exists among the study population. In addition to organizing content, digital-exhibits services involve scheduling or automating content to meet user needs according to the time of day, special events, or seasonal relevance. A middleware solution therefore supports sustainable management of displays and predictable sharing of content for end users. This environmental scan revealed that digital exhibits and interactive experiences are still in the early days of development. It is possible that new solutions for managing content at both the application and the middleware level may emerge in the coming years, but they are currently limited.

Figure 4. Middleware solutions in the study population.
[Figure 4 chart data: none, 61%; custom, 36%; unknown, 3%.]

Sources of Content

When finding sources of content for digital displays, the organizations interviewed used multiple strategies simultaneously. Table 2 below brings together the findings related to this theme.

Table 2. Content sources for digital exhibits

Content Source               %
External/commissioned       64
User-supplied               64
Internal/in-house           50
Collaborative with partner  43

For example, many organizations rely on their users to generate and submit material (18, or 64%); others commission vendors to create exhibits for them (18, or 64%). In 50% of all cases, organizations also produce content for exhibits in-house. In other words, most organizations used a combination of sources to generate content for their digital displays. Only a few use a single source of content, such as the semi-permanent historical exhibit at Henrico County Public Library.
Others, like the Duke Media Wall, rely entirely on their users to supply content, employing a “for students by students” model of content creation. Additionally, only 12 (43%) of the organizations interviewed had explored or established some form of partnership for creating exhibits. Primarily, these partnerships existed with departments, centers, institutes, campus units, and/or students in academic settings, such as the computer science department, faculty of graduate studies, and international studies. Other examples of partnerships were with similar civic, educational, cultural, and heritage organizations, such as municipal libraries, historical societies, art galleries, museums, and nonprofits. Examples included study participants working with Ars Electronica, local symphony orchestras, Harvard Space Science, and NASA on digital exhibits. Clearly, a variety of approaches were taken in the study population to source digital exhibits content.

Content Creation Guidelines

Seven organizations (19%) in the study population publicly shared content guidelines aimed at simplifying the process of engaging users in creating exhibits. These guidelines were analyzed, and key elements were identified that users need to know in order to contribute in a meaningful way, thereby lowering the barrier to participation. These elements include the resolution of the display screen(s), touch capability, ambient light around the display space, required file formats, and maximum file size. A complete list of organizations with such guidelines, along with the websites where these guidelines can be found, is included in appendix C.
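Guideline elements such as resolution, required file formats, and maximum file size map naturally onto an automated check of user submissions. The sketch below is hypothetical: the specification values and the function are illustrative assumptions, not taken from any guideline analyzed in the study.

```python
# Hypothetical display specification of the kind published in content
# guidelines (resolution, accepted formats, maximum file size).
# All numbers here are invented for illustration.
DISPLAY_SPEC = {
    "resolution": (3840, 2160),        # native screen resolution
    "formats": {"jpg", "png", "mp4"},  # required file formats
    "max_bytes": 500 * 1024 * 1024,    # maximum file size (500 MB)
}

def check_submission(filename, width, height, size_bytes, spec=DISPLAY_SPEC):
    """Return a list of problems with a proposed contribution;
    an empty list means it meets the technical specifications."""
    problems = []
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in spec["formats"]:
        problems.append(f"unsupported format: .{ext}")
    if size_bytes > spec["max_bytes"]:
        problems.append("file exceeds maximum size")
    native_w, native_h = spec["resolution"]
    # Aspect-ratio check: content matching the display's ratio avoids
    # letterboxing on the wall.
    if width * native_h != height * native_w:
        problems.append("aspect ratio does not match the display")
    return problems
```

Checks of this kind only make sense for the standardized content types noted below (images, slides, video); no equivalent exists for 3D or game-like contributions.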
Based on the analysis of this limited sample, the bare minimum for community participation guidelines would include clearly outlining
• the scope, purpose, audience, and curatorial policy of the digital exhibits service;
• the technical specifications, such as the resolution, aspect ratio, and file formats supported by the display;
• the design guidelines, such as colors, templates, and other visual elements;
• the contact information of the digital exhibits coordinator; and
• the online or email submission form.
It should be noted, however, that such specifications are primarily useful when a CMS exists and the content solicited from users is at least somewhat standardized. For example, images, slides, or webpages may be easier for community partners to contribute than video games or 3D interactive content. No examples of guidelines for the latter were observed in the study.

Content Scheduling

Whereas the middleware section of this study examined the technical approaches to content management and automation, this section explores the frequency of exhibit rotation from a service design perspective. As can be observed in figure 5, no consistent or dominant model for exhibit scheduling was identified in the study population. Generally, approaches to scheduling digital exhibits reflect organizational contexts. For example, museums typically design an exhibit and display it on a permanent basis, while academic institutions change displays of student work or scholarly communication once per semester. The following scheduling models emerged, in descending order of frequency in the study population.

Figure 5. Content scheduling distribution in the study population.

1. Unstructured (29%): no formal approach, policy, or expectation is identified by the organization regarding displaying exhibits.
This model is largely related to the early stage of service development in this domain, a lack of staff capacity to support the service, and/or responsiveness to user needs. One study participant, for example, referred to this loose approach by noting that “no formalized approach and no official policy exists.” Institutions may have frameworks for what types of content are acceptable but no specific requirements on content subjects. Institutions adopting a lab space model (see figure 6) for digital displays largely belong to this category. In other words, content is created on the fly through workshops, data analysis, and other situations as needed by users. In this case, no formal scheduling is required apart from space reservations.
2. Seasonal (29%), which can be defined as a period from three to six months and includes semester-based scheduling in academic institutions. Many organizations operate on a quarterly basis, so it seems logical that content refresh cycles reflect the broader workflow of the organization.
3. Permanent (21%): in the case of museums, permanent exhibits may mean displaying content indefinitely or until the next hardware refresh, which might reconfigure the entire interactive display service. No specific date ranges were cited for this model.
4. Monthly (10%): this pattern was observed among academic libraries, with production of “monthly playlists” featuring curated book lists or other monthly specials.
5. Weekly (7%): North Carolina State University and Deakin University Libraries aim to have fresh content up once per week; they achieve this in part by formalizing the roles needed to support their digital display and visualization services.
6.
Daily (4%): only Griffith University ensures that new content is available every day on its #SeeMore display; it does this largely by relying on standardized external and internal inputs, such as weather updates and content from the university marketing department.

Staffing and Skills

One key element of the digital exhibits research project was investigating the staffing models required to support a service of this nature. Not surprisingly, the theme of resource needs for digital exhibits emerged in most interviews conducted. Several participants noted that one “can’t just throw up content and leave it,” while others advised to “have expertise on staff before tech is installed.” The data gathered show that the average full-time equivalent (FTE) needed to support digital display services in the organizations interviewed was 2.97, or around three full-time staff members. In addition, 74% of the organizations studied had maintenance or support contracts with various vendors, including AV integrators, CMS specialists, creative studios that produced original content, or hardware suppliers. Hardware and AV integrators typically provided a 12-month contract for technical troubleshooting, while creative studios offered a three-month support contract for the digital exhibits they designed. The average time to create an original, interactive exhibit was between 9 and 12 months, according to data provided by the creative agencies, The Cube teams, and learning organizations that have in-house teams creating exhibits regularly. This length of time varies with the complexity of the interaction designed, the depth of the exhibit “narrative,” and the modes of input supported by the exhibit application. Additionally, it was important to understand the curatorial labor behind digital exhibits; the author did not necessarily speak with the curator of exhibits, and this work may be carried out by multiple individuals within organizations with digital displays or creative studios.
In 20 (57%) of the cases, the person interviewed also curated some or all of the content for the digital display at their respective institution. In five (14%) of the cases, the individual interviewed was not a curator for any of the content because there was no need for curation in the first place. For example, displays in these cases were used for analysis or teaching and therefore did not require prepared content. In the rest of the cases (10, or 29%), a creative agency vendor, another member of the team, or a community partner was responsible for curating exhibit content. This finding suggests that, while a significant number of organizations outsource the design and curation of exhibits, the majority retain control over this process. Therefore, dedicating resources to the curation, organization, and management of exhibit content is deemed significant by the organizations represented in the study. In terms of the capacity to carry out digital display services, the skills identified by study participants as important to supporting work of this nature include the following:
1. technical skills (such as the ability to troubleshoot), general interest in technology, and flexibility and willingness to learn new things (74%)
2. design, visual, and creative sensibility (40%), as this type of work is primarily a visual experience
3. software-development or programming-language knowledge (31%)
4. communication, collaboration, and relationship-building (25%)
5. project management (20%)
6. audiovisual and media skills (14%), as digital exhibits are “as much an AV experience as an IT experience,” according to one study participant
7. curatorial, organizational, and content-management skills (11%)
The most frequent dedicated roles mentioned in the interviews are shown in table 3.

Table 3.
Types of roles significant to digital exhibits work

Position                                    Responses    %
Developer/programmer                           11       31
Project manager                                 8       23
Graphic designer                                6       17
User experience or user interface designer      4       11
IT systems administrator                        4       11
AV or media specialist                          4       11

The relatively low percentages in this table suggest that the skills mentioned above are either distributed among various team members or combined in a single role, as may be the case in small institutions or in those without formalized services and dedicated roles. Nevertheless, the presence of specific job titles indicates an understanding of the various skill sets needed to run a service that uses digital displays.

Challenges and Successes

Many challenges related to initiating and supporting a service that uses digital displays for learning were identified by study participants. Clearly, multiple challenges could be associated with digital-display services within a single organization. However, many successes and lessons learned were also shared by interviewees, often overlapping with identified challenges. This pattern suggests that some organizations can pursue strategies that address challenges faced by their library or museum colleagues while perhaps lacking resources or capacity in other areas related to this type of service. For example, some organizations observed a lack of user engagement because of the limited interactivity of the technology solution they used. Others had successful user engagement largely by investing in technology solutions that provide a range of modes of interaction. It is important to learn from both to anticipate possible pain points and to capitalize on successes that lead to industry recognition and engagement from library customers. Table 4 summarizes the range of challenges identified.

Table 4.
Challenges related to digital display services

Challenge Identified    Responses    %
Technical                  14       41
Content                    11       33
Costs                      11       33
User expectations          11       33
Workflow                   10       29
Service design              9       26
Time                        8       24
Organizational culture      8       24
User engagement             7       20

As reflected in table 4, several key challenges were discussed:
1. Technical, such as troubleshooting the technology, keeping up with new technologies or upgrades, and finding software solutions appropriate for the hardware selected.
2. Content, such as coming up with original content or curating existing sources. In the words of one participant, “quality and refresh of content is key—it has to be meaningful, interesting, and new.” This clearly presents a resource requirement.
3. Costs, such as the financial commitment to the service, the unseen costs of putting exhibits together, software licensing, and hardware upgrades.
4. User expectations, such as keeping the service at its full potential and using the maximum functionality of the hardware and software solutions. According to study participants, users “may not want what they think or they say they want,” and to some extent, “such technologies are almost an expectation now, and not as exciting for users.”
5. Workflow or project-management strategies, specifically related to emergent multimedia experiences that require new cycles of development and testing.
6. Time to plan, source, create, troubleshoot, launch, and improve exhibits.
7. Service design, such as thinking holistically about the functions of the technology within the larger organizational structure. As one study participant stated, organizations “cannot disregard the reality of the service being tied to a physical space” in that these types of technologies are both a virtual and a physical customer experience.
8.
Organizational culture and policy, in terms of adapting project-based approaches to planning and resourcing services, getting institutional support, and educating all staff about the purpose, function, and benefits of the service.
9. User engagement, particularly keeping users interested in the exhibits and continually finding new and exciting content. Various participants found that “linger time is between 30 seconds to few minutes” and that content being displayed needs to be “something interesting, unique, and succinct, but not a destination in itself.”

Despite the clear challenges of delivering digital exhibits services, organizations that participated in this study identified keys to success (see table 5).

Table 5. Successes and lessons learned in using digital displays

Successful Approach or Lesson Identified    Responses    %
User engagement and interactivity              16       47
Service design                                 14       41
“Wow” factor                                   12       35
Organizational leadership                      12       35
Technology solution                            10       29
Flexibility                                    10       29
Communication and collaboration                10       29
Project management                              9       26
Team and skill sets                             9       26

As reflected in table 5, several approaches were discussed:
• User engagement and interactivity, particularly for those institutions that invested in highly interactive and immersive experiences; the rewards are seen in the interest and enthusiasm of their user groups.
• Service design: organizations that carefully planned the service found that the technology successfully served the needs of their user communities.
• Promotion and the “wow factor” that brought attention to the organization and the service. It is not surprising that digital displays are central points on tours for dignitaries, political figures, and external guests.
Further, many commented that they “did not imagine a library could be involved in such an innovative experiment,” and others added that their digital displays have “created new conversations that did not exist before.”
• Leadership and vision at the organizational level, which secures support and resources as well as defines the scope of the service to ensure its sustainability and success: “Money is not necessarily the only barrier to doing this service, but risk taking, culture.”
• Technology solution, where “everything works” and both the organization and the users of the service are happy with the functionality, features, and performance of the chosen solution.
• Flexibility and willingness to learn new things, including being open to agile project-management methods, taking risks, and continually learning new tools, technologies, and processes as the service matures.
• Communication and collaboration, both internally among stakeholders and externally by building community partnerships, new audiences, and user participation in content creation. For example, one study participant noted that the technology “has contributed to giving the museum a new audience of primarily young people and families—a key objective held in 2010 at the commencement of the gallery refurbishments.”
• Workflow and project management for those embracing the new approaches required to bring multiple skill sets together to create engaging new exhibits. As one participant put it, “These types of approaches require testing, improvement, a new workflow and lifecycle for the projects.”
• Having the right team with appropriate skills to support the service, though this theme was rated as less significant than designing services effectively and securing institutional support for the technology service.
In other words, study participants noted that having in-house programming or design skills is not enough without a proper definition of success for digital exhibits services.

Perceptions

The institutional and user reception of digital displays as a service to pursue in learning organizations was overwhelmingly positive, with 87% of organizations noting positive feedback. For example, one study participant noted the positive attention received from the wider community for the digital display, stating, “it is our flagship and people are in general impressed by both the potential and some of the existing content.” Some participants went as far as to say that the reception among users has been “through the roof” and that they have “never had a negative feedback comment” about their display. This finding indicates a high degree of satisfaction with such technologies among organizations that pursued a digital display. Table 6 further explores the range of perceptions observed in the study.

Table 6. Perception of digital display services

Perception                        Responses    %
Positive                             20       87
Hesitation or uncertainty             7       30
Concerns about purpose                4       17
Concerns about user engagement        4       17
Concerns about costs                  3       13
Negative                              3       13

A minority (13%) noted some negative perceptions, largely related to concerns about the costs or functionality of the technology; 30% observed uncertainty and hesitation on the part of staff and users in terms of engagement, as well as questioning of the display’s purpose in the organization. For example, one study participant summarized this mixed sentiment by saying, “The perception is that it’s really neat and worthwhile for exploring new ways of teaching, but that the same features and functions could be achieved with less (which we think is a good thing!).” It is helpful to note this trend in perception, as any new service will likely bring a mixture of excitement, hesitation, and occasional opposition.
Interestingly, these reactions originated both from the staff of the organizations interviewed and from their communities of users.

DISCUSSION

The findings from this study indicate that the functions of digital displays are highly dependent on the organizational context in which the displays exist. This context, in turn, defines the nature of the services delivered through the digital display. For example, figure 6 can be useful in classifying the various ways digital displays appear in the study population, from research- and teaching-oriented lab spaces to public spaces with passive messaging or active, immersive, game-like digital experiences.

Figure 6. Types of digital displays in the study population.

As such, visualization walls might belong in the “lab spaces” category, which typically appears in academic libraries or research units and does not require content planning and scheduling. What we might call “digital interactive exhibits” tend to appear in museums and galleries with a primarily public audience and may have a permanent, seasonal, or monthly rotation schedule. However, despite the range of approaches taken to providing content and using these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing the costs of interactive projects. Despite these common concerns, digital-exhibits services were perceived as overwhelmingly satisfactory in all types of organizations included in this study because they brought new audiences to the organization and were often seen as “showpieces” in the broader community. The data gathered in the environmental scan demonstrate that there is currently little consistency among digital displays in learning environments.
This lack of consistency is seen in study participants’ content-development methods, programming, content management, technology solutions, and even the naming of the display (and, by extension, the display service). For example, this study revealed that no evident “open platform” for managing content at the application or the middleware level currently exists. A small number of software tools are used by organizations to support digital displays, but their use is in no way standardized, as compared to nearly every other area of library services. There is some indication that digital-display services may become more standardized in the coming years, with more tools, solutions, vendors, and communities of practice becoming available. For example, many signage CMSs are currently on the market, and the number of companies offering game-like immersive experiences is growing, suggesting an extension of these services to libraries in the coming years. Only a few software tools exist for creating exhibits, such as IntuiFace and TouchDesigner, and no free, open-source versions of exhibit software are currently available. As well, the growing number of digital-exhibits and interactive-media companies currently focuses on turnkey solutions rather than software-as-a-service or platform solutions. In contrast, some consistency exists in the staffing needs and skills required to support digital-exhibits services. A majority of the organizations interviewed agreed that design, software-development, systems-administration, and project-management skills are needed to ensure that digital-exhibits services run sustainably in a learning organization. In addition, the lack of public library representation in this study makes it challenging to draw parallels to the library context.
Adapting museum practices is also not necessarily reliable, as museums rarely have a mandate to engage communities and partner on content creation, as libraries do. For example, only the El Paso (Texas) Museum of History engages the local community to source and organize content. These findings suggest that digital displays are a growing domain and that more solutions are likely to emerge in the coming years. The Cube, compared to the rest of the study population, is a unique service model because it successfully brings together most of the elements examined in the environmental scan. For example, to ensure continual engagement with the digital display, The Cube schedules exhibits on a regular basis and employs user interface designers, systems administrators, software engineers, and project managers. It also extends the content through community engagement, public tours, and STEM programming. It has created an in-house middleware solution to simplify exhibit delivery and has chosen Unity3D as its platform for exhibit development.

LIMITATIONS

Only organizations from English-speaking countries were interviewed as part of the environmental scan. It is therefore unclear whether access to organizations from non–English-speaking countries would have produced new themes and significantly different results. In addition, as with all environmental scans, the data are limited by the degree of understanding, knowledge, and willingness to share information of the individuals being interviewed. In particular, the individuals with whom the author spoke may or may not have been the technology or service leads for the digital display at their respective institutions. Thus, the study participants had a range of understanding of the hardware specifications, functionality, and service-design components associated with digital displays.
For example, having access to technology leads would likely have provided more nuanced responses about the middleware solutions and the underlying technical infrastructure required to support this service. A small number of vendors were also interviewed as part of the environmental scan, even though vendors did not necessarily have digital displays or service models parallel to those of libraries or museums. They are included in appendix B. Nevertheless, gathering data from this group was deemed relevant to the study, as creative agencies have formalized staffing models and clearly identified the skill sets necessary to support services of this nature. In addition, this group possesses knowledge of best practices, workflows, and project-management processes related to exhibit development. Finally, this environmental scan did not capture any interaction with direct users of digital displays, whose experiences and perceptions of these technologies may or may not support the findings gathered from the organizations interviewed. These limitations were addressed by increasing the sample size of the study within the time and resource constraints of the research project.

CONCLUSION

The findings of this study show that the functions of digital-display technologies and their related services are highly dependent on the organizational context in which they exist. However, despite the range of approaches taken to providing content and using these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing the costs of interactive projects. Despite these common concerns, digital displays were perceived overwhelmingly positively in all types of organizations interviewed in this study, as they brought new audiences to the organization and were often seen as “showpieces” in the broader community.
The successes and lessons learned from the study population are meant to provide a broader perspective on this maturing domain as well as to help inform planning processes for future digital exhibits in learning organizations.

APPENDIX A. ENVIRONMENTAL SCAN QUESTIONS

Digital Exhibits Environmental Scan Interview Questions—Museums, Libraries, Public Organizations
1. What are the technical specifications of the digital interactive technology at your institution?
2. Who are the primary users of this technology (those interacting with the platform)? Is there anyone you thought would use it and isn’t?
3. What are the primary uses for the technology (events, presentations, analysis, workshops)?
4. What types of content are supported by the technology (video, images, audio, maps, text, games, 3D, all of the above)?
5. Where is content created and how is this content managed?
6. What is the schedule for the content and how is it prioritized?
7. Can you estimate the FTE (full-time equivalent) of staff members involved in supporting this technology/service, both directly and indirectly? What does indirect support for this technology entail?
8. In your experience, what kinds of skills are necessary in order to support this service?
9. Have partnerships with other organizations producing content to be exhibited been established or explored?
10. What challenges have you encountered in providing this service?
11. What have been some keys to the successes in supporting this service?
12. What has been the biggest success of this service and what has been the biggest disappointment?
13. What is the perception of this technology in the institution more broadly?
14. Are there any other institutions you suggest we contact to learn more about similar technologies?

Digital Exhibits Environmental Scan Interview Questions: Vendors
1.
What is the relationship between the creative studio and hardware/fabrication? Do you do everything, or do you work with AV integrators to put together touch interactives?
2. Who have been the primary users of the interactive exhibits and projects you have completed?
3. Who writes the use cases when creating a digital interactive exhibit?
4. What types of content are supported by the technology (video, images, audio, maps, text, games, 3D, all of the above)? Do you see a rise in interest in 3D and game-like environments, and do you have internal expertise to support it?
5. Where is content created for the exhibits and how is this content managed? Who curates?
6. What timespan or lifecycle do you design for?
7. How big is your team? How long do projects typically take to create?
8. What types of expertise do you have in house? What might a project team look like?
9. To what extent is there a goal of sharing knowledge back with the company from clients or users?
10. What challenges have you encountered in providing this service?
11. What have been some keys to the successes in supporting this service?
APPENDIX B: STUDY POPULATION IN ENVIRONMENTAL SCAN

Organization | Location | Date Interviewed
All Saints Anglican School | Merrimac, Australia | July 25, 2016
Anode | Nashville, TN | July 22, 2016
Belle & Wissell | Seattle, WA | July 26, 2016
Bradman Museum | Bowral, Australia | July 10, 2016
Brown University Library | Providence, RI | June 3, 2016
University of Calgary Library and Cultural Resources | Calgary, AB | June 2, 2016
Deakin University Library | Geelong, Australia | June 14, 2016
University of Colorado Denver Library | Denver, CO | June 24, 2016
Duke University Library | Durham, NC | August 17, 2016
El Paso Museum of History | El Paso, TX | June 24, 2016
Georgia State University Library | Atlanta, GA | June 10, 2016
Gibson Group | Wellington, New Zealand | July 16, 2016
Henrico County Public Library | Henrico, VA | August 9, 2016
Ideum | Corrales, NM | July 26, 2016
Indiana University Bloomington Library | Bloomington, IN | May 31, 2016
Interactive Mechanics | Philadelphia, PA | August 2, 2016
Johns Hopkins University Library | Baltimore, MD | June 20, 2016
Nashville Public Library | Nashville, TN | July 22, 2016
North Carolina State University Library | Raleigh, NC | June 8, 2016
University of North Carolina at Chapel Hill Library | Chapel Hill, NC | June 2, 2016
University of Nebraska Omaha | Omaha, NE | June 16, 2016
Omaha Do Space | Omaha, NE | July 11, 2016
University of Oregon Alumni Center | Eugene, OR | June 7, 2016
Philadelphia Museum of Art | Philadelphia, PA | August 10, 2016
Queensland University of Technology | Brisbane, Australia | June 30, July 29, and August 16, 2016
Société des Arts Technologiques | Montreal, QC | August 8, 2016
Second Story | Portland, OR | July 28, 2016
St. Louis University | St. Louis, MO | July 4, 2016
Stanford University Library | Stanford, CA | July 22, 2016
University of Illinois at Chicago | Chicago, IL | June 22, 2016
University of Mary Washington | Fredericksburg, VA | July 7, 2016
Visibull | Waterloo, ON | August 12, 2016
University of Waterloo Stratford Campus | Stratford, ON | June 22, 2016
Yale University Center for Science and Social Science Information | New Haven, CT | July 13, 2016

APPENDIX C: DIGITAL CONTENT PUBLISHING GUIDELINES

Organization Name | Guidelines Website
Deakin University Library | http://www.deakin.edu.au/library/projects/sparking-true-imagination
Duke University | https://wiki.duke.edu/display/LMW/LMW+Home
Griffith University | https://intranet.secure.griffith.edu.au/work/digital-signage/seemore
North Carolina State University Library | http://www.lib.ncsu.edu/videowalls
University of Colorado Denver | http://library.auraria.edu/discoverywall
University of Calgary Library and Cultural Resources | http://lcr.ucalgary.ca/media-walls
University of Waterloo Stratford Campus | https://uwaterloo.ca/stratford-campus/research/christie-microtiles-wall

REFERENCES

1 Flora Salim and Usman Haque, “Urban Computing in the Wild: A Survey on Large Scale Participation and Citizen Engagement with Ubiquitous Computing, Cyber Physical Systems, and Internet of Things,” International
Journal of Human-Computer Studies 81 (September 2015): 31–48, https://doi.org/10.1016/j.ijhcs.2015.03.003.
2 Peter Peltonen et al., “It’s Mine, Don't Touch! Interactions at a Large Multi-touch Display in a City Center,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, April 5–10, 2008, 1285–94, https://doi.org/10.1145/1357054.1357255.
3 Shawna Sadler, Mike Nutt, and Renee Reaume, “Managing Public Video Walls in Academic Library” (presentation, CNI Spring 2015 Meeting, Seattle, Washington, April 13–14, 2015), http://dro.deakin.edu.au/eserv/DU:30073322/sadler-managing-2015.pdf.
4 Peltonen et al., “It’s Mine, Don't Touch!”
5 John Brosz, E. Patrick Rashleigh, and Josh Boyer, “Experiences with High Resolution Display Walls in Academic Libraries” (presentation, CNI Fall 2015 Meeting, Washington, DC, December 13–14, 2015), https://www.cni.org/wp-content/uploads/2015/12/cni_experiences_brosz.pdf; Bryan Sinclair, Jill Sexton, and Joseph Hurley, “Visualization on the Big Screen: Hands-On Immersive Environments Designed for Student and Faculty Collaboration” (presentation, CNI Spring 2015 Meeting, Seattle, Washington, April 13–14, 2015), https://scholarworks.gsu.edu/univ_lib_facpres/29/.
6 Niels Wouters et al., “Uncovering the Honeypot Effect: How Audiences Engage with Public Interactive Systems,” DIS ’16: Proceedings of the 2016 ACM Conference on Designing Interactive Systems, Brisbane, Australia, June 4–8, 2016, 5–16, https://doi.org/10.1145/2901790.2901796.
7 Gonzalo Parra, Joris Klerkx, and Erik Duval, “Understanding Engagement with Interactive Public Displays: An Awareness Campaign in the Wild,” Proceedings of the International Symposium on Pervasive Displays, Copenhagen, Denmark, June 3–4, 2014, 180–85, https://doi.org/10.1145/2611009.2611020; Ekaterina Kurdyukova, Mohammad Obaid, and Elisabeth Andre, “Direct, Bodily or Mobile Interaction?,” Proceedings of the 11th International Conference on Mobile and Ubiquitous Multimedia, Ulm, Germany, December 4–6, 2012, https://doi.org/10.1145/2406367.2406421; Tongyan Ning et al., “No Need to Stop: Menu Techniques for Passing by Public Displays,” Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, Vancouver, British Columbia, https://www.gillesbailly.fr/publis/BAILLY_CHI11.pdf.
8 Jung Soo Lee et al., “A Study on Digital Signage Interaction Using Mobile Device,” International Journal of Information and Electronics Engineering 5, no. 5 (2015): 394–97, https://doi.org/10.7763/IJIEE.2015.V5.566.
9 Parra et al., “Understanding Engagement,” 181.
10 Parra et al., “Understanding Engagement,” 181; Robert Walter, Gilles Bailly, and Jörg Müller, “StrikeAPose: Revealing Mid-air Gestures on Public Displays,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, April 27–May 2, 2013, 841–50, https://doi.org/10.1145/2470654.2470774.
11 Philipp Panhey et al., “What People Really Remember: Understanding Cognitive Effects When Interacting with Large Displays,” Proceedings of the 2015 International Conference on Interactive Tabletops & Surfaces, Madeira, Portugal, November 15–18, 2015, 103–6, https://doi.org/10.1145/2817721.2817732.
12 Christopher Ackad et al., “An In-the-Wild Study of Learning Mid-air Gestures to Browse Hierarchical Information at a Large Interactive Public Display,” Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, September 7–11, 2015, 1227–38, https://doi.org/10.1145/2750858.2807532.
13 Parra et al., “Understanding Engagement,” 181; Kurdyukova, Obaid, and Andre, “Direct, Bodily or Mobile Interaction?,” n.p.
14 Jouni Vepsäläinen et al., “Web-Based Public-Screen Gaming: Insights from Deployments,” IEEE Pervasive Computing 15, no. 3 (2016): 40–46, https://ieeexplore.ieee.org/document/7508836/.
15 Uta Hinrichs, Holly Schmidt, and Sheelagh Carpendale, “EMDialog: Bringing Information Visualization into the Museum,” IEEE Transactions on Visualization and Computer Graphics 14, no. 6 (November 2008): 1181–88, https://doi.org/10.1109/TVCG.2008.127.
16 Hinrichs, Schmidt, and Carpendale, “EMDialog.”
17 Sarah Clinch et al., “Reflections on the Long-term Use of an Experimental Digital Signage System,” Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, September 17–21, 2011, 133–42, https://doi.org/10.1145/2030112.2030132.
18 Elaine M. Huang, Anna Koster, and Jan Borchers, “Overcoming Assumptions and Uncovering Practices: When Does the Public Really Look at Public Displays?,” Proceedings of the 6th International Conference on Pervasive Computing, Sydney, Australia, May 19–22, 2008, 228–43, https://doi.org/10.1007/978-3-540-79576-6_14; Jörg Müller et al., “Looking Glass: A Field Study on Noticing Interactivity of a Shop Window,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Austin, Texas, May 5–10, 2012, 297–306, https://doi.org/10.1145/2207676.2207718.
19 Salim and Haque, “Urban Computing in the Wild,” 35.
20 Mettina Veenstra et al., “Should Public Displays Be Interactive? Evaluating the Impact of Interactivity on Audience Engagement,” Proceedings of the 4th International Symposium on Pervasive Displays, Saarbruecken, Germany, June 10–12, 2015, 15–21, https://doi.org/10.1145/2757710.2757727.
21 Clinch et al., “Reflections.”
22 Robert Ravnik and Franc Solina, “Audience Measurement of Digital Signage: Qualitative Study in Real-World Environment Using Computer Vision,” Interacting with Computers 25, no. 3 (2013), https://doi.org/10.1093/iwc/iws023.
23 Neal Buerger, “Types of Public Interactive Display Technologies and How to Motivate Users to Interact,” in Media Informatics Advanced Seminar on Ubiquitous Computing, ed. Doris Hausen et al. (Munich: University of Munich, Department of Computer Science, Media Informatics Group, 2011), https://pdfs.semanticscholar.org/533a/4ef7780403e8072346d574cf288e89fc442d.pdf.
24 C. G. Screven, “Information Design in Informal Settings: Museums and Other Public Spaces,” in Information Design, ed. Robert E. Jacobson (Cambridge, MA: MIT Press, 2000), 131–92.
25 Parra et al., “Understanding Engagement,” 181.
26 Uta Hinrichs and Sheelagh Carpendale, “Gestures in the Wild: Studying Multi-touch Gesture Sequences on Interactive Tabletop Exhibits,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vancouver, British Columbia, May 7–12, 2011, 3023–32, https://doi.org/10.1145/1978942.1979391.
27 Harry Brignull and Yvonne Rogers, “Enticing People to Interact with Large Public Displays in Public Spaces,” in Proceedings of INTERACT ’03: International Conference on Human-Computer Interaction, Zurich, Switzerland, September 1–5, 2003, ed. Matthias Rauterberg, Marino Menozzi, and Janet Wesson (Tokyo: IOS Press, 2003), 17–24, http://www.idemployee.id.tue.nl/g.w.m.rauterberg/conferences/interact2003/INTERACT2003-p17.pdf.
28 Peltonen et al., “It’s Mine, Don't Touch!”
29 Peltonen et al., “It’s Mine, Don't Touch!”
30 Anne Horn, Bernadette Lingham, and Sue Owen, “Library Learning Spaces in the Digital Age,” Proceedings of the 35th Annual International Association of Scientific and Technological University Libraries Conference, Espoo, Finland, June 2–5, 2014, http://docs.lib.purdue.edu/iatul/2014/libraryspace/2.