URLs: Uniform Resource Locators or Unreliable Resource Locators

Carol Anne Germain

As the practice of citing World Wide Web sites grows, the question arises whether it has scholarly limitations, because uniform resource locators (URLs) often become inaccessible. This study examines the accessibility of sixty-four URLs cited in thirty-one academic journal articles. The longitudinal results show a steady decline in the availability of the cited URLs.

Ten years ago, most people had no idea that the Internet existed. Today, it is used daily by millions of people who access it for a variety of reasons. Some use it to connect with friends and family; others use it for entertainment purposes (jokes, sports, and freebies); and still others use it for research. Students approaching a library reference desk often insist that the Internet be used to locate information for papers, projects, and other academic assignments. Many journal articles, including refereed articles, contain citations to Internet sources. Despite the popularity of Internet citations, we still may question the integrity of this practice.

How often have we tried to link to uniform or universal resource locators (URLs) only to find a “404 NOT FOUND” or other messages denying access? These warnings let us know that the information we came to access is no longer accessible at this site. The information may have been moved to another site, equipment may be down, or the information may have been removed completely. This is frustrating because cited references need to be accessible and persistent. Citations provide the reader with an outline of the works an author has consulted to develop an article, conference paper, monograph, or other scholarly study.
After a review of the importance of permanence as a feature of academic citation, this paper presents evidence of the impermanence of actual URL citations.

The Role of the Citation

What is the purpose of a citation? Why is this erudite mechanism so important? The Oxford English Dictionary defines the verb to cite as “to quote (a passage, book, or author) generally with implication of adducing as an authority.”1 Authority furnishes credibility to the written piece. Citation allows the reader to reference other works the author has cited. The reader then has the ability to verify a quotation, check the semantic connection, or confirm whether the author has included all of the materials and statistics of a study. In a sense, “citation keeps you honest.”2 It is essential that the academic community be able to rely on and utilize the studies, arguments, and findings of other scholars.

Carol Anne Germain is the Networked Resources Education Librarian at the State University of New York at Albany (SUNY); e-mail: cg219@csc.albany.edu. College & Research Libraries, July 2000.

Citation also provides the ability to acknowledge the works of others that support a piece of research. When using the materials of others, citation offers the opportunity to recognize the cited author. “A paper that conforms to the norms of scholarly perfection would explicitly cite every past publication to which it owes an intellectual debt.”3

One of the most important functions of the citation is that it links the written work into a much larger community. When the novice physicist uses Einstein’s theories to uphold an argument, a connection is established between significant works of the past and works of the present. Other physicists will evaluate this work and reach conclusions as to whether it is an addition to the field.
    Every time a scholar presents a review of the literature in her area of inquiry, or writes a bibliographic essay, or incorporates another writer’s words or ideas to advance her own thesis, she maps the field of her discipline. She draws the boundaries, circumscribes the territory of her field of discourse, and determines who else is within and who is without.4

In other words, “she” makes herself a part of a much larger community. This community promotes intellectual growth that may, in turn, stimulate the development of new medicines and cures, novel writing techniques, or breakthroughs in technology. The dialogue that citation encourages fosters a learned fellowship.

Henry Small, when describing why an author or scientist cites another text, referred to the citation as a “symbol.” These “symbols of concepts or methods” function as connections to earlier works that an author-researcher has embedded as a reference in his or her writings. “This leads to the citing of works which embody ideas the author is discussing. The cited documents become, then, in a general sense, ‘symbols’ for these ideas.”5

Blaise Cronin summed up the need for a theory of citing very eloquently:

    Metaphorically speaking, citations are frozen footprints on the landscape of scholarly achievement; footprints that bear witness to the passage of ideas. From footprints it is possible to deduce direction; from the configuration and depth of the imprints it should be possible to construct a picture of those who have passed by, whilst the distribution and variety furnish clues as to whether the advance was orderly and purposive.6

The Persistence of Citations

An important feature of scholarly links is that they are available indefinitely. It is imperative that cited materials be accessible and not ephemeral.
Phyllis Franklin, executive director of the Modern Language Association, stated that “the M.L.A. has concluded that scholarship depends on getting back to a source.”7 The researcher depends on cited work as a collaboration of ideas. If the locations of ideas that substantiate the author’s work no longer exist, the foundation of their work is in question.

To assume that all cited works are easily obtainable is naive.8 Fugitive material and grey literature are found in written works, the former being pamphlets, programs, and other literature published (not always officially) in small quantities and often produced for one-time use.9 Materials such as these are almost impossible to retrieve and thus are generally not cited. Grey literature is literature that cannot normally be purchased through booksellers. Examples of these types of materials include conference proceedings, trade brochures, preprints, technical reports, dissertations, and government agency publications. It is often difficult to acquire these materials and frequently takes some skill to do so. The National Technical Information Service (NTIS) provides access to technology reports, while Bell and Howell, formerly University Microfilms International (UMI), places dissertations on microform, and numerous trade associations archive their professions’ literature. Although it is difficult to work through resources such as NTIS and Bell and Howell, one is assured that their materials are retrievable. One of the reasons for this assurance is that various institutions and organizations, such as government agencies, union affiliations, and academic institutions, have responsibility for maintaining and preserving the materials.

With the emergence of the Internet and Internet publishing, individuals and institutions in increasing numbers are authoring and posting papers and studies on this electronic medium.
One of the complications with this type of publication is that there is no guarantee that these works will be perpetually available. “Estimates put the average lifetime for a URL (the Web site location) at forty-four days.”10 A longitudinal study undertaken by Wallace Koehler reviewed the persistence of 361 randomly chosen Web sites and Web pages over one year. Results of this study found that 110 (31%) of the Web sites and Web pages failed to respond at the final test.11 This electronic environment, though very exciting and stimulating, also is quite volatile.

The academic world should be concerned about the citation of documents that are located on the Internet. When users try to retrieve electronic sources listed in the citing publication, they often do not find the references but, instead, are faced with an “error” message. It is unfortunate, but documents found within the electronic setting have the characteristic of lacking permanency.12 “URLs change at the whim of hardware reconfiguration, file system reorganization, or changes in organizational structure, leaving users in 404 Limbo.”13 “The Internet’s holdings change every minute of the day.” Students and researchers find that materials on the information superhighway can disappear “with the touch of a Webmaster’s delete key.”14 In its style manual, the American Psychological Association warns those who use online information:

    The researcher has immediate access to a wealth of information but must consider the reader’s access to that material: Will the information be available to the reader even if the reader follows a given retrieval path, or will the material soon be archived to tape and difficult to obtain? Is the information widely accessible or accessible only on a campus’s local network?
This publication recommends that if the same data is available in both print and electronic formats then the writer should use the “preferred print version.”15

Methodology

The following study was undertaken to investigate the reliability of URLs in academic citation. Thirty-one randomly chosen academic journal articles, containing sixty-four citations with URLs, were reviewed. The academic journals used were from a variety of disciplines. Thirteen citations were from information and library science, ten from the hard sciences, seventeen from computer science, eleven from the humanities, and thirteen from the social sciences. The printed journals were published between 1995 and 1997.

To verify the persistence of the URL citations, each address was accessed to see if the site was currently active. Using a Netscape browser, each URL was entered into the Netscape “open” window. Over a three-year period (1997–1999), this procedure was conducted once a month for three consecutive months (February, March, and April). This was to determine if each cited site still existed. Three different access days were used each year to guard against temporary interruptions. Reasons for denied access might include that the URL’s host computer was down, that a Web site was being worked on and unreachable, or that too much traffic on the Internet caused a time-out. Each of the nine testings was conducted between 8 a.m. and 9 p.m. The content of the Web site, update information, and style format were not reviewed.

In certain circumstances, some effort was made to access a site if a spelling error or misprint seemed to be within the URL. This included omitting periods where the publisher added them as style; omitting hyphens at the end of a line, within a URL; and adding a top-level domain, such as edu, to the domain name where it seemed to be absent. Only direct URL searching was done; no attempt was made to use Internet search engines to find the cited materials.

In this paper, persistence of a URL citation is understood as the ability to access a cited URL containing the Web site with the identical title of the cited work. If an index or search tool was retrieved that linked to the cited work, the URL citation also was considered persistent. Citations containing URLs that accessed a host site, but not the cited file, were not regarded as persistent. URL citations that had moved to a new URL and contained the same title/author were appraised as persistent.

When an Internet site cannot be accessed, a variety of error messages may appear. The error message “404 Not Found” appears when Netscape cannot locate the specified Web site. This is due to either relocation or removal of the site.16 “File Not Found” is similar in nature and means that the user has reached the host computer, but the host cannot find the requested Web site file. The “Not Found” error message gives the user a variety of reasons for not being able to connect to the desired document. “Unable to Locate Server,” “Socket Error,” and “No Response” are error messages resulting from not being able to connect to the remote server. This may occur when the remote server is either too busy or no longer in existence. Generally, remote computers only send error messages. Unless instructed, they give no forwarding address or other indication of the material’s location.

TABLE 1
Availability of Cited URLs

Year    Not Accessible    Accessible    % Unavailable
1997    17                47            26.5
1998    24                40            37.5
1999    31                33            48.4

Results

It is assumed that all of the URLs found in the cited works were active Internet sites when they were cited originally.
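The access check described under Methodology was carried out by hand in a Netscape browser. As a rough modern analogue, and not the procedure the study used, the following Python sketch requests a URL and sorts the outcome into categories resembling those tallied in the results. The function names, the mapping from HTTP status codes to categories, and the rule that one successful check in a test year counts as accessible are all assumptions made for illustration; the paper does not state how the three monthly checks were combined.

```python
import urllib.error
import urllib.request

# Outcome labels loosely modeled on the categories in Table 2 (an assumption).
ACCESSIBLE = "Accessible"
NOT_FOUND = "Not Found"
SERVER_ERROR = "Server Error"
RELOCATED = "Relocated/Unavailable"


def classify(status=None, reached_host=True):
    """Map an HTTP status code, or a failed connection, to an outcome label."""
    if not reached_host or status is None:
        # Comparable to "Unable to Locate Server" or "No Response".
        return SERVER_ERROR
    if 200 <= status < 300:
        return ACCESSIBLE
    if status in (301, 302, 303, 307, 308, 410):
        # urlopen normally follows redirects itself, so a 3xx code is only
        # expected here when a redirect could not be completed.
        return RELOCATED
    if status == 404:
        return NOT_FOUND
    return SERVER_ERROR


def check_url(url, timeout=30):
    """Request one cited URL and return an outcome label (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as err:
        # The host answered, but with an error status such as 404.
        return classify(err.code)
    except OSError:
        # Covers URLError, refused connections, DNS failures, and time-outs.
        return classify(reached_host=False)


def accessible_in_year(outcomes):
    """Assumed rule: accessible if any of the year's checks succeeded."""
    return any(outcome == ACCESSIBLE for outcome in outcomes)
```

A call such as `accessible_in_year([check_url(u) for _ in range(3)])` would approximate one test year for a URL `u`, although the study spread its three checks across separate months. The sketch also does not compare page titles, which the study's definition of persistence required.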
Within each test year, the results did not vary significantly over the three monthly samples; however, results of annual comparisons did produce variability. After checking for persistence of the sixty-four citations, seventeen (26.5%) could not be accessed in 1997. In 1998, twenty-four (37.5%) could not be accessed and thirty-one (48.4%) could not be reached in 1999. As table 1 shows, availability of cited URLs declined about 11 percentage points annually.

A review of the error messages shows that “Not Found” notices appeared nine times in 1997 and 1998 and thirteen times in the final test. Server errors were retrieved five times in 1997 and twelve times in both 1998 and 1999. Messages indicating relocation appeared three times in each of the first two years and six times in the final year (see table 2).

TABLE 2
Review of Error Messages

Error Message            1997    1998    1999
“Not Found”              9       9       13
Server Errors            5       12      12
Relocated/Unavailable    3       3       6

This decline in availability of cited URLs had a dramatic impact on the original articles from which these citations were drawn. Of the thirty-one original source articles, in 1997, twelve (38.7%) contained inaccessible citations; in 1998, seventeen (54.8%) had citations that could not be retrieved; and in the last year, twenty-one (67.7%) contained citations that could not be found (see table 3).

Conclusion

After a three-year period, almost 50 percent of the URL citations could not be accessed and two-thirds of the journal articles contained corroded citations. How can this profound loss of academic citation be explained?

Originally, some of the URL citations may have contained misspellings, incorrect domain names, or punctuation errors. Computer software requires meticulous input and is unforgiving when encountering any text or syntax error.
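The repairs that the study allowed for apparent misprints (dropping periods added as house style, removing line-end hyphens inside a URL, and supplying a missing top-level domain) can be sketched in Python. This is a minimal illustration and not the study's own procedure, which was done by hand; the function name `repair_cited_url`, the `default_tld` parameter, the short `KNOWN_TLDS` set, and the example hosts are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# A deliberately short, illustrative list; real top-level domains are far more numerous.
KNOWN_TLDS = {"com", "edu", "org", "net", "gov", "mil", "int"}


def repair_cited_url(raw, default_tld=None):
    """Apply conservative repairs to a URL copied from a printed citation."""
    # Rejoin a URL that was broken across lines with a typesetter's hyphen.
    # A genuine hyphen that happened to fall at a line end would also be lost.
    url = re.sub(r"-\s+", "", raw.strip())
    # Remove any whitespace left inside the address.
    url = re.sub(r"\s+", "", url)
    # Drop sentence punctuation appended after the address.
    url = url.rstrip(".,;")

    parts = urlsplit(url)
    host = parts.netloc
    # Only touch plain host names (no port, no user information).
    if default_tld and host and ":" not in host and "@" not in host:
        last_label = host.rsplit(".", 1)[-1].lower()
        looks_like_tld = last_label in KNOWN_TLDS or len(last_label) == 2
        if last_label.isalpha() and not looks_like_tld:
            parts = parts._replace(netloc=f"{host}.{default_tld}")
    return urlunsplit(parts)
```

For example, `repair_cited_url("http://www.someuniversity/library/", default_tld="edu")` appends the missing domain, while a host that already ends in a listed domain or a two-letter code is left unchanged. Any such repair is a guess about what the author intended, so a repaired address should be recorded alongside the address as printed.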
Further decline in the accessibility of the tracked URL citations may be attributed to the vast changes in computer and institutional infrastructures. A researcher moving to another job, the purchase of a new server, or the restructuring of an academic department may change the location of a computer file and its URL. Thus, the cited URL is rendered inaccessible.

TABLE 3
Annual Comparison of Journal Articles Containing Inaccessible URL Citations

Year    # Articles Containing Inaccessible Citations    % Containing Inaccessible URL Citations
1997    12                                              38.7
1998    17                                              54.8
1999    21                                              67.7

Print resources have authoritative indexes and finding aids to locate hard-to-find citations. When an author cites the incorrect volume number, the correct one can be found in a variety of sources. The Internet does not have comparable tools. Some may say that Internet search engines provide help with locating sites, but these tools are neither authoritative nor exhaustive.

An assortment of solutions for preserving Internet materials has been initiated. In the United States, Brewster Kahle and a small group of technical professionals have started a project called the Internet Archive. Over a number of years, they have taken a “snapshot” of Web pages found on the Internet.17 Although this is a noteworthy project, there is no assurance that these records will be maintained in the future. Without adequate finding aids, it will be impossible to access information from a snapshot.

Other efforts to preserve materials found on the Internet are being developed by OCLC. This vast library consortium is working on numerous projects that involve the cataloging and archiving of resources found on the Internet. InterCAT, a project funded by the U.S. Department of Education, is one such endeavor. With the effort of libraries and institutions of higher education, the creation, implementation, testing, and evaluation of a searchable database of USMARC records that contain electronic location and access information has been initiated.18 “This is the most traditional library-type approach to finding material on the web.”19 InterCAT uses volunteers to catalog electronic sites found on the Internet. To date, this catalog contains more than 70,000 records.20

Another project undertaken by OCLC is implementation of persistent uniform resource locators (PURLs). A PURL is a record of URL sites that individuals or institutions have registered with OCLC. “Instead of pointing directly to an Internet location, a PURL points to an intermediate resolution service that maintains a database linking the PURL to its current URL and returning that URL to the user.”21 This trial, though, may be a “short-term experiment or a long-term solution.”22

It is ironic that a utility called “persistent” could be part of a “short-term experiment.” Persistence implies endurance. Endurance is essential when discussing materials that are to be cited. For the scholarly community to retain its integrity, standards must be set to ensure that cited works are retrievable. In the past, this has not been such a consequential issue. Printed materials have been bought, stored, and archived in libraries for hundreds of years. It seems unlikely that a published book or journal could not be found in some library or archive in the world. Electronic data hold no such promise.
With the average life span of an Internet file being less than two months, how many data and materials already have been lost?

Not one of the sixty-four citations reviewed in this study was a PURL. All of the articles were published in academic journals and written by members of the scholarly community. At the final testing, twenty-one of the thirty-one articles contained citations that could not be accessed. Whether this information is available in parallel print sources is unknown. Nonetheless, it is frightening to think that the substructure of the intellectual community is relying on a medium that is so volatile.

The Internet is a very provocative environment. It provides the ability to connect, communicate, and share with members of many disciplines. However, this useful tool needs, at this point, to be viewed as a medium for exchange rather than as a library. Until there is some secure means of accessing data continuously from this resource, using the Internet as a virtual depository of cited materials is indefensible. Academic citations need to be reliable and accessible, and URL citations are not. Students and scholars should proceed with caution and utilize sources that endure.

Notes

1. The Oxford English Dictionary, vol. 3 (New York: Clarendon Pr., 1989), 248.
2. Mary-Claire Van Leunen, A Handbook for Scholars, rev. ed. (New York: Oxford University Pr., 1992), 9.
3. Manfred Kochen, “How Well Do We Acknowledge Intellectual Debts?” Journal of Documentation 43 (Mar. 1987): 54–64.
4. Shirley Rose, “Citation Rituals in Academic Cultures” (paper presented at the annual meeting of the Conference on College Composition and Communication, Seattle, Mar. 16–18, 1989), ERIC ED 309 434, microfiche.
5. Henry Small, “Cited Documents as Concept Symbols,” in Social Studies of Science, vol. 8 (Beverly Hills, Calif.: SAGE, 1978), 327–40.
6. Blaise Cronin, The Citation Process: The Role and Significance of Citations in Scientific Communication (London: Taylor, 1984), 25.
7. Lisa Guernsey, “Cyberspace Citations,” Chronicle of Higher Education 42 (Jan. 12, 1996): A18–21.
8. Charles Auger, Information Sources in Grey Literature (New Providence, N.J.: Bowker-Saur, 1994), 3.
9. Leonard Montague Harrod, Harrod’s Librarians’ Glossary of Terms Used in Librarianship, Documentation and the Book Crafts and Reference Book, comp. Ray Prytherch (Brookfield, Vt.: Gower Publishing, 1990), 263.
10. Brewster Kahle, “Preserving the Internet,” Scientific American 276 (Mar. 1997): 82–83.
11. Wallace Koehler, “An Analysis of Web Page and Web Site Constancy and Permanence,” Journal of the American Society for Information Science 50 (Feb. 1999): 162–80.
12. Corrinne Jorgensen and Peter Jorgensen, “Citations in Hypermedia: Maintaining Critical Links,” College & Research Libraries 52 (Nov. 1991): 528–36.
13. K. E. Shafer, S. L. Weible, and E. Jul, “The PURL Project,” Annual Review of OCLC Research (1996): 25–26.
14. Michael A. Arnzen, “Cyber Citations: Documenting Internet Sources Presents Some Thorny Problems,” Internet World 7 (Sept. 1996): 2–4.
15. Publication Manual of the American Psychological Association (Washington, D.C.: American Psychological Association, 1994), 218.
16. Netscape 2 Simplified (Foster City, Calif.: IDG Books Worldwide, 1996), 35.
17. Kahle, “Preserving the Internet,” 82.
18. Jeanette Woodward, “Cataloging and Classifying Information Resources on the Internet,” Annual Review of Information Science and Technology 31 (1996): 189–220.
19. Pat L. Ensor, “Libraryland Organizes the Web: An Unnatural Process?” Technicalities 15 (Nov. 1995): 9–11.
20. Norm Medeiros, “Making Room for MARC in a Dublin Core World,” Online 23 (Nov./Dec. 1999): 57–60.
21. Jennifer L. Marill, “A Survey of Standards for Identifying Serial Items on the Internet,” Acquisitions Librarian 21 (1999): 83–91.
22. Karen Schneider, “Cataloging Internet Resources: Concerns and Caveats,” American Libraries 28 (Mar. 1997): 57.