Long-term Retention of Electronic Theses and Dissertations

Thomas H. Teper and Beth Kraemer

Thomas H. Teper is Head of Preservation at the University of Illinois Urbana-Champaign; e-mail: tteper@uiuc.edu. Beth Kraemer is an Electronic Resources Librarian at the University of Kentucky; e-mail: kraemer@email.uky.edu.

This paper examines the increasing trend of universities to pursue electronic thesis and dissertation (ETD) programs. Although the goal of most programs is similar, procedural variations impact a program’s long-term success. As primary research generators, universities must bear the responsibility for providing long-term access to unique materials. However, this responsibility is in conflict with many ETD program goals, such as increased access and ease of production.

In much the same way that digitization projects came to represent a university library’s technical prowess in the 1990s, the growing trend for universities to pursue electronic thesis and dissertation (ETD) projects is something that institutions can no longer ignore. Not only do ETD programs provide universities with the opportunity to promote their programs, they also enable institutions to advertise their technical muscle. Although the goals of most ETD programs are similar, procedural variations among institutions influence the long-term success of these programs. Consequently, ETD programs are projects that institutions should approach with a great deal of measured thought and consideration. Technical variations such as the electronic formats chosen for the submission and retention of these unique documents, combined with an institution’s willingness to commit resources for proper long-term migration and storage, will have a significant impact on the long-term retention of ETDs.
If these documents do not survive in the long term, or if the later recovery of stranded data requires significant additional funds, it is the authors’ assertion that these programs can hardly be called successful.

The responsibility for providing long-term access to unique materials must be borne by universities. Traditionally, this responsibility has been that of a university’s libraries and archives. For those institutions without active preservation, conservation, or records management programs, the principle of benign neglect has occasionally proven a material’s greatest ally. However, this only applies to traditional, paper-based materials. History has proven that benign neglect is not an acceptable manner in which to preserve access to electronic and digital information. Moreover, as standards for digital archiving have yet to be established, programs embarking on ETD projects must make decisions that will affect the long-term feasibility of their programs with no specific guidelines beyond the pantheon of “best practices” that continue to morph with each new technical iteration.

College & Research Libraries, January 2002

In the authors’ investigation of those ETD programs available through the Networked Digital Library of Theses and Dissertations (NDLTD) Web site, few institutions embarking on ETD programs appear to be actually considering the long-term ramifications of their decisions. If long-term preservation is considered by institutions embarking on ETD programs, the resulting decisions are often based on compromises in which the simplicity of student production and the university’s twin desires for immediate publication and an immediate Web presence become the primary considerations while concerns about long-term access are put on the back burner. Even worse, many programs appear to leave long-term preservation issues unspecified, adopting a “we’ll deal with that when it comes up” approach.
As demonstrated countless times, this cavalier approach can result in information loss. Unlike the retention of paper documents, the long-term retention of electronic documents is an active, resource-intensive process. As a result, universities that intend to maintain their information must undertake long-term preservation planning. Another issue that heightens the need for intensive planning is that, unlike other documents that might be digitized to provide better access, ETDs are inherently “born digital” and do not necessarily have eye-legible backups available. Consequently, a lack of institutional planning for long-term retention may result in the loss of these unique documents.

As institutions pursue ETD projects, their practices are going to affect the probability of providing long-term access to the product they are desperately attempting to market. Outside the ETD community, institutions have long experimented with additional options for the long-term preservation of, and access to, digital materials. Through a reasoned examination of the strengths and weaknesses of specific formats, format strategies, the regulations governing institutional records, and the purpose of information production, this paper offers suggestions for ensuring long-term access and the long-term success of ETD projects.

Theses, Dissertations, and ETDs

The archiving of electronic documents is a hot topic in many institutional communities, including universities, libraries and archives, museums, private businesses, and the records management industry. Although all of these communities differ greatly, all share an interest in what technology can offer. However, three main components make the long-term retention of electronic documents different from that of paper. First, born-digital information has no innate paper backup. Consequently, there is little to fall back on should format changes strand data.
Second, electronic documents are different from paper because access and delivery will change in the future. Archiving electronic documents is an active process, and the best format for delivery is not necessarily the best format for retention. Finally, the production and storage of paper documents are relatively straightforward processes that have remained relatively stable over time. The production and retention of electronic documents are not quite so simple. As there is little opportunity for institutions to anticipate format changes, success will depend on an institution’s ability to maximize its flexibility.

Within the ETD community, long-term retention is an issue because few institutions consider long-term access issues when making format and procedural decisions about their programs. Currently, many rely on proprietary formats for both document delivery and retention. Unfortunately, this practice is not in keeping with current archival and preservation thinking. Many large-scale digitization projects carried out in the mid-1990s concluded that reliance on proprietary software and hardware was a mistake, and a costly one at that.

As stated earlier, ETDs differ from traditional theses and dissertations in that they are born-digital documents. This is what makes them simultaneously so tantalizing to some and so feared by others. Within the realm of the traditional thesis or dissertation, there is one format: paper. That paper record is an eye-legible, permanent backup. Moreover, microfilming by UMI provides institutions with an additional backup should something happen to the original.

As technologies developed that permitted students to create multimedia packages, universities began accepting theses and dissertations with a multitude of additional components intended to enhance the research. Reel-to-reel tape, audiocassettes, photographs, videos, floppy disks ranging in diameter from 3.5” to 8”, and CD-ROMs all became integral components of a student’s research, and many libraries worked to incorporate these within bound volumes. Over time, these component pieces declined to the point of nonfunctionality because institutions could not reasonably ensure continued access. The resulting document could then be considered incomplete. Despite this incomplete state, the functional paper component remains a testimony of the student’s original accomplishment.

Present developments are rapidly leading institutions to a point where they envision electronic documents as the normal means by which students submit theses and dissertations. Unfortunately, deficient planning in regard to acceptable nonproprietary format types and the creation of backup versions of a student’s work threatens not just components of the ETD with obsolescence, but also the entire document. No more will the bound volume remain a partial record of the student’s work. Long-term access will need to be ensured by administrations that, during times of economic hardship, will be just as likely to fall back on the old stopgap of benign neglect.

Five chief factors affect the longevity of electronic formats. Formats must be well documented, well tested, nonproprietary, widely distributed, and platform independent.¹ Unfortunately, there are no archival standards or accurate gauges for the longevity of electronic formats. This may come in the future, but current projects base their decisions on guidelines and best practices developed by other programs and research projects.
Although these do provide incredible assistance, guidelines and best practices inevitably reflect the fluid technological environments of their creation, leaving institutions with outdated projects.

Viewing Preservation within the ETD Program

Within the university library setting, preservation is an umbrella term for activities concerned with providing access to materials for as long as they are needed by whoever might need them. Preservation involves binding, conservation, deacidification, care and handling, and reformatting programs, and its success depends on cooperation within the institution. Preservation also involves making choices. Unfortunately, the resources available are frequently far below what could be spent on preservation programs, and institutions must prioritize how their dollars are spent.

Another way to view preservation within the library context is as asset management.² Asset management is the business of providing access and protecting the institution’s investment. Although this may be an uncomfortable truth, universities are businesses. They have investment portfolios, assets, insurance, and asset managers. They are in the business of creating knowledge, educating students, and perpetuating themselves. Within this context, preservation projects are asset management programs that focus on library and archival collections that often include university records and students’ theses and dissertations.

Although preservation administrators frequently pursue the same objectives, differences are emerging with the ready adoption of digital media.
These differences are causing individuals within the library community to rethink traditional notions of preservation and to formulate methods of handling emerging technologies. One example of this comes from Maggie Jones, the National Library of Australia’s director of collection management and retrieval service. In a paper entitled “Preservation Roles and Responsibilities of Collecting Institutions in the Digital Age,” she highlighted this transformation in thought:

In the digital environment the links between selection of materials, provision of access to those materials, and preservation of them over time is so inextricably linked that at the National Library we tend to talk increasingly simply of providing short- and long-term access rather than even making a semantic distinction between preservation and access.³

As members of the ETD community view such assertions, the possibility exists that there is confusion about what constitutes preservation. Within the preservation community, administrators frequently consider preservation and access integral components of the same goal. Without preservation, long-term access is impossible; without long-term access, preservation is meaningless. Traditionally, the key concept in preservation is maintaining access to the intellectual content of the item, not necessarily the artifact. This does not mean that preservation programs ignore the artifact; indeed, most are based on maintaining access to the original for as long as possible before resorting to reformatting options.

The second area of apparent confusion deals with the phrase “digital preservation.” Much like the term preservation, digital preservation is best viewed as a blanket term that incorporates two concepts. The first concept is preservation of physical objects through digital imagery. This involves providing access only through digital surrogates.
The reduced usage of the original decreases the likelihood of use-related damage. However, real success depends on an institution’s willingness to create written policies and procedures that restrict access to original materials. The second concept is that of preserving born-digital information. This concept pertains directly to the ETD community, and it is the authors’ belief that, as links to our collective intellectual history, ETDs need to be preserved.

In the past decade, a great deal of literature has appeared about the electronic environment and its impact on education, scholarship, and librarianship. One of the areas most written about and debated within the library field is that of digital preservation. Books, articles, and research publications range from those that would throw caution to the wind to those that seek to proceed cautiously. These two approaches, labeled “futurist” and “neo-Luddite,” respectively, continue to produce the bulk of this material.

However, the very definition of digital preservation is incredibly vague and tends to vary from author to author. Few authors expend energy developing a standardized definition of what digital preservation actually means, let alone acknowledging that various facets of it are more or less applicable depending on the situation. With that in mind, this article maintains that there is a distinction between preserving digital information and preserving artifacts through digital imagery. Digital preservation, therefore, is an umbrella term that encompasses a number of different practices.

The practice that pertains to the ETD community is the preservation of digital information. ETDs represent the further development of our collective intellectual heritage. They are records of a student’s creativity.
In addition, in some states, theses and dissertations stand as permanent records of a student’s academic accomplishment. Consequently, the state’s records retention schedules classify them as permanent records.

Given that the institution is, in one sense or another, both morally and ethically bound to preserve the materials submitted by students, materials with which institutions have traditionally been entrusted, the problem now becomes one of how an institution goes about preserving something as complex and multidimensional as an electronic thesis or dissertation.

In Preservation in the Digital World, Paul Conway concluded that there were three requirements for digital preservation: making use possible, protecting the original item, and protecting the surrogate.⁴ These conditions are valid in the ETD community. Moreover, the need for institutions to preserve digitized information is more important to the ETD community than it is to the community of digital library projects that gave rise to Conway’s report. The reason centers on the simple fact that the programs generally consider ETDs to be electronic entities: created, accessed, and stored in an electronic environment. Although this facilitates short-term access, it should raise serious questions about the potential for long-term sustainability because, unlike scanning projects, there is not necessarily a hard copy to fall back on. The variable created by the ETD is that of an electronic original. This means that institutions face a situation in which the electronic surrogates created by digital preservation projects are now effectively equal to the original items. Failure to protect the original is now equal to a failure to protect the surrogate and therefore negates the possibility of future use. Moreover, as is demonstrated in later sections, the costs of recovery far outweigh those of proper planning.

The Problem of Preserving ETDs

The digital environment’s flexibility greatly benefits the ways in which users may access materials. However, the instability that accompanies an industry in which the developmental year is measured in six-week intervals means that long-term preservation of digital information is difficult. As a result, it is the authors’ assertion that digital preservation, as defined by many individuals, is a misnomer. The process for creating permanent digital surrogates akin to preservation microfilm is not yet a reality.

The research of Jeff Rothenberg, a computer scientist with the RAND Corporation, concluded that born-digital information such as that recorded in ETDs requires four things for preservation. First, it must be possible to copy the item perfectly. Second, individuals must be able to access the information without geographic restraint. Third, the item must be machine-readable. Finally, the institution must preserve the unique functionality of the original item.⁵ Within the ETD community, preserving functionality is, perhaps, the most important aspect of the equation. The potentially dynamic nature of ETDs is what makes them so desirable to institutions and students. Without its dynamic functionality, the ETD is little more than a paper document: static.

However, projects that hope to be successful in both mounting their ETDs online and maintaining their long-term access and functionality must weigh the short-term benefits of instant access and an immediate Web presence against other considerations, such as the very real need to maintain long-term access. Traditionally, preservation focused on activities that increase the period in which access to original materials is possible before reformatting them.
Because electronic information is increasingly becoming the norm, a number of models are being developed within the preservation community.

The first is predominantly technological. Through the planned process of migration, some maintain that institutions may preserve the functionality of their original projects. An example of this strategy is the LOCKSS model. LOCKSS, which stands for Lots of Copies Keep Stuff Safe, maintains that institutions must preserve the bits themselves, access to the bits, and the ability to translate the bits. It also maintains that the presence of many distributed, open-source electronic archives is key to maintaining long-term access. However, as characterized by Rothenberg, the research of many computer scientists primarily encourages the development of software emulators.

The CEDARS project, a collaborative project coordinated in the United Kingdom, concluded that both migration and emulation have merits, depending on the situation.⁶ However, one of the project’s more controversial conclusions in the digital preservation realm was that preservation and access are not necessarily the same thing. Consequently, preservation administrators within the ETD community must understand that although preservation and access are both integral components of one another, they might be the end goals of two separate, but necessary, processes.

More traditional preservation models rely on analog backups of electronic materials. The greatest problem with these, however, is that the electronic environment does not easily transfer to the analog world. The functionality of dynamic Web pages, databases, and their ilk cannot be replicated in eye-legible media.

The final preservation model takes its cues from early library preservation.
In the electronic realm, it has arguably done more to harm our cultural resources than any other preservation activity and can be characterized by the phrase, “put it on the shelf and hope for the best.” Although this model has succeeded with some traditional materials over the span of a few hundred years, it fares poorly with digital data. Seamus Ross related the chance recovery of digital data to the recovery of archaeological materials:

Information stored in digital form is as delicate as archaeological remains of flora and fauna: it is rare to discover them, the environmental conditions under which they were deposited influences their survival, their recovery and study depends upon substantial investment of labour, and their interpretation requires a vast array of scientific technique.⁷

Within the ETD arena, benign neglect is a guaranteed model for failure.

Format Choices, or PDF and Long-term Permanence

Whereas preservation professionals and working groups are seeking to find a happy medium in which the concerns of long-term access and preservation are both considered and met, many members of the ETD community are taking advantage of a false middle ground between the technological and traditional views of preservation. This middle ground is quickly coming to look like some sort of standard, despite the fact that it is not. It is Adobe’s Portable Document Format (PDF). Increasingly associated with large-scale efforts at distributing textual information, PDF’s faithful replication of printed formats is making it one of the prime means of communicating textual data online. Indeed, it has become the de facto standard for the Government Printing Office’s (GPO) publication of government documents as well as the publication of a great deal of the Web’s white and gray literature.

The GPO’s adoption of PDF has made it one of the most preferred formats because many institutions mistakenly assume that the government’s current use will ensure the format’s long-term viability. However, it is the authors’ assertion that this belief should be adopted only with extreme caution. Although many individuals are dutifully working at developing long-term preservation and access strategies, the federal government’s record for preserving data is far from exemplary. In the 1970s, the National Aeronautics and Space Administration (NASA) transferred to magnetic tape a great deal of information that is no longer readable. The National Archives and Records Administration (NARA) lost the records for many Vietnam veterans after electronically encoded files became corrupted. Making assumptions about the longevity of magnetic tape, NARA had destroyed the paper records decades before this tragedy. As a result, verifying the service records of tens of thousands of veterans is next to impossible. Similarly, Adobe’s “agreement” with the government to ensure backward compatibility for twenty-five years should not be counted on when it comes to the long-term preservation of ETDs and, as is demonstrated later, should not be considered synonymous with a government-sanctioned preservation plan.

Recently, the GPO’s use of PDF has become a major component of the Federal Depository Library Program. The distribution of government documents, traditionally characterized by monthly catalogs, orders, and a multitude of formats ranging from paper and fiche to floppy disks and CD-ROMs, has been a nightmare for many government documents librarians.
Although the lack of an overarching government information policy continues to make the lives of government documents librarians difficult, PDF is making their lives easier, just as it is simplifying the lives of records managers and others charged with organizing and disseminating information in the electronic environment. Despite this use, PDF is a distribution format; paper remains the preservation format at the GPO.

The authors’ caution about accepting PDF as the permanent file format for document imaging stems from three sources: the government, records managers, and private industry. As noted earlier, the government’s record of accomplishment in preserving access to electronic information is not the best, and the advent of PDF is no reason to believe otherwise. At a conference jointly sponsored by the CEDARS project, the Online Computer Library Center, the Research Libraries Group, the U.K.’s Office for Library Networking (OLN), and the Joint Information Systems Committee (JISC), George D. Barnum of the GPO presented a paper entitled “The Federal Depository Library Program: Preserving a Tradition of Access to United States Government Information.” In this presentation, Barnum indicated that there is no reason to assume that the government will continue to rely on PDF:

Presentation of electronic publications that rely on an open standard … will presumably remain straightforward as the Web and its successor technologies develop. Publications, however, that rely on a proprietary format or commercial software for their use pose serious challenges, since backward compatibility in newer technology will depend on market forces and demand. GPO cannot consider content separate from access and access mechanisms; thus the greatest challenge over the coming years will be to keep publications captured in 2000 viable despite the advance of technology.
Transfer of all publications in the archive to a single, migration-friendly, open standard format has not, in the interest of preserving the official nature of the publications, been pursued thus far. Such transfer may, however, present itself as the best alternative for keeping archived publications alive.⁸

PDF was not mentioned once throughout the entire presentation. Barnum did, however, mention that the GPO’s three guiding principles for pursuing electronic access were the trend in government to adopt electronic media for communicating with the public, the rapid adoption of electronic media in libraries generally, and the clear direction of Congress to implement greater electronic access and to seek reductions in the cost of disseminating information. He then stated that, though the preservation of electronic files was important, the third reason, the reduction in distribution costs, was the most imperative force behind the push for electronic access.⁹

If this were not enough to give pause, the federal government is currently undertaking an interagency project involving twenty-five federal agencies. Its goal is to develop a Portable Document Delivery Format (PDDF) and a Federal Information Processing Standard (FIPS).

FIPS will provide a means for Government agencies to archive final form electronic documents in an open, transportable, format while maintaining document integrity.
The importance of this achievement cannot be over-stated, since the availability of public domain nonproprietary software will enable virtually any Internet user to submit complex electronic documents (audio, text, graphics) in a form that can be retrieved in its original form with full retention of document integrity (no loss of format, content, color, etc.).¹⁰

Although the federal government is currently taking advantage of the blessing that PDF provides in widening access and reducing the cost of dissemination, it also is realistic about proprietary software products and the maintenance of long-term access to digital information while still preserving document integrity. The result is the government’s effort to seek a permanent, nonproprietary software system that will permit simultaneous delivery and the preservation of document integrity.

Both inside and outside the government, records managers have long been experimenting with the management of electronic files. A tour of the Association of Records Managers and Administrators (ARMA) product floor at an annual conference five years ago would have led visitors to believe that very few companies were producing traditional micrographic equipment anymore; the products from Kodak, Canon, and other imaging companies were dominated by purely digital systems. Attendees at current ARMA conferences will see that the tables have actually turned back about ninety degrees. The imaging and records management industry now is dominated by the hybrid imaging system with dual output: electronic files such as PDF and other digital raster images for access and microfilm records for long-term preservation.

In a highly technical paper entitled “Permanent Digital Records and the PDF Format: Defining a Permanent TransFormat Records Management System, a Hierarchy of Record Storage Formats, Five PDF Formats, and Document Copying/Migration,” Stephen J.
Gilheany, a certified records manager and certified document imaging archivist, addressed the challenges of creating a long-term records management system.¹¹ Although Gilheany spoke favorably of raster formats and, in particular, Adobe’s PDF, his model for successful management and preservation carried an underlying hint of caution. This caution centered on the necessity of preserving electronic documents in a multitude of formats, including native formats such as Microsoft Word for contextual information, a raster format such as PDF or TIFF for access, the OCR output so that documents will be searchable, a structured format such as SGML or XML for preservation purposes, and an ASCII document to serve as a last-ditch method of recovering lost data. The paper also noted the existence of five PDF formats, some of which are more migration- and preservation-friendly than others.

The final reason for not accepting PDF, or any other single format, as a preservation tool emerges from private industry. ProQuest (formerly Bell & Howell Info-Learning, formerly UMI) is primarily known to the academic community as a microfilm producer. ProQuest films theses, dissertations, newspapers, books, serials, and other materials too numerous to name. Currently, it plans to receive the New York Times and a number of other major newspapers electronically. Using I-Beam Readers, the company will output the raster images it receives of these newspapers to microfilm for sale and permanent storage. As Bell & Howell Info-Learning, the company’s net earnings in 2000 exceeded $275.2 million. The company maintains digital files of the materials it receives electronically; however, it does not trust the long-term accessibility of these electronic files.
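
Gilheany’s multi-format model, as summarized above, can be read as a completeness check on each retained record. The following Python sketch is illustrative only and is not drawn from Gilheany’s paper: the role names, the manifest layout, and the function missing_roles are assumptions introduced here to show how an institution might confirm that a record is held in each of the five forms before treating it as safely retained.

```python
# Roles follow the five forms listed in the summary of Gilheany's model;
# the short names themselves are hypothetical labels chosen for this sketch.
REQUIRED_ROLES = ("native", "raster", "ocr_text", "structured", "ascii")


def missing_roles(manifest):
    """Return, in REQUIRED_ROLES order, the roles with no file recorded.

    `manifest` is assumed to map a role name to the filename held for it.
    """
    return [role for role in REQUIRED_ROLES if not manifest.get(role)]


# Example record (hypothetical filenames): three of the five forms are held.
record = {
    "native": "thesis.doc",
    "raster": "thesis.pdf",
    "structured": "thesis.xml",
}
print(missing_roles(record))  # the OCR text and ASCII fallback are absent
```

A check of this kind does nothing to keep the files readable; it only makes gaps in a retention package visible while they are still cheap to fill.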

Impact on the ETD Community

When specifying document format requirements for ETD programs, institutions have to balance ease of production for the student with ease of migration/retention for the institution. PDF documents are very easy for students to create. The appeal of a PDF-based ETD program is that students do not need to be highly computer literate to create this document format. The typical word processor a student uses for writing can export the final document as a PDF with the push of a button. (Of course, the usability of this final document will depend on enhancements such as internal linking, proper embedding of unusual fonts, and other “details” that institutions specify in the instructions that students receive. Apparently, some PDFs are more equal than others.) PDF also has the advantage of being easily deliverable via the Web. The Acrobat Reader program is available as a free download and permits seamless viewing of PDF documents using any Web browser. PDF is among the most common document formats available via the Web.

Simple creation and delivery are very tempting features to an administration looking to start an ETD program from scratch. However, the authors believe that this choice is shortsighted. ETD programs are new, and none has yet been tested by format migration. Migration is likely to be an unfortunate, yet inevitable, reality. At present, a file format’s ability to migrate can be assessed and should be a primary consideration when administrators make decisions.

The question that then arises is, Can an ETD program have it all? Several programs are investigating XML as the format of the future for ETDs. XML (eXtensible Markup Language) is a tagged ASCII text format. It can be interpreted by a nonproprietary “browser” for maximum readability, and it can also be read in plain-text readers such as Notepad.
XML can be difficult for the average student to produce and will initially require more hand-holding by institutions. However, for those universities interested in fostering true information literacy among their students, the hand-holding will have more lasting results than the push-button methods described above.

XML is very promising for ETDs, but other alternatives might be easier to implement in the short term. One possibility that would improve an institution’s chances for migration in the future would be to require submission by the student of both the PDF (as a handy delivery format) and the “native” format used to create the original document (e.g., Word). The word processor file is still proprietary and thus is not the best candidate for long-term storage. Nevertheless, it can be converted to other formats much more easily than PDF can. If the campus has a standard program used for word processing, the documents could be converted in batch mode to each new version and, ultimately, to some other software in the future. At present, one format does not have to serve all purposes. PDF is a fine delivery format, but institutions need something else for long-term retention.

Dialog with the student should be a primary component of any ETD program and is necessary to discuss the rationale behind format restrictions. Requiring students to do extra work (such as submitting two versions of the ETD) might discourage participation in programs. Providing options and explaining ramifications will shift some of the responsibility to the student author. A student might want to include a file type that might not migrate because the increased value for the document now is worth the risk. A program also might want to restrict these variations to appendices, asking that the student craft the document so that it can stand alone, should the appendix not migrate.
Such “optional obsolescence” might be OK theoretically, but it raises questions about the documents that become part of the permanent record. The document that is available in fifty years with no appendix is not the same document created by the author. In Kentucky, state law requires that the University of Kentucky preserve university records, including dissertations, in perpetuity. Consequently, this project presents administrators with the need to make format decisions with an awareness of the long-term implications. The institutional library or archives, as the unit frequently responsible for long-term retention of university documents, will be well in tune with the long-term preservation and access issues affecting ETDs, but other participants in the process can tend to focus on the other function of these documents, as a step completed in partial fulfillment of the requirements for a degree. In this view, the document has served its purpose and does not need to be retained forever. Institutions in Kentucky cannot afford to think that way.

Asset Management, or Short-term Benefits versus Long-term Costs

In the course of touring just about any preservation lab at a large institution, visitors see signs that say, “Think Twice; Cut Once.” By urging caution and restraint in cutting boards and using supplies, preservation departments have generally managed to keep collections conservation, the preservation of library materials, a lower-cost alternative to the wholesale replacement of damaged and destroyed materials.

In the realm of electronic records and information, a similar adage applies. Alter it slightly, and the reader might imagine a sign that states, “Plan Properly; Plan Once.” The cost of planning properly is time and, perhaps, a little bit of the prestige that accompanies being the first institution out there with the sexiest Web page for accessing ETDs. The cost of failing to plan is far greater.
In a publication recently released by the British Library’s National Preservation Office, Seamus Ross of Glasgow University’s Humanities Advanced Technology and Information Institute noted: “The short-term economic and productivity advantages offered by digital storage, manipulation, and communication encourages us to depend on them more and more. Although some are aware of the preservation risks, society in general is ignorant of them.”12 Ross then proceeded to outline the two digital preservation strategies that he believes most often characterize projects: a proactive approach and an approach represented by accident and rescue.

Regardless of what preservation strategy an institution chooses, preservation, whether characterized by a proactive approach or by stumbling into it, essentially centers on dealing with the results of institutional choice. Consequently, the costs must be considered.

In very black-and-white terms, the cost of planning preservation activities is great, but it consists primarily of time and resources devoted to developing an infrastructure capable of dealing with preservation activities. The other cost is, in the case of many ETD programs, the greater motivator: institutional recognition. In an academic world increasingly driven by the speed and competition that have long characterized the for-profit sector, being the second institution to develop a viable program is not good enough.

As institutions plan for ETD programs, they must remember three things. The first impetus for preservation planning is that the economic costs of not planning are greater than the initial outlay.
At the 1995 meeting of the ISO Archiving Standards working group, participants reported that it cost between $2.65 and $3.75 per megabyte per year to retain electronic records created in the engineering sector, but about $662.50 to reconstruct them if they were lost or destroyed. Oil survey records are even more costly to recreate. The National Archives of Australia (NAA) holds 600,000 computer tapes of oil survey data. In the early 1990s, the NAA estimated recreation of the offshore data at $5,300 per meter, or $5.3 billion in total.13 Of course, all the data cited above pertain to materials that could be salvaged from analog sources. As increasing volumes of materials exist in solely electronic forms, the potential cost rises.

The second reason for planning preservation activities is the danger that institutions face from a lack of memory. Print materials have served as the primary tools for historians and researchers for years. However, the advent of audio and video technologies has shown scholars that technologies are frequently fleeting. There is no reason to believe that nontextual digital formats will be any different. They are, however, more complex. Frequently, hardware and software must interpret the data in question before manipulation or display. In their raw form, the data are often meaningless.

The final reason for planning is that many believe failure may not occur until after decision makers are out of the picture. However, the probability that failure may occur after administrators leave is not an excuse. “Preserving digital assets cannot happen as an after-thought, it must be planned: media degrade, technical developments make systems obsolete, or information is rendered inaccessible by changes in encoding formats.”14 Preservation requires active intervention.
Unsecured, digital information is susceptible to loss through the physical breakdown of the media, may be rendered inaccessible by technological advances, or may be left meaningless through a lack of or insufficient contextual evidence.

Currently, a multitude of choices is available. The ETD community is faced with choosing between their obligations as educators and scholars and their obligations as members of the ETD community. One of those dictates that they seek to preserve and enrich human understanding of the world through traditional scholarship; the other dictates that they seek to preserve and enrich human understanding of the world through more contemporary scholarship. The authors do not believe that these are mutually exclusive concerns. Rather, the authors believe that administrators must accept both that ETDs are here and that they are far from perfect.

Notes

1. Richard Ficher and Charles Dollar, “File Formats to Support Long-term Access to Electronic Records,” in 2000 Managing Electronic Records Conference Proceedings (Chicago: Cohasset Associates, Inc., 2000).
2. Sally A. Buchanan, “Too Big, Too Expensive, Too Time-consuming,” Wilson Library Bulletin 67 (Oct. 1993): 64.
3. Maggie Jones, “Preservation Roles and Responsibilities of Collecting Institutions in the Digital Age,” in 1995 National Preservation Office (NPO) Conference Multimedia Preservation: Chasing the Rainbow (Brisbane, Aus: National Library of Australia, 1995). Available online from http://www.nla.gov.au/nla/staffpaper/npomj.html. A print version is also available through the National Library of Australia. Ordering information is available with the online proceedings at http://www.nla.gov.au/niac/meetings/npo95.html.
4. Paul Conway, Preservation in the Digital World (Washington, D.C.: Council on Library and Information Resources, 1996). Available online from http://www.clir.org/pubs/reports/conway2/index.html.
5. Jeff Rothenberg, Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation: A Report to the Council on Library and Information Resources (Washington, D.C.: Council on Library and Information Resources, 1999), 3. Available online from http://www.clir.org/pubs/reports/rothenberg/introduction.html.
6. CEDARS (CURL Exemplars in Digital Archives) is an international, higher education effort charged with promoting awareness of the need for digital preservation, developing collection management strategies for digital materials, and investigating methods of digital preservation. Information on the CEDARS project is available online from http://www.leeds.ac.uk/cedars/.
7. Seamus Ross, “Changing Trains at Wigan: Digital Preservation and the Future of Scholarship,” National Preservation Office Preservation Guidance Occasional Papers (London: The British Library, 2000): 3.
8. George D. Barnum and Steven Kerchoff, “The Federal Depository Library Program: Preserving a Tradition of Access to United States Government Information” (paper presented at Preservation 2000: An International Conference on the Preservation of Long-term Accessibility to Digital Material). Available online from http://www.rlg.org/events/pres-2000/barnum.html. Full conference proceedings are available online from http://www.rlg.org/events/pres-2000/prespapers.html.
9. Ibid.
10. Library of Congress, “Services of the Preservation Research and Testing Division” (Last updated 28 June 1999). Available online from http://lcweb.loc.gov/preserv/resear.html.
11. Stephen J. Gilheany, “Permanent Digital Records and the PDF Format: Defining a Permanent TransFormat Records Management System, a Hierarchy of Record Storage Formats, Five PDF Formats, and Document Copying/Migration.” Available online from www.ArchiveBuilders.com.
12. Ross, “Changing Trains at Wigan,” 6.
13. Ibid., 4. Ross’s paper originally cited the figures in Australian dollars. The authors converted the numbers to U.S. dollars. Figures are current as of March 2001.
14. Ibid., 6.