recognised charitable organization by the US Internal Revenue Service.

http://www.gutenberg.net

Chapter 5: Details and circumstances of the Interviews

Michael Hart, Founder and Director of Project Gutenberg, and Dr. Greg Newby, CEO of the Project Gutenberg Literary Archive Foundation, completed the questionnaire and participated in email communications between March and April 2004.

Chapter 6: Analysis

This section presents an analysis of the data collected during the case study. It is organised to mirror the sequence of topics in the questionnaire.

* Perception and Awareness of Digital Preservation
* Preservation Activity
* Compliance Monitoring
* Digital Preservation Costs
* Future Outlook

Perception and Awareness of Digital Preservation

Project Gutenberg is one of the earliest web sites on the internet and one of the earliest digital libraries in existence. They have been active in creating eBooks for over thirty years and are aware of the social benefits to be gained through preserving these resources for public access. Project Gutenberg ensures that all eBooks are available in plain text and other open formats to avoid obsolescence. The eBooks are uploaded to two main servers (9) and can then be mirrored by over thirty sites worldwide. The combination of open formats and many copies should ensure that access to these digitised literary works is preserved for the long term.

The Main Problems

The major long-term problem lies in ensuring that copyright laws are respected for all of the digitised works made accessible by Project Gutenberg. Mirror sites exist in many countries around the world and, as such, ensuring that copyright laws are respected in each can be difficult. However, no eBook will be posted to the main site in the U.S. without gaining copyright clearance. Recent extensions to copyright laws in the U.S. and Europe have presented new challenges for the Project Gutenberg team, because no new works will be released to the public domain until 2018.
Hart believes that these extensions to copyright laws benefit 'very few copyright holders at the expense of universal access to literature and knowledge'(10). These changes will impact the amount of research that needs to be done before an eBook can be digitised and made available.

Asset Value and Risk Exposure

Project Gutenberg exists to make literature and reference materials freely accessible to the general public in a digitised format. As mentioned above, Michael Hart believes that free access to literary works is vital for enabling the sharing of knowledge, art, music and culture.

Regulatory Environment

Project Gutenberg must adhere to U.S. laws involving operation as a not-for-profit corporation. However, these regulations are not sector specific. Project Gutenberg must be exceedingly careful to respect U.S. copyright laws regarding the works that they digitise and make available over the Internet. However, once a publication has been verified as being in the public domain, there are no other legal restrictions affecting Project Gutenberg.

Preservation Activity

Policies and Strategies

Project Gutenberg scans literary works and employs OCR technology to create eBooks. In some cases, eBooks are typed in by hand. The eBooks are then edited by a team of volunteer proof-readers. There are procedures and guidelines available online for volunteers to consult when scanning and editing texts for Project Gutenberg to ensure that all eBooks follow a standard format.

Once the eBook has been produced, it is uploaded to two main servers. The eBook is made accessible via the official Project Gutenberg website and the Internet Archive site and on over thirty mirror sites around the world. As there are no access or distribution issues, Project Gutenberg encourages users to save copies of the eBooks to CD or DVD.
Project Gutenberg believes that generating a multitude of versions - those stored on the main servers, on local servers (through mirror sites) and those downloaded to CD and DVD - will ensure that the bit stream of the literary work is preserved for access. This embodies the philosophy of the LOCKSS strategy. LOCKSS 'uses the caching technology of the web to collect pages of journals as they are published, allowing libraries to take physical custody of selected electronic titles they purchase'(11). LOCKSS was inspired by the words of Thomas Jefferson who said "let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident." (12)

Selection

Project Gutenberg aims to make digitised versions of popular literature and reference materials in the public domain freely accessible to the general public. As copyright expires, publications can be freely replicated and distributed. Many of these works are out of print. By digitising the out of print works, Project Gutenberg feels that they are saving the publications from 'obscurity and ultimate oblivion'(13).

Broadly, all of the texts can be classified into three categories: light literature (such as Alice in Wonderland), heavy literature (such as Shakespeare and Dante) and references (such as Roget's Thesaurus). Mathematical and scientific works are also made available, including the Human Genome.

There are no real restrictions on what Project Gutenberg will make accessible. As long as the material is in the public domain, it can be digitised and submitted to Project Gutenberg. However, Project Gutenberg aims to benefit the widest possible audience and therefore prioritises the digitisation of popular literature and reference materials rather than extremely specialised works.
Project Gutenberg already have texts in over 31 languages and are especially keen to increase their multilingual holdings.

Preservation

Project Gutenberg already has numerous plain text files that are 20-30 years old. In that time, many file formats have come and gone while plain text is still readable on virtually all computers. The use of plain text will also help to insure against future obsolescence.

All Project Gutenberg eBooks are created as plain ASCII text files. This means that people with 'Apples and Ataris all the way to the old homebrew Z80 computers' (14) as well as Mac and UNIX users are all able to read the text files. Any open format can be submitted but the Project Gutenberg team will also generate plain ASCII (15) text files. Project Gutenberg encourages users to create new formats from the plain text files to suit their individual needs.

Once the eBook has been generated and edited by volunteers, it is uploaded to two main servers. The first is the Project Gutenberg site itself and the other is the Internet Archive site. From this point, mirror sites can download the redundant files to their own sites and store them on their own servers.

Project Gutenberg uses the unique eBook number as the file name. Therefore, if the eBook is the 10001st plain text file created, it will be named 10001.txt. Project Gutenberg will accept as many open file formats as volunteers are willing to submit, but will also generate a plain text version. Additional versions in other formats will be named accordingly but with different file extensions (e.g., html, pdf, xml). Each eBook has its own subdirectory that contains all versions of the eBook.

Project Gutenberg have volunteers representing a wide range of sectors (cultural heritage, government and higher education). Through these affiliations, they keep up to date with digital preservation developments.
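The file naming convention described in this section (eBook number as the file name, one file extension per format, one subdirectory per eBook) can be illustrated with a minimal sketch. The function name `ebook_paths`, the `ebooks` root directory and the choice to name each subdirectory after the eBook number are assumptions made for illustration; the case study does not specify the actual directory layout.

```python
from pathlib import PurePosixPath


def ebook_paths(ebook_number, extra_extensions=(), root="ebooks"):
    """Return the file paths for every version of one eBook.

    Assumptions (not from the case study): the root directory name
    and the use of the eBook number as the subdirectory name.
    """
    if ebook_number < 1:
        raise ValueError("eBook numbers start at 1")

    name = str(ebook_number)
    # A plain text version always exists, so "txt" comes first.
    extensions = ["txt"] + [e for e in extra_extensions if e != "txt"]
    # All versions of an eBook share one subdirectory.
    directory = PurePosixPath(root) / name
    return [directory / f"{name}.{ext}" for ext in extensions]


if __name__ == "__main__":
    for path in ebook_paths(10001, ("html", "pdf")):
        print(path)
```

Listing the plain text file first mirrors the statement that a plain text version is generated regardless of which other open formats volunteers submit.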
Project Gutenberg staff have ties with many organisational leaders, and informal collaborations on best practices are common.

Access