recognised charitable organization by the US Internal Revenue Service.

http://www.gutenberg.net


Chapter 5: Details and circumstances of the Interviews
Michael Hart, Founder and Director of Project Gutenberg, and Dr. Greg
Newby, CEO of the Project Gutenberg Literary Archive Foundation,
completed the questionnaire and participated in email communications
between March and April 2004.

Chapter 6: Analysis
This section presents an analysis of the data collected during the case
study. It is organised to mirror the sequence of topics in the
questionnaire.
* Perception and Awareness of Digital Preservation
* Preservation Activity
* Compliance Monitoring
* Digital Preservation Costs
* Future Outlook


Perception and Awareness of Digital Preservation

Project Gutenberg is one of the earliest web sites on the internet and
one of the earliest digital libraries in existence. They have been
active in creating eBooks for over thirty years and are aware of the
social benefits to be gained through preserving these resources for
public access. Project Gutenberg ensures that all eBooks are available
in plain text and other open formats to avoid obsolescence. The eBooks
are uploaded to two main servers (9) and can then be mirrored by over
thirty sites worldwide. The combination of open formats and many copies
should ensure that access to these digitised literary works is
preserved for the long-term.


The Main Problems

The major long-term problem lies in ensuring that copyright laws are
respected for all of the digitised works made accessible by Project
Gutenberg. Mirror sites exist in many countries around the world and,
as such, ensuring that copyright laws are respected in each can be
difficult. However, no eBook will be posted to the main site in the
U.S. without gaining copyright clearance.  Recent extensions to
copyright laws in the U.S. and Europe have presented new challenges for
the Project Gutenberg team. This is because no new works will be
released to the public domain until 2018. Hart believes that these
extensions to copyright laws benefit 'very few copyright holders at the
expense of universal access to literature and knowledge'(10). These
changes will impact the amount of research that needs to be done before
an eBook can be digitised and made available.


Asset Value and Risk Exposure

Project Gutenberg exists to make literature and reference materials
freely accessible to the general public in a digitised format. As
mentioned above, Michael Hart believes that free access to literary
works is vital for enabling the sharing of knowledge, art, music and
culture.


Regulatory Environment

Project Gutenberg must adhere to U.S. laws involving operation as a
not-for-profit corporation. However, these regulations are not sector
specific. Project Gutenberg must be exceedingly careful to respect U.S.
copyright laws regarding the works that they digitise and make
available over the Internet. However, once a publication has been
verified as being in the public domain, there are no other legal
restrictions affecting Project Gutenberg.


Preservation Activity

Policies and Strategies

Project Gutenberg scans literary works and employs OCR technology to
create eBooks. In some cases, eBooks are typed in by hand. The eBooks
are then edited by a team of volunteer proof-readers. There are
procedures and guidelines available online for volunteers to consult
when scanning and editing texts for Project Gutenberg to ensure that
all eBooks follow a standard format. Once the eBook has been produced,
it is uploaded to two main servers. The eBook is made accessible via
the official Project Gutenberg website and the Internet Archive site
and on over thirty mirror sites around the world. As there are no
access or distribution issues, Project Gutenberg encourages users to
save copies of the eBooks to CD or DVD.

Project Gutenberg believes that generating a multitude of versions -
those stored on the main servers, on local servers (through mirror
sites) and those downloaded to CD and DVD - will ensure that the bit
stream of the literary work is preserved for access. This embodies the
philosophy of the LOCKSS strategy. LOCKSS 'uses the caching technology
of the web to collect pages of journals as they are published, allowing
libraries to take physical custody of selected electronic titles they
purchase'(11).  LOCKSS was inspired by the words of Thomas Jefferson,
who said, "let us save what remains: not by vaults and locks which fence
them from the public eye and use in consigning them to the waste of
time, but by such a multiplication of copies, as shall place them
beyond the reach of accident." (12)
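
The value of many copies depends on being able to tell when one of them
has drifted from the rest. The following is a minimal sketch of that
idea in Python, assuming each copy can be read into memory and compared
by SHA-256 checksum. It is not the LOCKSS protocol and not a procedure
described by Project Gutenberg; the function name and the sample copy
names are hypothetical.

```python
import hashlib
from collections import Counter


def find_divergent_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose checksum differs from the majority."""
    if not copies:
        return []
    # One SHA-256 digest per copy; identical bit streams give identical digests.
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in copies.items()}
    # Treat the most common digest as the reference bit stream.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, digest in digests.items()
                  if digest != majority_digest)


# Hypothetical copies: two agree, one has a single corrupted character.
copies = {
    "main-server": b"Alice was beginning to get very tired",
    "mirror-a": b"Alice was beginning to get very tired",
    "dvd-copy": b"Alice was beginning to get very tirad",
}
print(find_divergent_copies(copies))
```

With only two copies that disagree there is no majority, so the sketch
cannot say which one is damaged. This is one reason a larger number of
independent copies is useful.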


Selection

Project Gutenberg aims to make digitised versions of popular literature
and reference materials in the public domain freely accessible to the
general public. As copyright expires, publications can be freely
replicated and distributed. Many of these works are out of print. By
digitising the out of print works, Project Gutenberg feels that they
are saving the publications from 'obscurity and ultimate oblivion'(13).
Basically, all of the texts can be classified into three categories:
light literature (such as Alice in Wonderland), heavy literature (such
as Shakespeare and Dante) and references (such as Roget's Thesaurus).
Mathematical and scientific works, including the Human Genome, are also
made available. There are no real restrictions on what Project Gutenberg
will make accessible. As long as the material is in the public domain,
it can be digitised and submitted to Project Gutenberg. However,
Project Gutenberg aims to benefit the widest possible audience and
therefore prioritises the digitisation of popular literature and
reference materials rather than extremely specialised works. Project
Gutenberg already has texts in over 31 languages and is especially
keen to increase its multilingual holdings.


Preservation

Project Gutenberg already has numerous plain text files that are 20-30
years old. In that time, many file formats have come and gone while
plain text is still readable on virtually all computers. The use of
plain text will also help to insure against future obsolescence. All
Project Gutenberg eBooks are created as plain ASCII text files. This
means that people with 'Apples and Ataris all the way to the old
homebrew Z80 computers' (14) as well as Mac and UNIX users are all able
to read the text files. Any open format can be submitted but the
Project Gutenberg team will also generate plain ASCII (15) text files.
Project Gutenberg encourages users to create new formats from the
plain text files to suit their individual needs. Once the eBook has
been generated and edited by volunteers, it is uploaded to two main
servers. The first is the Project Gutenberg site itself and the other
is the Internet Archive site. From this point, mirror sites can
download the files and store redundant copies of them on their own
servers.
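
The plain ASCII requirement described above can be checked mechanically.
The following is a minimal sketch in Python, assuming "plain ASCII"
means that every byte in the file has a value of 127 or below; the
function name is hypothetical and this is not a tool used by Project
Gutenberg.

```python
def non_ascii_offsets(data: bytes) -> list[int]:
    """Return the byte offsets of every byte outside the 7-bit ASCII range."""
    return [offset for offset, byte in enumerate(data) if byte > 0x7F]


# A pure ASCII line has no offending bytes.
print(non_ascii_offsets(b"Call me Ishmael."))

# An accented character encoded as Latin-1 is reported by its offset.
print(non_ascii_offsets("caf\u00e9".encode("latin-1")))
```

An empty result means the file should display identically on any system
that understands ASCII, which is the portability argument made above.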

Project Gutenberg uses the unique eBook number as the file name.
Therefore, if the eBook is the 10001st plain text file created, it will
be named 10001.txt. Project Gutenberg will accept as many open file
formats as volunteers are willing to submit, but will also generate a
plain text version. Additional versions in other formats will be named
accordingly but with different file extensions (e.g., html, pdf, xml).
Each eBook has its own subdirectory that contains all versions of the
eBook.
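
The naming convention above can be sketched in a few lines of Python.
This is an illustration only: the function name is hypothetical, and
the sketch assumes that each eBook's subdirectory is itself named after
the eBook number, which the text does not state.

```python
from pathlib import PurePosixPath


def ebook_paths(ebook_number: int, extensions=("txt",)) -> list[PurePosixPath]:
    """Return one path per format, all inside the eBook's own subdirectory."""
    if ebook_number < 1:
        raise ValueError("eBook numbers are assumed to start at 1")
    # Assumption: the subdirectory is named after the eBook number.
    directory = PurePosixPath(str(ebook_number))
    # Every version shares the eBook number and differs only in extension.
    return [directory / f"{ebook_number}.{ext}" for ext in extensions]


for path in ebook_paths(10001, ("txt", "html", "pdf")):
    print(path)
```

Because the file name is derived entirely from the eBook number, any
mirror that knows the number and the format can locate a given version
without consulting a catalogue.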

Project Gutenberg have volunteers representing a wide range of sectors
(cultural heritage, government and higher education). Through these
affiliations, they keep up to date with digital preservation
developments. Project Gutenberg staff have ties with many
organisational leaders and informal collaborations on best practices
are common.


Access