What is the Nineteenth-Century Knowledge Project?
About
Acknowledgements for all contributors.
How we keep hundreds of thousands of files organized.
Edition-Section System
File organization depends on two basic folder types.
Folder names
As the OCR workflow passes through its various stages, production moves into specific folders for each stage. Their names and contents are given below:
Repositories
A guide to the different repositories used to store OCR-Project data.
Setting Up the Repositories
Create local copies of the remote repositories.
The procedures we use to get the best quality text recognition in ABBYY FineReader.
AFR Interface
Learn about the main elements of the program interface.
Create a Page-Inventory File
Create a page-inventory file.
Create an Image Collection
Organize image files for scanning.
Create an OCR-Project
How to create and manage an OCR-Project.
Settings
Recommended settings for all options in ABBYY FineReader.
Draw Boxes
Manually creating text recognition boxes improves accuracy.
Page Recognition
Excellent page recognition depends on preparing pages properly.
Save and Output
How to output your OCR results.
This introduction to Oxygen XML Editor shows you how to navigate the interface and perform standard procedures on the Encyclopedia files.
Oxygen Interface
An introduction to the main components of the Oxygen interface.
Create an XML-Project
Using Oxygen XML Editor to organize files.
Transform DOCX to TEI
How to convert DOCX files to TEI in Oxygen.
Procedures for converting single pages into Encyclopedia entries.
Convert Page to Entry Files
Before page files can be converted to entry files, we need to do some housekeeping.
Entry-Inventory File
Document the filenames of every entry in a section using the entry-inventory file.
Validate Entry Files
Use Oxygen to validate the entry files.
Reference information on file/folder names, TEI-encoding standards, and Unicode characters.
Editorial standards
The following editorial principles are employed in creating this digital edition.
Image Sources
Bibliographic information on print editions and image repositories.
Naming Conventions
Lists the naming conventions we use for editions, sections, folders, and files.
TEI Style Manual
All TEI encoding must follow these guidelines.
Unicode Characters
List of Unicode characters and entities that are used frequently in the Encyclopedia but are not on the standard US keyboard.

Project Director Peter Melville Logan
National Endowment for the Humanities HAA-261228-18