CIC Wright 19th Century American Fiction Project

Encoding Guidelines



DTD and Files for XMetal, Author/Editor, WordPerfect

To retrieve any of these files, right-click the link and choose "Save Link As" (Netscape) or "Save Target As" (Internet Explorer).


Related Documents


Introduction

These guidelines are meant to supplement and extend the TEI Text Encoding in Libraries Guidelines for Best Encoding Practices. The texts in this project will be encoded at Level 4 as described in those Guidelines; a more detailed description of encoding practice is given below.

Principles

  1. It is better to do less than to do wrong or mislead.

  2. All encoding decisions should allow for enhancement and avoid tag abuse.

  3. It should be easy to move from transcription to page images. Elaborate encoding schemes to describe rendition are not necessary.

General Transcription and Encoding Practice

One of the primary goals for the project should be accurate transcriptions. By this we mean

Structural Divisions


Individual Elements and Features


TYPE Attribute Values


Proofreading

Proofreading is the most important part of this whole project, and the most difficult. It requires a great deal of concentration and a different kind of reading than you're probably used to. You'll want to record the text as it appears on the page, and be sure that you're taking the time necessary to see the page as it is printed. The best method is generally to read a line of the printed text, then read the corresponding line on screen. When you find an error, fix it and reread the line to see if there are other errors in the same line. You'll also want to impress upon your staff the importance of accurate proofreading.

I've tried to provide the most accurately OCR'ed pages possible. In fact, I've been OCR'ing most of the pages twice, using different settings, and choosing the more accurate page. But in some cases, the printing is too faint or too dark, or has too much bleed-through, too many random speckles, or some other defect, and the OCR comes out completely garbled.

The best thing to do in these cases is to retype the page rather than try to correct the errors in the OCR'ed text; you'll figure out pretty quickly which pages these are. The more errors a page has, the more difficult it is to proofread, because you may fix one error but miss others in the process. A certain number of titles simply cannot be OCR'ed, and we'll have to consider having these titles retyped by a vendor. Contact me if you think your text has too many errors to proceed.
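
If you happen to have two OCR passes of the same page on hand, a short script can point out where they disagree, which may help you decide whether a page is worth correcting or should simply be retyped. The following is only a rough sketch in Python and is not part of the project's workflow; the function name compare_ocr_passes is hypothetical, and disagreement between two passes is at best a loose proxy for the real error rate, since both passes can make the same mistake.

```python
import difflib


def compare_ocr_passes(pass_a: str, pass_b: str):
    """Compare two OCR passes of the same page (hypothetical helper).

    Returns a tuple (similarity, disagreements):
      similarity    -- character-level ratio between 0.0 and 1.0
      disagreements -- list of (line_number, lines_from_a, lines_from_b),
                       where line_number is 1-based within pass_a
    """
    lines_a = pass_a.splitlines()
    lines_b = pass_b.splitlines()

    # Character-level similarity of the two passes as whole strings.
    similarity = difflib.SequenceMatcher(
        None, pass_a, pass_b, autojunk=False
    ).ratio()

    # Line-level comparison: collect every stretch where the passes differ.
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    disagreements = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append((a1 + 1, lines_a[a1:a2], lines_b[b1:b2]))

    return similarity, disagreements


if __name__ == "__main__":
    first = "The quick brown fox\njumps over the lazy dog\n"
    second = "The quick brown fox\njumps ovcr the lazy dog\n"
    score, diffs = compare_ocr_passes(first, second)
    print(f"similarity: {score:.3f}")
    for line_number, from_a, from_b in diffs:
        print(f"line {line_number}: {from_a} vs. {from_b}")
```

Lines that the two passes agree on still need to be proofread against the printed page; the list of disagreements only shows where at least one pass is certainly wrong.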

Some common errors you'll find from the OCR include:


Last updated: 26 January 2001
URL: http://www.letrs.indiana.edu/wright/guidelines.html
Comments: PWILLETT@indiana.edu