
Early American Fiction Project Workflow

NOTE: These procedures are taken from training guidelines used by EAF digitization staff during the project. These are not current imaging workflows or standards.

Book Handling

Prior to scanning, selected volumes were pulled from the stacks, inspected, and relocated to the digital lab. The books remained in the lab until they were digitized, had a TEI header created, and the jpeg derivatives checked.

Workflow Database

Each book was given a record in a FileMaker Pro database.

With the book in hand, the EAF staff recorded information into the database:

Upon volume imaging completion, the FileMaker Pro record was filtered to a TEI header.

Parsing, tiff header integration, and quality assurance were done by the Electronic Text Center staff. AACR-2 compliance and MARC record generation were coordinated by the UVa Special Collections Cataloging Department.


Digital Image Creation

File-naming convention: [xxx-001]

See the EAF Digital Image Scanning Procedures for a detailed description of camera operation, software settings, imaging, batching, and database tracking.

Conversion to JPEG

Batch-processing scripts were run in Photoshop to produce a large JPEG file, which was used to generate GIF thumbnails and two other levels of JPEG files.

From the large JPEG version:

The aim is to keep the jpegs a known and predictable percentage of the original, so that they maintain relative size differences (e.g., an image of a small book looks smaller than an image of a large one).
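The fixed-percentage idea can be illustrated with a minimal Python sketch. This is not the Photoshop batch script the project used. The function name derivative_size, the pixel dimensions, and the 50 and 25 percent levels are all assumptions made for illustration, since the actual percentages are not given here.

```python
from typing import Tuple


def derivative_size(width: int, height: int, percent: float) -> Tuple[int, int]:
    """Scale both dimensions by the same percentage of the original.

    Using one percentage for every book (instead of fitting each image
    to a fixed box) preserves relative size differences between books.
    """
    factor = percent / 100.0
    return max(1, round(width * factor)), max(1, round(height * factor))


# Hypothetical pixel dimensions for a small and a large book page.
small_page = (2000, 3000)
large_page = (4000, 6000)

# Hypothetical derivative levels, expressed as percentages of the original.
for percent in (50, 25):
    small = derivative_size(*small_page, percent)
    large = derivative_size(*large_page, percent)
    print(percent, small, large)
```

At every level the large page stays twice the width of the small page, which is the property the fixed percentage is meant to protect.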


Text Processing

JPG files were uploaded to the vendor's FTP site for processing according to a "Data Conversion Design Document." The goal for the vendor was to reproduce the source in every aspect, including capturing line breaks and page breaks at exactly the same locations as in the source.

Every <divx> has a <head> in the Chadwyck-Healey (C-H) scheme as in TEI, but the head is numbered along with the <divx> -- a <div0> takes a <comhd0>, a <div1> takes a <comhd1>, etc. At present, we think we will use the n= attribute to record this information : <head n="comhd1">. This will be easy to change to <comhd1> for C-H purposes.
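To show why recording the C-H head name in the n= attribute makes the later conversion easy, here is a minimal Python sketch of that rewrite. It is an illustration only, not the project's conversion tool. The function name heads_to_ch and the sample markup are assumptions, and the pattern assumes the attribute is written exactly as n="comhdN" with double quotes.

```python
import re

# Matches <head n="comhdN">...</head>, where N is one or more digits.
# Assumes n= is the only attribute on the head and is double-quoted.
HEAD_RE = re.compile(r'<head n="(comhd\d+)">(.*?)</head>', re.DOTALL)


def heads_to_ch(text: str) -> str:
    """Rewrite TEI heads carrying n="comhdN" as C-H style <comhdN> elements."""
    return HEAD_RE.sub(
        lambda m: "<{0}>{1}</{0}>".format(m.group(1), m.group(2)),
        text,
    )


# Hypothetical sample markup for illustration.
sample = '<div1><head n="comhd1">CHAPTER I.</head><p>Text</p></div1>'
converted = heads_to_ch(sample)
print(converted)
```

Heads that do not carry a comhd value in n= are left untouched, so the rewrite only affects the elements that were marked for C-H purposes.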

The <text> tag in TEI cannot take a <head> itself, but its C-H equivalent needs a <head> and an <attrib> field. One solution is to add a <div1 type="chad"> at the top of every <front> before the real <front> matter, and move it up before the <front> for the C-H format. Its <head> -- <head n="comhd0"> -- contains a <bibl> containing the full, inverted author name (<author>) and the volume short title (<title>), including the date of publication in parentheses.

We still need to decide the precise form of the tags in the <text> that correspond to the C-H <attribs> group: <attauth>, <attgend>, <attgenre>, <attdate>, and <attbal> for full author name, author sex, genre of work, date of publication, and Bibliography of American Literature number. A <ref type="attribs"> containing a <bibl> is possible as a container for this information, within the <div1 type="chad">.

The end result was a parsed TEI document that could be automatically re-shaped to meet Chadwyck-Healey encoding standards.

Guide for image description : <figDesc>

Book illustrations and other figurative content will be described by content, for searching purposes, using the TEI <figure> tag.

Procedures for parsing, indexing, and testing completed texts when returned from the keyboarders

This process followed Etext Center practices of the time. In particular for this project, the process checked for unintentionally minimized tags during parsing. The TEI.DTD allows minimization, but this project did not allow the practice.
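As a rough illustration of what a check for minimized tags might look for, the Python sketch below flags two common forms of SGML minimization: empty tags such as </> and omitted end tags. It is not the parser the Etext Center used and it does not read the DTD. The function name find_minimized_tags is hypothetical, and the EMPTY_ELEMENTS set (elements that legitimately have no end tag) is an assumed, incomplete list.

```python
import re

# A tag is "<", an optional "/", an optional element name, then anything up to ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.\-]*)?([^<>]*)>")

# Assumption: elements declared EMPTY, which never take an end tag.
EMPTY_ELEMENTS = {"pb", "lb"}


def find_minimized_tags(text):
    """Return a list of messages describing apparently minimized tags."""
    problems = []
    stack = []  # names of currently open elements
    for m in TAG_RE.finditer(text):
        closing, name, rest = m.group(1), m.group(2), m.group(3)
        if name is None:
            if rest.strip():
                continue  # comment or declaration, not a tag
            problems.append("empty tag " + m.group(0))
            if closing and stack:
                stack.pop()  # an empty end tag closes the innermost element
            continue
        name = name.lower()
        if not closing:
            if name not in EMPTY_ELEMENTS:
                stack.append(name)
        elif name in stack:
            # Anything opened after this element was never explicitly closed.
            while stack[-1] != name:
                problems.append("end tag omitted for <%s>" % stack.pop())
            stack.pop()
        else:
            problems.append("end tag </%s> has no open element" % name)
    # Elements still open at the end of the text were never closed.
    for leftover in reversed(stack):
        problems.append("end tag omitted for <%s>" % leftover)
    return problems


# Hypothetical sample: the first <p> is missing its end tag.
sample = "<div1><head>One</head><p>First<p>Second</p></div1>"
print(find_minimized_tags(sample))
```

A real check would rely on the SGML parser and the DTD to decide which omissions are legal. This sketch only shows the kind of pattern a reviewer would want surfaced.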