University of Virginia
University of Virginia Library
Digital Initiatives: Reports

Directory Structures for Ingestion of Texts and Images into the Central Digital Repository

March 10, 2004

This proposed directory structure is based on content models. We plan to name all known content models and make them part of the directory structure, so that it is clear to everyone what constitutes a content model and how the directory structure varies with the content model being used. The structure is meant to facilitate writing the batch ingest scripts, which are also based on content models.

Known Content Models

Texts
uvaGenText - transcriptions only
uvaBook - page images and transcriptions
uvaPageBook - page images only, no transcription

Directory structures for text content models are:

uvaGenText
admin (uvaAdminMeta derived from TEI header and other sources)
dc (Dublin Core derived from uvaDescMeta)
desc (uvaDescMeta derived from TEI header)
tei (TEI transcription)

uvaBook
admin (uvaAdminMeta derived from TEI header and other sources)
dc (Dublin Core derived from uvaDescMeta)
desc (uvaDescMeta derived from TEI header)
tei (TEI transcription)

uvaPageBook
admin (uvaAdminMeta derived from TEI header and other sources)
dc (Dublin Core derived from uvaDescMeta)
desc (uvaDescMeta derived from TEI header)
tei (skeleton TEI that provides page structure)

Images
uvaBitonal - bitonal tiff
uvaHighRes - jpeg preview, jpeg screen, sid max
uvaLowRes - jpeg preview, jpeg screen, no physical max; max just points to screen size

Directory structures for image content models are:

uvaBitonal
admin (uvaAdminMeta derived eventually from TIFF/JPEG header; from meta file for now)
dc (Dublin Core derived from uvaDescMeta)
desc (uvaDescMeta derived from parent TEI header)
tiff (bitonal tiff)

uvaHighRes
admin (uvaAdminMeta derived eventually from TIFF/JPEG header; from meta file for now)
dc (Dublin Core derived from uvaDescMeta)
desc (uvaDescMeta derived from parent TEI header)
preview (jpeg preview)
screen (jpeg screen)
max (MrSID max)

uvaLowRes
admin (uvaAdminMeta derived eventually from TIFF/JPEG header; from meta file for now)
dc (Dublin Core derived from uvaDescMeta)
desc (uvaDescMeta derived from parent TEI header)
preview (jpeg preview)
screen (jpeg screen)
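
Because the batch ingest scripts are driven by content models, the lists above could be captured as a simple lookup table. The following is only a sketch in Python: the names CONTENT_MODEL_DIRS and missing_subdirs are hypothetical and do not come from the actual ingest scripts.

```python
# Subdirectories each content model is expected to contain, as listed above.
# The table and function names are illustrative assumptions.
CONTENT_MODEL_DIRS = {
    # text content models
    "uvaGenText": ("admin", "dc", "desc", "tei"),
    "uvaBook": ("admin", "dc", "desc", "tei"),
    "uvaPageBook": ("admin", "dc", "desc", "tei"),
    # image content models
    "uvaBitonal": ("admin", "dc", "desc", "tiff"),
    "uvaHighRes": ("admin", "dc", "desc", "preview", "screen", "max"),
    "uvaLowRes": ("admin", "dc", "desc", "preview", "screen"),
}


def missing_subdirs(model, present):
    """Return the expected subdirectories of `model` that are absent from `present`."""
    expected = CONTENT_MODEL_DIRS[model]  # raises KeyError for an unknown model
    found = set(present)
    return [name for name in expected if name not in found]
```

A script could call missing_subdirs("uvaHighRes", ...) with the subdirectory names it finds on disk and refuse to ingest when the returned list is not empty.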

The high-level directory structure for texts will differ from that of images since each text has a set of images associated with it.
The entity that defines this set is the text id (book-id for TEI, gdms-id for GDMS texts, findingAid-id for EAD).

Example: The Lewis and Clark Project - lc

text/lc
text/lc/uvaBook
text/lc/uvaBook/admin/*.xml
text/lc/uvaBook/dc/*.xml
text/lc/uvaBook/desc/*.xml
text/lc/uvaBook/tei/*.xml (all TEI files in project lc)

* - indicates that all filenames will have the same prefix across the different directories. e.g., tei/b000023449.xml is the TEI text file, desc/b000023449.xml is the uvaDescMeta file, admin/b000023449.xml is the uvaAdminMeta file, dc/b000023449.xml is the Dublin Core file, etc.
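
The shared-prefix convention means that the related files for one text can be located from the text id alone. A minimal Python sketch, assuming each filename is exactly the text id plus ".xml" as in the example above (the function name sibling_files is hypothetical):

```python
from pathlib import PurePosixPath

# Subdirectories of a text content model, as listed earlier in this report.
TEXT_SUBDIRS = ("admin", "dc", "desc", "tei")


def sibling_files(project, model, text_id):
    """Map each subdirectory to the file that carries the text id as its name.

    Assumption: the filename is the text id followed by ".xml".
    """
    base = PurePosixPath("text") / project / model
    return {sub: str(base / sub / f"{text_id}.xml") for sub in TEXT_SUBDIRS}


files = sibling_files("lc", "uvaBook", "b000023449")
print(files["tei"])   # text/lc/uvaBook/tei/b000023449.xml
print(files["desc"])  # text/lc/uvaBook/desc/b000023449.xml
```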

text/lc
text/lc/uvaPageBook
text/lc/uvaPageBook/admin/*.xml
text/lc/uvaPageBook/dc/*.xml
text/lc/uvaPageBook/desc/*.xml
text/lc/uvaPageBook/tei/*.xml

text/lc
text/lc/uvaGenText
text/lc/uvaGenText/admin/*.xml
text/lc/uvaGenText/dc/*.xml
text/lc/uvaGenText/desc/*.xml
text/lc/uvaGenText/tei/*.xml

image/lc/b000023449
image/lc/b000023449/uvaBitonal
image/lc/b000023449/uvaBitonal/admin/*.xml
image/lc/b000023449/uvaBitonal/dc/*.xml
image/lc/b000023449/uvaBitonal/desc/*.xml
image/lc/b000023449/uvaBitonal/tiff/*.tif

image/lc/b000023449
image/lc/b000023449/uvaHighRes
image/lc/b000023449/uvaHighRes/admin/*.xml
image/lc/b000023449/uvaHighRes/dc/*.xml
image/lc/b000023449/uvaHighRes/desc/*.xml
image/lc/b000023449/uvaHighRes/preview/*.jpg
image/lc/b000023449/uvaHighRes/screen/*.jpg
image/lc/b000023449/uvaHighRes/max/*.sid

image/lc/b000023449
image/lc/b000023449/uvaLowRes
image/lc/b000023449/uvaLowRes/admin/*.xml
image/lc/b000023449/uvaLowRes/dc/*.xml
image/lc/b000023449/uvaLowRes/desc/*.xml
image/lc/b000023449/uvaLowRes/preview/*.jpg
image/lc/b000023449/uvaLowRes/screen/*.jpg
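
The image listings above follow the pattern image/project/text-id/content-model/subdirectory, with one file extension per subdirectory. As an illustration only, the expected file patterns could be generated as follows in Python; IMAGE_PATTERNS and image_globs are hypothetical names, not part of any existing ingest script.

```python
from pathlib import PurePosixPath

# Subdirectory -> filename pattern, copied from the image listings above.
_META = {"admin": "*.xml", "dc": "*.xml", "desc": "*.xml"}
IMAGE_PATTERNS = {
    "uvaBitonal": {**_META, "tiff": "*.tif"},
    "uvaHighRes": {**_META, "preview": "*.jpg", "screen": "*.jpg", "max": "*.sid"},
    "uvaLowRes": {**_META, "preview": "*.jpg", "screen": "*.jpg"},
}


def image_globs(project, text_id, model):
    """Return the file patterns expected for one text's images under `model`."""
    base = PurePosixPath("image") / project / text_id / model
    patterns = IMAGE_PATTERNS[model]  # raises KeyError for an unknown model
    return [str(base / sub / pattern) for sub, pattern in patterns.items()]


for path in image_globs("lc", "b000023449", "uvaLowRes"):
    print(path)
```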

Digital Initiatives
University of Virginia
PO Box 400112
Charlottesville, VA 22904-4112

Maintained by: dl@virginia.edu
Last Modified: Monday, June 02, 2008
© The Rector and Visitors of the University of Virginia