University of Virginia
University of Virginia Library
Digital Initiatives: Reports

UVa Library Internal Production Digitization Standards

Metadata | Images (raster) | Images (vector) | Electronic Texts | Audio | Video
Statistical Data | Spatial Data (raster) | Spatial Data (vector)

This document is also available as a PDF (220K) for easy printing and reference.

Definitions for this Document

  • Access Quality Collection Master:  An uncompressed and uncorrected raw digitized file, digitized to a level that supports the minimum standards for generating quality deliverable files, but not to a higher preservation standard.
  • Bit-depth:  The number of bits recorded per pixel, which determines how many colors or levels of gray are captured in the digitization.
  • Compression:  The reduction of a file's size to save storage space.  Much compression is "lossy," which results in permanent loss of some of the data captured at the point of digitization.  Master files for all formats but video are not compressed.
  • Delivery Master:  A color-corrected, cropped, edited, or otherwise altered version of the original digital master that will be used to generate all deliverable files.
  • Preservation Quality Collection Master:  An uncompressed and uncorrected raw digitized file, digitized to a very high quality to support preservation.
  • Preview/Thumbnail:  A version highly reduced in quality and size or duration that functions as an identifier; it has little or no output or editing value.
  • Resolution:  The density of pixels captured in the digitization of an image, or the sample rate at which audio and video are digitized.
  • Service/Deliverable:  A version whose output quality is reduced to support efficient delivery to users.  Multiple files may be created at this level for multiple purposes or user communities.  When Service and Deliverable qualities are outlined separately, the distinction is that Service versions are assumed to be editable by the user and Deliverable versions are not.

A more extensive glossary with links is available at: http://www.lib.virginia.edu/digital/reports/dl_terminology_uva.htm

Metadata

The Library's decisions about metadata standards are available in full on the following web pages:

Repository XML Encoding Guidelines

General Descriptive Modeling Scheme (GDMS)
http://www.lib.virginia.edu/digital/metadata/gdms.html

UVa Metadata Descriptive Elements (UVa DescMeta)
http://www.lib.virginia.edu/digital/metadata/descriptive.html

UVa Metadata Administrative Elements (UVa AdminMeta)
http://www.lib.virginia.edu/digital/metadata/administrative.html

Mappings Between XML Standards

Metadata Decisions and Best Practices

Images – Bitmap/Raster

Capture: Access Quality

Type of Original | Bit depth | Resolution | Compression | File Format
Books (text pages) | Bitonal (1-bit black and white) | 400 ppi | CCITT Group 4 Fax Compression | TIFF
Books (illustrations or figures) | 8-bit (grayscale) or 24-bit (color) | 400 ppi | Uncompressed | TIFF
Slides (35mm) | 24-bit (color) | 300 ppi @ 900% (2700 ppi) | Uncompressed | TIFF
Oversized items (large books, maps, etc.) | 24-bit (color) | 400 ppi | Uncompressed | TIFF
Project Specific (dictated by desired use) | 1-bit, 8-bit, or 24-bit, as appropriate | Resolution should ensure a minimum capture size of 3000 pixels on the long side | Uncompressed | TIFF
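
The slide and project-specific resolutions above come down to simple arithmetic: scanning at 300 ppi with 900% scaling gives an effective 2700 ppi, and the project-specific rule fixes a pixel count rather than a ppi value. The following is a minimal Python sketch of both calculations. The function names are hypothetical and are not part of any Library tooling, and rounding up to a whole ppi is an assumption.

```python
import math


def effective_ppi(base_ppi: int, scale_percent: int) -> int:
    """Effective resolution when scanning at base_ppi with the given scaling."""
    return base_ppi * scale_percent // 100


def min_ppi_for_long_side(long_side_inches: float, min_pixels: int = 3000) -> int:
    """Smallest whole ppi that yields at least min_pixels on the long side."""
    if long_side_inches <= 0:
        raise ValueError("long_side_inches must be positive")
    return math.ceil(min_pixels / long_side_inches)


print(effective_ppi(300, 900))      # 2700, the access-quality slide setting
print(min_ppi_for_long_side(10.0))  # 300
print(min_ppi_for_long_side(7.0))   # 429
```

For small originals the 3000-pixel rule can call for a higher ppi than the 400 ppi used for books, which is why the resolution is stated as a pixel minimum.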

Capture: Preservation Quality

Type of Original | Bit depth | Resolution | Compression | File Format
Books (text pages) | 24-bit (color) | 600 ppi | Uncompressed | TIFF
Books (illustrations or figures) | 8-bit (grayscale) or 24-bit (color) | 600 ppi | Uncompressed | TIFF
Slides (35mm) | 24-bit (color) | 600 ppi @ 900% (5400 ppi) | Uncompressed | TIFF
Oversized items (large books, maps, etc.) | 24-bit (color) | 600 ppi unless hardware constraints limit capture to 400 ppi | Uncompressed | TIFF
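
Because masters are uncompressed, the choice of resolution and bit depth translates directly into storage. As a rough planning aid, the sketch below estimates the raw pixel data size of a scan. The function name and the letter-sized example page are assumptions for illustration only, and the estimate ignores TIFF headers, metadata, and row padding.

```python
def uncompressed_bytes(width_in: float, height_in: float, ppi: int, bit_depth: int) -> int:
    """Approximate size of the raw pixel data for one uncompressed image."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# An 8.5 x 11 inch page, chosen here only as an example size.
preservation_color = uncompressed_bytes(8.5, 11, 600, 24)
access_grayscale = uncompressed_bytes(8.5, 11, 400, 8)
print(preservation_color)  # 100980000 bytes, roughly 101 MB
print(access_grayscale)    # 14960000 bytes, roughly 15 MB
```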

Deliverables

Purpose | Bit depth | Resolution | Compression | File Format
Thumbnail | 4-bit (black and white), 8-bit (grayscale) or 24-bit (color) | 120 pixels on the longest side | JPEG is automatically compressed; select High or level 10 compression | JPEG for color or grayscale images; GIF for bitonal images.
Screen-sized | 4-bit (black and white), 8-bit (grayscale) or 24-bit (color) | 1024 x 768 pixels; or 650-850 pixel width with proportional height (as appropriate) for page images | As above | JPEG for color or grayscale images; 4-bit GIF for bitonal images.
Maximum |  | As appropriate per project | Number of levels depends on pixel dimensions (listed below) | MrSID; JPEG2000 is planned

Levels for Maximum deliverables:
0-800 pixels: 2 levels
800-1600 pixels: 3 levels
1600-3200 pixels: 4 levels
3200-7000 pixels: 5 levels
7000-10,000 pixels: 6 levels
10,000-15,000 pixels: 7 levels
15,000-20,000 pixels: 8 levels
20,000-25,000 pixels: 9 levels
Above 25,000 pixels: 10 levels
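
The level ranges for the Maximum deliverable and the 120-pixel thumbnail rule can both be expressed as small lookups. The sketch below is illustrative only: the function names are hypothetical, it assumes the pixel figure refers to the longest side of the image, and because the published ranges share their endpoints (800 appears in two ranges), it assumes each upper bound is inclusive.

```python
# Upper bound of each range (treated as inclusive, by assumption) and its level count.
LEVEL_RANGES = [
    (800, 2), (1600, 3), (3200, 4), (7000, 5),
    (10_000, 6), (15_000, 7), (20_000, 8), (25_000, 9),
]


def zoom_levels(long_side_px: int) -> int:
    """Number of levels for an image whose longest side is long_side_px."""
    for upper_bound, levels in LEVEL_RANGES:
        if long_side_px <= upper_bound:
            return levels
    return 10  # above 25,000 pixels


def fit_longest_side(width: int, height: int, target: int = 120) -> tuple[int, int]:
    """Scale dimensions proportionally so the longest side equals target."""
    longest = max(width, height)
    return (max(1, round(width * target / longest)),
            max(1, round(height * target / longest)))


print(zoom_levels(5000))             # 5
print(fit_longest_side(3000, 2000))  # (120, 80)
```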

Illustrations/Graphs/Charts - Vector

Creation

Purpose | Format | Compression | Bit depth | Resolution | Comments
Master copy | EPS, SVG, proprietary formats, e.g. Adobe Illustrator | NA | NA | NA | Include color reference whenever appropriate and feasible

Deliverables

Purpose | Format | Compression | Bit depth | Resolution | Comments
Deliverable | EPS, SVG, SWF, JPEG | NA | 24-bit color, 8-bit grayscale | Appropriate for display of necessary information; 300 ppi if readable printing must be supported | Vector images may be retained in their original format or converted to bitmap/raster formats for delivery; use the chart above as a reference.
Thumbnail | JPEG | JPEG is automatically compressed; select High or level 10 compression | 24-bit color; 8-bit grayscale | 120 pixels on the longest side; 72 ppi |

Electronic Texts

Capture

Purpose | Description | Format | Standard
Structured Text Transcription | A literal transcription of the text, encoded in XML. | XML | TEI P4, with local modifications; follow the DTD available at: http://www.lib.virginia.edu/digital/reports/teiPractices/dlpsPractices_postkb.html
Unstructured Text Transcription | Plain text that may include minimal structural or formatting information. | HTML, ASCII text, e.g. OCR output |
Archival Finding Aids | Marked-up collection finding aids. | XML | EAD 2002; follow the guidelines at http://www.lib.virginia.edu/vhp/admin.html and the DTD at http://text.lib.virginia.edu/bin/dtd/eadVIVA/eadVIVA.dtd
Page Images | If the TEI or EAD will include references to page images, select the capture specifications from the Image Table above. | As appropriate from above options |
Image Metadata | Descriptive and technical metadata for images.  Images must meet technical standards described in the Image Table above. | XML | GDMS; follow the guidelines and DTD at http://www.lib.virginia.edu/digital/metadata/gdms.html

Deliverables

Purpose | Description | Format | Standard
Structured Text Transcription | Marked up to reflect the content and the structure of the original document. | XML | TEI, EAD, or GDMS, as documented above.
Unstructured Text Transcription | Plain text that may include minimal structural or formatting information. | HTML, ASCII text, e.g. OCR output | Not delivered through the UVa Library; requires conversion to TEI, EAD, or GDMS.
Page Image Deliverable(s) | If the electronic text is a transcription with dependent page image deliverables, select the deliverable specifications from the Image Table above. | As appropriate from above options |

Audio

Creation

Purpose | Format | Resolution & Sample rate | Description
Master | Broadcast WAV | 44.1 kHz, 16 bits per sample | Maintain channel pattern of original, e.g. stereo, mono, and multi-channel.
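
The master specification (44.1 kHz, 16 bits per sample, original channel pattern) fixes the data rate of the uncompressed audio, so storage needs can be estimated ahead of time. The sketch below is a hedged illustration: the function name is hypothetical, and it counts only the PCM sample data, not the WAV header or any Broadcast WAV metadata chunks.

```python
def pcm_bytes(duration_seconds: float,
              sample_rate_hz: int = 44_100,
              bits_per_sample: int = 16,
              channels: int = 2) -> int:
    """Approximate size of uncompressed PCM audio data in bytes."""
    bytes_per_frame = (bits_per_sample // 8) * channels  # one sample per channel
    frames = int(duration_seconds * sample_rate_hz)
    return frames * bytes_per_frame


print(pcm_bytes(1))                 # 176400 bytes per second of stereo
print(pcm_bytes(3600))              # 635040000 bytes, about 635 MB per stereo hour
print(pcm_bytes(3600, channels=1))  # 317520000 bytes per mono hour
```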

Deliverables

Purpose | Format | Resolution & Sample rate | Description
Service | MPEG 1/2 Layer 3 (.mp3); MPEG 4/AAC | Appropriate to type and quality of original | Maintain channel pattern where practical.
Deliverable | MPEG 1/2 Layer 3 (.mp3); MPEG 4/AAC | Appropriate to delivery needs and conditions |
Preview | MPEG 1/2 Layer 3 (.mp3) |  | Reduce duration to create a representative sample: a "clip."

Video

Creation

Purpose | Format | Compression | Description
Master | NTSC DV, DV-Cam tape, Beta-SP | DV | Media should be stored in an environmentally stable location

Deliverables

Purpose | Format | Compression | Description
Service | Select as appropriate for use | Appropriate to format and use | Service, i.e. editable, versions produced as required by "dubbing"; implies change of storage medium and/or format.  Very large file sizes; not network distributable.
Deliverable | MPEG1, MPEG2, MPEG4 | Appropriate to format and use | Only highly compressed forms, network distributable.
Preview | MPEG4 | Appropriate to format and use | Reduce duration to create a representative sample: a "clip."
Thumbnail | 120 pixels on the longest side, JPEG | JPEG is automatically compressed; select High or level 10 compression | Representative frame:  indication of content.

Statistical/Numeric Data

Purpose | Format | Comments
Master copy | ASCII columnar format; SPSS, STATA, SAS program code and/or machine readable text based documentation to define data for analysis | ASCII delimited preferred; DDI standard metadata preferred documentation format; following the ICPSR standard for data archiving and preservation.
Service | Data stored in some statistical package format (SAS, SPSS, STATA) or in queryable SQL database system | Storage for access, retrieval, or extraction.
Deliverable | SAS, STATA, SPSS, Excel or delimited ASCII format with data map or variable list. | Excel not advised for very large files.  All users get documentation built from DDI records.
Preview | Screen dump of 5% of records, no more than 100 | Practice not currently in place.
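
The preview rule (5% of records, no more than 100) is a capped percentage. Since the practice is not currently in place, the following is only a sketch of how the rule might be applied. The function name is hypothetical, and rounding down while showing at least one record for a non-empty file are both assumptions.

```python
def preview_record_count(total_records: int,
                         percent: int = 5,
                         cap: int = 100) -> int:
    """Number of records to show in a preview: percent of the file, capped."""
    if total_records <= 0:
        return 0
    by_percent = total_records * percent // 100  # rounds down (assumption)
    return max(1, min(cap, by_percent))          # at least one record (assumption)


print(preview_record_count(1000))    # 50
print(preview_record_count(50_000))  # 100, the cap applies
```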

Spatial Data - Vector

Purpose | Format | Comments
Master copy | ASCII-based exchange format such as SDTS, Arc Exchange (.e00), ArcGenerate (.gen), or delimited text. | Note that two of these are tied to proprietary software formats and are not available for all data models. SDTS is available but rarely used in data distribution.
Service | Industry standard formats such as ESRI shape (.shp) or ArcInfo Coverage model, or CAD format such as Microstation (.dgn) or AutoCAD (.dwg).  Possible storage in SQL based system through proprietary middleware (ArcSDE, Oracle Spatial) | Note that ESRI’s shapefile model consists of several related files.  The ArcInfo Coverage model is directory-based.  RDBMS models are still relatively new.
Deliverable | Industry standard formats such as ESRI shape, Arc Exchange, or CAD formats. |
Preview | GIF, JPG or other raster image format. | Preview graphics need to be large enough to convey the general “look” of the data.
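
Because a shapefile consists of several related files, a dataset can be checked for completeness before it is stored or delivered. The sketch below is illustrative: the function name is hypothetical, and the list of mandatory parts (.shp, .shx, .dbf) comes from the ESRI shapefile specification rather than from this document.

```python
from pathlib import PurePath

# Mandatory components of a shapefile per the ESRI specification.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(filenames: list[str], stem: str) -> list[str]:
    """Return the required extensions not present for the dataset named stem."""
    present = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem == stem
    }
    return [ext for ext in REQUIRED_EXTENSIONS if ext not in present]


print(missing_shapefile_parts(["roads.shp", "roads.dbf", "roads.prj"], "roads"))
# ['.shx']
```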

Spatial Data - Raster

Purpose | Format | Comments
Master copy | Photography or remote sensing imagery: Non-compressed TIF + world file or GeoTIFF (preferred), BIL, IMG (Erdas Imagine) | Also applicable for geo-referenced maps. GeoTIFF retains geographic information in the TIFF header; a world file does the same as a separate file.
Master copy | Non-image raster data: ASCII based storage and exchange format (Arc Exchange .e00; ArcGenerate .gen; Spatial Data Transfer Standard (SDTS)) | SDTS is a federal standard, but not widely adopted in commercial industry or government; the format is cumbersome for further processing.
Service | Photography or remote sensing imagery: GeoTIFF, BIL, IMG, SID + world file |
Service | Non-image raster data: ArcExchange; GeoTIFF; native data formats (.cdo); native software data models (ArcGRID) | Users will almost always need to process stored data.  TIFFs can store pixel value as color value and be converted in GIS software; native data formats are common in federal data.  The GRID data model is directory-based, not file-based, but could be stored for access purposes.
Deliverable | Photography or remote sensing imagery: GeoTIFF, BIL, IMG, SID + world file, JPG + world file |
Deliverable | Non-image raster data: Arc Exchange, native formats or models, GeoTIFF |
Preview | JPG, GIF, or SID | Sizes may need to be slightly larger than those outlined for other types of images
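
The world file mentioned for TIF, SID, and JPG imagery is a small six-line text file holding an affine transform from pixel positions to map coordinates. The sketch below shows how those six values are commonly read and applied. The function names and the sample values are assumptions for illustration, and the line order (A, D, B, E, C, F) follows the usual ESRI world file convention.

```python
def parse_world_file(text: str) -> tuple[float, ...]:
    """Read the six world file values in file order: A, D, B, E, C, F."""
    values = tuple(float(token) for token in text.split())
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return values


def pixel_to_map(col: float, row: float, params: tuple[float, ...]) -> tuple[float, float]:
    """Map a pixel (column, row) to the map coordinates of its center."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return (x, y)


# Hypothetical example: 2-unit pixels, no rotation, upper-left pixel at (100, 500).
sample = "2.0\n0.0\n0.0\n-2.0\n100.0\n500.0\n"
params = parse_world_file(sample)
print(pixel_to_map(10, 5, params))  # (120.0, 490.0)
```

The negative fourth value reflects that image rows count downward while map y coordinates usually increase upward.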

 

Digital Initiatives
University of Virginia
PO Box 400112
Charlottesville, VA 22904-4112

Maintained by: dl@virginia.edu
Last Modified: Monday, June 02, 2008
© The Rector and Visitors of the University of Virginia