University of Virginia
University of Virginia Library
Digital Initiatives: Reports

University of Virginia Community Digitization Guidelines

Before You Begin | Storage | Definitions | Images (raster) | Images (vector) | Electronic Texts | Audio | Video |
Statistical Data | Spatial Data (raster) | Spatial Data (vector) | Describing Your Resources | Getting Help

This document is also available as a PDF (500K) for easy printing and reference.

This document offers guidance and minimum recommendations, in line with the UVa Library's current practice, for faculty who are planning digitization projects. Because the inherent or unique characteristics of different source materials necessitate different approaches to scanning, conversion requirements for digital projects should be considered on a case-by-case basis (particularly for grant projects with specific requirements).

These guidelines have been developed in order to:

  1. Increase the interoperability and accessibility of digital collections across UVa through the use of accepted standards and formats
  2. Ensure a consistent, high level of quality across collections
  3. Decrease the likelihood of rescanning in the future by promoting best practices for conversion of materials into digital format and the long-term preservation of these digital resources.

Because technology and industry standards are constantly improving and changing, we view this as a continually evolving document.


Before You Begin Digitizing

Before you digitize anything, take some time to consider your needs. The worst possible outcome is to spend time digitizing materials that end up being inappropriate for the goals of your project. To avoid this scenario, consider a number of issues ahead of time.

  • For what purposes will the materials be used?
  • What level of media quality is necessary to achieve your goals for the project?
  • Who needs to have access to your digital media? Does access need to be limited to certain groups? Do different groups need different types of access?
  • What options do you have for delivering the materials?
  • Who owns the copyright to the materials you are digitizing?
  • What options, both short-term and long-term, are available to you for storing your digitized media files?

Storage

Storage options for your digitized media should be considered before you begin digitizing. Storage space needs vary significantly, depending on file formats and the quality of media desired. Backup policies should always be implemented.

There are numerous solutions for storing media. Hard drives and CDs or DVDs offer local but limited storage space for many media types. For storage of larger media like raw digital audio and video, you might consider an external hard drive. The FireWire standard allows for faster access to these drives. MiniDV/DVcam is a commonly used medium for storing video.
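
As a rough illustration of how format and quality choices drive storage needs, the sketch below estimates the uncompressed size of a scanned image from its physical dimensions, resolution, and bit depth. The function name and the 8.5 x 11 inch page are illustrative assumptions, not part of these guidelines, and real TIFF files carry some additional header and metadata overhead.

```python
def uncompressed_image_bytes(width_in, height_in, ppi, bit_depth):
    """Estimate the raw pixel data size in bytes for an uncompressed scan.

    Hypothetical helper: ignores file headers and per-row padding.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Assumed example: a letter-size page (8.5 x 11 inches).
    for ppi, depth in [(400, 1), (400, 24), (600, 24)]:
        size_mb = uncompressed_image_bytes(8.5, 11, ppi, depth) / 1_000_000
        print(f"{ppi} ppi, {depth}-bit: about {size_mb:.1f} MB")
```

Under these assumptions, a 24-bit letter-size page comes to roughly 45 MB at 400 ppi and roughly 101 MB at 600 ppi, so a few hundred pages can exceed the capacity of a single CD or DVD.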


Definitions

  • Access Quality Collection Master: An uncompressed and uncorrected raw digitized file, digitized to a level that supports the minimum standards for generating quality deliverable files, but not to a higher preservation standard.
  • Bit-depth: The number of colors/level of grayscale captured in the digitization.
  • Compression: When a file's size is reduced to save storage space. Much compression is "lossy," which results in permanent loss of some of the data captured at the point of digitization. Master files for all formats but video are not compressed.
  • Delivery Master: A color-corrected, cropped, edited, or otherwise altered version of the original digital master that will be used to generate all deliverable files.
  • File Format: The format in which a file is saved, such as .doc for Word, .jpg for images, or .wav for audio. Different formats can be used by different applications for different purposes.
  • Metadata: Descriptive information about the content and format of the files.
  • Pixel: The "dots" that make up an image, measured in resolution and the height and width of an image.
  • Preservation Quality Collection Master: An uncompressed and uncorrected raw digitized file, digitized to a very high quality to support preservation.
  • Preview/Thumbnail: Highly reduced in quality and size or duration, functions as an identifier; has little or no output or editing value.
  • Resolution: The density of pixels captured in the digitization of an image, or the sample rate at which audio and video are digitized.
  • Service/Deliverable: Output quality is reduced, to support efficient delivery to users. Multiple files may be created at this level for multiple purposes or user communities. When Service and Deliverable qualities are outlined separately, the distinction resides in the assumption that Service versions are editable by the user, but Deliverable versions are not.
  • XML: A structured method for encoding content and metadata in a text file. Standards for XML encoding include METS (Metadata Encoding and Transmission Standard), EAD (Encoded Archival Description), TEI (Text Encoding Initiative), and GDMS (Generalized Descriptive Modeling Scheme).

For a more extensive glossary and links, visit: http://www.lib.virginia.edu/digital/reports/dl_terminology_uva.htm


Images - Bitmap/Raster

Capture: Access Quality

Type of Original | Bit depth | Resolution | Compression | File Format
Books (text pages) | Bitonal (1-bit black and white) | 400 ppi | CCITT Group 4 Fax Compression | TIFF
Books (illustrations or figures) | 8-bit (grayscale) or 24-bit (color) | 400 ppi | Uncompressed | TIFF
Slides (35mm) | 24-bit (color) | 300 ppi @ 900% (2700 ppi) | Uncompressed | TIFF
Oversized items (large books, maps, etc.) | 24-bit (color) | 400 ppi | Uncompressed | TIFF
Project Specific (dictated by desired use) | 1-bit, 8-bit, or 24-bit, as appropriate | Resolution should ensure a minimum capture size of 3000 pixels on long side | Uncompressed | TIFF

Capture: Preservation Quality

Type of Original | Bit depth | Resolution | Compression | File Format
Books (text pages) | 24-bit (color) | 600 ppi | Uncompressed | TIFF
Books (illustrations or figures) | 8-bit (grayscale) or 24-bit (color) | 600 ppi | Uncompressed | TIFF
Slides (35mm) | 24-bit (color) | 600 ppi @ 900% (5400 ppi) | Uncompressed | TIFF
Oversized items (large books, maps, etc.) | 24-bit (color) | 400 ppi | Uncompressed | TIFF
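
The slide entries in the capture tables combine an output resolution with an enlargement percentage, and the project-specific entry sets a minimum pixel count on the long side. The following minimal sketch shows the arithmetic behind both. The function names are hypothetical, and the 10 and 11 inch originals are assumed examples rather than values from these guidelines.

```python
import math


def effective_ppi(output_ppi, scale_percent):
    """Effective scanning resolution when an original is enlarged on capture."""
    return output_ppi * scale_percent / 100


def min_ppi_for_long_side(long_side_in, min_pixels=3000):
    """Smallest whole-number ppi that yields min_pixels along the long side."""
    return math.ceil(min_pixels / long_side_in)


if __name__ == "__main__":
    print(effective_ppi(300, 900))    # access quality slides -> 2700.0
    print(effective_ppi(600, 900))    # preservation quality slides -> 5400.0
    print(min_ppi_for_long_side(10))  # assumed 10 inch original -> 300
    print(min_ppi_for_long_side(11))  # assumed 11 inch original -> 273
```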

Deliverables

Purpose | Resolution | Compression | File Format
Thumbnail | 120 pixels on the longest side | JPEG is automatically compressed; select High or level 10 compression | JPEG
Screen-sized | 1024 x 768 pixels; or 650-850 pixel width with proportional height (as appropriate) for page images | As above | JPEG
Maximum (create only as needed) | 3000 pixels on the longest side | As above | JPEG
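
The deliverable sizes above are stated as a pixel count on the longest side, which leaves the other dimension to be scaled proportionally. A minimal sketch of that calculation follows. The function name and the 3400 x 4400 pixel master are assumptions for illustration, and the sketch only computes target dimensions; the resampling itself would be done in whatever imaging software you use.

```python
def fit_longest_side(width_px, height_px, target_longest):
    """Scale dimensions proportionally so the longest side equals target_longest."""
    longest = max(width_px, height_px)
    new_width = max(1, round(width_px * target_longest / longest))
    new_height = max(1, round(height_px * target_longest / longest))
    return new_width, new_height


if __name__ == "__main__":
    master = (3400, 4400)  # assumed master size: 8.5 x 11 inches at 400 ppi
    print(fit_longest_side(*master, 120))   # thumbnail -> (93, 120)
    print(fit_longest_side(*master, 3000))  # maximum deliverable -> (2318, 3000)
```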


Illustrations/Graphs/Charts - Vector

Creation

Purpose | Format | Compression | Bit depth | Resolution | Comments
Master copy | EPS, SVG, proprietary formats, e.g. Adobe Illustrator | NA | NA | NA | Include color reference whenever appropriate and feasible

Deliverables

Purpose | Format | Compression | Bit depth | Resolution | Comments
Deliverable | EPS, SVG, SWF, JPEG | NA | 24-bit color, 8-bit grayscale | Appropriate for display of necessary information; 300 ppi if readable printing must be supported | Vector images may be retained in their original format or converted to bitmap/raster formats for delivery; use the chart above as a reference.
Thumbnail | JPEG | JPEG is automatically compressed; select High or level 10 compression | 24-bit color; 8-bit grayscale | 120 pixels on the longest side; 72 ppi |


Electronic Texts

Capture

Purpose | Description | Format | Standard
Structured Text Transcription | A literal transcription of the text, encoded in XML. Requires additional files and specialized server software to deliver, especially if searching is desired. | XML | TEI P4 (Text Encoding Initiative), with local modifications; follow the DTD available at: http://www.lib.virginia.edu/digital/reports/teiPractices/dlpsPractices_postkb.html
Unstructured Text Transcription | Plain text that may include minimal structural or formatting information. | XHTML, ASCII text, e.g. OCR output |
Page Images | If the text will include references to page images, select the capture specifications from the Image Table above. | As appropriate from above options |
Toolkit | PDF texts for use in Toolkit. | PDF | If PDFs are needed, please contact Instructional Scanning for assistance. http://lib.virginia.edu/leo/iss.html

Deliverables

Purpose | Description | Format | Standard
Structured Text Transcription | Marked-up to reflect the content and the structure of the original document. | XML | TEI, as documented above.
Unstructured Text Transcription | Plain text that may include minimal structural or formatting information. | XHTML, ASCII text, e.g. OCR output |
Page Image Deliverable(s) | If the electronic text is a transcription with dependent page image deliverables, select the deliverable specifications from the Image Table above. | As appropriate from above options |


Audio

Creation

Purpose | Format | Resolution & Sample rate | Description
Master | Broadcast WAV | 44.1 kHz, 16 bits per sample | Maintain channel pattern of original, e.g. stereo, mono, and multi-channel.
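
To give a sense of what the master specification means for storage, the sketch below computes the size of uncompressed PCM audio at 44.1 kHz and 16 bits per sample. The function name is hypothetical and the durations are assumed examples; Broadcast WAV files also contain header and metadata chunks that this estimate leaves out.

```python
def pcm_bytes(duration_s, sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Estimate uncompressed PCM audio size in bytes (audio data only)."""
    return int(duration_s * sample_rate_hz * channels * bits_per_sample // 8)


if __name__ == "__main__":
    print(pcm_bytes(60))              # one minute, stereo -> 10584000
    print(pcm_bytes(60, channels=1))  # one minute, mono -> 5292000
    print(pcm_bytes(3600))            # one hour, stereo -> 635040000
```

At these settings an hour of stereo audio occupies about 635 MB, which is close to the capacity of a single CD.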

Deliverables

Purpose | Format | Resolution & Sample rate | Description
Service | MPEG 1/2 Layer 3 (.mp3); MPEG 4/AAC | Appropriate to type and quality of original | Maintain channel pattern where practical.
Deliverable | MPEG 1/2 Layer 3 (.mp3); MPEG 4/AAC | Appropriate to delivery needs and conditions |
Preview | MPEG 1/2 Layer 3 (.mp3) | | Reduce duration to create a representative sample: a "clip"


Video

Creation

Purpose | Format | Compression | Description
Master | NTSC DV, DV-Cam tape, Beta-SP | DV | Media should be stored in an environmentally stable location

Deliverables

Purpose | Format | Compression | Description
Service | Select as appropriate for use | Appropriate to format and use | Service, i.e. editable, versions produced as required by "dubbing"; implies change of storage medium and/or format. Very large file sizes; not network distributable.
Deliverable | MPEG1, MPEG2, MPEG4 | Appropriate to format and use | Only highly compressed forms, network distributable.
Preview | MPEG4 | Appropriate to format and use | Reduce duration to create a representative sample: a "clip."
Thumbnail | 120 pixels on the longest side, JPEG | JPEG is automatically compressed; select High or level 10 compression | Representative frame: indication of content.


Statistical/Numeric Data

Purpose | Format | Comments
Master copy | ASCII columnar format; SPSS, STATA, SAS program code and/or machine readable text based documentation to define data for analysis | ASCII delimited preferred. DDI standard metadata preferred documentation format. Following the ICPSR standard for data archiving and preservation.
Service | Data stored in some statistical package format (SAS, SPSS, STATA) or in queryable SQL database system | Storage for access, retrieval, or extraction.
Deliverable | SAS, STATA, SPSS, Excel or delimited ASCII format with data map or variable list. | Excel not advised for very large files. All users get documentation built from DDI records.
Preview | Screen dump of 5% of records, no more than 100 | Practice not currently in place.
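
The preview rule in the table (5% of records, no more than 100) can be read as a simple capped calculation. The sketch below is one possible interpretation, assuming the 5% figure is rounded up to a whole record; the function name and the rounding choice are assumptions, since the guidelines note that this practice is not currently in place.

```python
def preview_record_count(total_records, percent=5, cap=100):
    """Number of records to show in a preview: percent of the total, capped."""
    # Integer ceiling of total_records * percent / 100, avoiding float rounding.
    share = (total_records * percent + 99) // 100
    return min(cap, share)


if __name__ == "__main__":
    for total in (10, 1000, 5000):
        print(total, "records ->", preview_record_count(total), "in preview")
```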


Spatial Data - Raster

Purpose | Format | Comments
Master copy | Photography or remote sensing imagery: Non compressed TIF+world file or GeoTIFF (preferred), BIL, IMG (Erdas Imagine) | Also applicable for geo-referenced maps. GeoTIFF retains geographic information in TIFF header; world file does same as separate file.
Master copy | Non-image raster data: ASCII based storage and exchange format (Arc Exchange .e00; ArcGenerate .gen; Spatial Data Transfer Standard (SDTS)) | SDTS is federal standard, but not widely adopted in commercial industry or government; format is cumbersome for further processing.
Service | Photography or remote sensing imagery: GeoTIFF, BIL, IMG, SID + world file |
Service | Non-image raster data: ArcExchange; GeoTIFF; native data formats (.cdo); native software data models (ArcGRID) | Users will almost always need to process stored data. Tiffs can store pixel value as color value and be converted in GIS software; native data formats are common in federal data. GRID data model is directory, not file-based but could be stored for access purposes.
Deliverable | Photography or remote sensing imagery: GeoTIFF, BIL, IMG, SID+world file, JPG+world file |
Deliverable | Non-image raster data: Arc Exchange, native formats or models, GeoTIFF |
Preview | JPG, GIF, or SID | Sizes may need to be slightly larger than those outlined for other types of images
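
The table notes that a world file stores, in a separate file, the geographic information that GeoTIFF keeps in the TIFF header. As a minimal sketch of what that separate file holds, the code below parses the six numbers of a world file and converts a pixel position to map coordinates. The function names and the sample values (30-unit pixels with an arbitrary origin) are assumptions for illustration, and a world file by itself does not record the coordinate system.

```python
def parse_world_file(text):
    """Parse the six numeric lines of a world file into affine coefficients.

    Line order: x pixel size (A), y rotation (D), x rotation (B),
    y pixel size (E, usually negative), then the x (C) and y (F)
    coordinates of the center of the upper-left pixel.
    """
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return tuple(values)


def pixel_to_map(coeffs, col, row):
    """Convert a pixel (column, row) position to map coordinates."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


if __name__ == "__main__":
    # Assumed sample content for a hypothetical image.tfw file.
    sample = "30.0\n0.0\n0.0\n-30.0\n500000.0\n4200000.0\n"
    coeffs = parse_world_file(sample)
    print(pixel_to_map(coeffs, 10, 20))  # -> (500300.0, 4199400.0)
```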


Spatial Data - Vector

Purpose | Format | Comments
Master copy | ASCII-based exchange format such as SDTS, Arc Exchange (.e00), ArcGenerate (.gen), or delimited text. | Note that two of these are tied to proprietary software formats and are not available for all data models. SDTS is available but rarely used in data distribution.
Service | Industry standard formats such as ESRI shape (.shp) or ArcInfo Coverage model, or CAD format such as Microstation (.dgn) or AutoCAD (.dwg). Possible storage in SQL based system through proprietary middleware (ArcSDE, Oracle Spatial) | Note that ESRI’s shapefile model consists of several related files. The ArcInfo Coverage model is directory-based. RDBMS models are still relatively new.
Deliverable | Industry standard formats such as ESRI shape, Arc Exchange, or CAD formats. |
Preview | GIF, JPG or other raster image format. | Preview graphics need to be large enough to convey the general “look” of the data.
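
Because a shapefile consists of several related files, it is easy to store or deliver an incomplete set. The sketch below checks that the three mandatory components (.shp, .shx, and .dbf) sit side by side. The function name and the file name roads.shp are hypothetical, and the check assumes lowercase extensions; optional companions such as .prj are not examined.

```python
from pathlib import Path

# The three files every shapefile needs: geometry, index, and attributes.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]


if __name__ == "__main__":
    missing = missing_shapefile_parts("roads.shp")  # hypothetical file name
    if missing:
        print("Incomplete shapefile, missing:", ", ".join(missing))
    else:
        print("All required shapefile components are present.")
```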


Describing Your Digital Resources

The Library suggests a minimum list of categories of information that you should use to describe the content of your resources as well as the nature of the digital files themselves. We recommend that you send email to <lib-metadata-help@virginia.edu> at the start of your project. A librarian with your area or subject expertise will be happy to work with you in setting up a process and identifying appropriate descriptive terminology.

The guidelines that follow outline the type of descriptive information that we recommend you collect and give you some basics for structuring that data. For assistance with creating a database or choosing a metadata format to encode your descriptions, please feel free to send email to <lib-metadata-help@virginia.edu>.

There is important descriptive information to be gathered both about the intellectual content of the resource and about the digital creation. These elements are outlined below. Some fields are strongly recommended, some are required, and others are optional. In order for the Library to take ownership of your resource and/or commit to digital preservation, we ask that you consider all of the fields for describing the intellectual content or the digital resource. The absolutely required fields are marked with asterisks. Please document your practices and standards and be prepared to include that documentation with any data files you deliver to the Library.

The Notes in the third column refer to the notes available online at http://www.lib.virginia.edu/digital/metadata/communityguidelines.html.

Describing the Intellectual Content

*Title | The actual title of the content of the resource, or a brief descriptive phrase. | Notes
*Agent | The name(s) of individuals or organizations that bear some important relationship to the content. At least one agent of some sort is required. Agents have types (creator, publisher, contributor) and one of these types is also required to be specified in the data. | Notes
*Date | Date or date range associated with the creation of the content. | Notes
Place | A physical location associated with the creation of the content (e.g. the place of publication or the location of a building or of a painting). | Notes
Physical Description | The extent of the resource (number of pages of the print book), physical dimensions (for paintings or sculpture), the medium (bronze, oil), etc. | Notes
*Content Type | The nature of the content being described. | Notes

Describing the Digital Resource

*Identifier | A name/code for each resource that is unique within your database. | Notes
*Access Rights | The level of access that a member of the UVa community or the general public can have to this resource. | Notes
Agent | The name(s) of individuals or organizations that bear some important relationship to the digital resource. | Notes
*Resource Type | The type of digital object being described. | Notes
*Date | The date the digital file was created. | Notes

Optional Elements

Culture | A culture of origin or context for a given resource. | Notes
Style | A style or period associated with the content. | Notes
Description | Descriptive text, notes, remarks, or comments about the resource. | Notes
Language | The language(s) of the intellectual content of the resource. | Notes
Subject/Keywords | Topic of the resource. Typically the subject will be expressed as keywords or phrases that describe the subject content of the resource, or terms related to significant associations of people, events, or other contextual information. | Notes
Place coverage | A physical location represented by the content (e.g. the geographic subject of a book or the representation of a place within a painting). | Notes
Date coverage | Date or date range represented by the content (e.g. the temporal subject of a book). | Notes
Relationships | Used to relate two metadata records together, e.g. items in a set, issues of a newspaper, a painting located within a church. | Notes
Mimetype | A standard for the formatting of files so that they can be sent over the Internet. | Notes
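
One way to keep track of the fields marked with asterisks is to check each record for them before delivery. The sketch below is a minimal illustration, assuming records are held as nested Python dictionaries; the key names (such as content_type and access_rights), the two-section layout, and the sample record are all hypothetical and do not represent a format the Library prescribes.

```python
# Required (asterisked) fields, grouped as in the lists above.
REQUIRED = {
    "content": ("title", "agents", "date", "content_type"),
    "digital": ("identifier", "access_rights", "resource_type", "date"),
}
AGENT_TYPES = {"creator", "publisher", "contributor"}


def missing_required(record):
    """Return a list of required fields that are absent or empty."""
    problems = []
    for section, fields in REQUIRED.items():
        values = record.get(section, {})
        for field in fields:
            if not values.get(field):
                problems.append(f"{section}.{field}")
    # Each agent describing the content must carry one of the agent types.
    for agent in record.get("content", {}).get("agents", []):
        if agent.get("type") not in AGENT_TYPES:
            problems.append("content.agents.type")
    return problems


if __name__ == "__main__":
    # Hypothetical record with invented values, for illustration only.
    record = {
        "content": {
            "title": "View of a courthouse",
            "agents": [{"name": "Unknown photographer", "type": "creator"}],
            "date": "1900-1910",
        },
        "digital": {"identifier": "img0001", "date": "2004-10-08"},
    }
    print(missing_required(record))
```

For the sample record, the function reports the content type, access rights, and resource type as missing.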


Where to Get More Help

Digital Media Lab
Clemons Library, 3rd Floor
Judy Thomas, jthomas@virginia.edu
Jama Coartney, jama@virginia.edu
http://lib.virginia.edu/clemons/RMC/dml.html

Digital Scholarship Services

Scholars' Lab
Alderman Library, 4th Floor
Donna Tolson, dtolson@virginia.edu
http://www.lib.virginia.edu/scholarslab/

Rare Materials Digital Services
Small Library, 2nd Floor
Bradley Daigle, bjd2b@virginia.edu
http://www.lib.virginia.edu/rmds/

Fiske Kimball Fine Arts Library
Campbell Hall
Liz Gushee, egushee@virginia.edu
http://www.lib.virginia.edu/fine-arts/collections/visual_res.html

Instructional Scanning Services
Alderman Library, 3rd Floor
Mitch Farish, ISS Coordinator, lib-iss@virginia.edu
http://lib.virginia.edu/leo/iss.html

Charles L. Brown Science & Engineering Library Research Computing Lab
Clark Hall
Andrew Sallans, sallans@virginia.edu
http://www.lib.virginia.edu/science/rescomp/

Developed by the UVa Library - October 8, 2004

Digital Initiatives
University of Virginia
PO Box 400112
Charlottesville, VA 22904-4112

Maintained by: dl@virginia.edu
Last Modified: Monday, June 02, 2008
© The Rector and Visitors of the University of Virginia