University of Virginia
University of Virginia Library
Digital Initiatives: Reports

Repository Text Object Model Committee Recommendations

April 2, 2003

Committee members: Edward Gaynor, Matthew Gibson, Ronda Grizzle, Leslie Johnston, Greg Murray, Perry Roland, Ross Wayland (chair)

Background

In May 2003, the Library will be ready to begin populating a Fedora repository with content, but first some basic decisions must be made about the data object models that will be used to construct the Fedora digital objects. The two initial collections slated for inclusion in the repository are the EAD Finding Aid collection and a subset of texts from the Lewis & Clark collection. Both collections involve electronic texts and images. This group was tasked with constructing a set of basic object models for texts that could be used with the EAD and Lewis & Clark text collections and that could also serve as a foundation for text objects across the Library. The committee acknowledges that different areas of the Library may have very different requirements for the presentation and delivery of textual content, but the committee’s focus was to find a low-level common denominator that would work across most of the Library’s electronic text collections. Areas that require more than the basic functionality proposed in this base set of object models can obtain specialized delivery and functionality in the future by adding disseminators. New object models can also be added if none of the existing models adequately fits the needs and functionality of an object.

Object Model Definition

The term object model, or content model, describes the structure of a group of related objects in a Fedora repository. A Fedora object has four basic components: 1) a persistent identifier (PID) that uniquely identifies the object in a given Fedora repository, 2) a set of disseminators that define the behaviors the object can perform, 3) a set of descriptive and administrative metadata about the object and its content, and 4) one or more datastreams that define the content of the object. Figure 1 depicts the general Fedora object model. Objects are said to “subscribe” to the same object model if they share the same basic structure: the same number and type of datastreams (content streams) and the same set of disseminators, or behaviors.
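The four components and the notion of “subscribing” to an object model can be sketched as a small data structure. This is a minimal illustration, not Fedora's actual API: the class names, the field names, and the use of a MIME type to stand for the "type" of a datastream are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datastream:
    """One content stream. Field names are illustrative assumptions."""
    ds_id: str
    mime_type: str   # assumed stand-in for the "type" of a datastream
    location: str


@dataclass
class DigitalObject:
    """The four basic components of an object, as plain data."""
    pid: str                                                      # 1) persistent identifier
    disseminators: set[str] = field(default_factory=set)          # 2) behavior sets
    metadata: dict[str, str] = field(default_factory=dict)        # 3) descriptive/administrative
    datastreams: list[Datastream] = field(default_factory=list)   # 4) content

    def model_signature(self) -> tuple:
        """Number and type of datastreams plus the set of disseminators."""
        stream_types = tuple(sorted(ds.mime_type for ds in self.datastreams))
        return (stream_types, frozenset(self.disseminators))


def same_object_model(a: DigitalObject, b: DigitalObject) -> bool:
    """True when both objects share the same basic structure."""
    return a.model_signature() == b.model_signature()
```

Two objects with different PIDs and different content locations compare as the same model here, as long as their datastream types and disseminators match.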

Figure 1. Fedora Object Model.

Role of Object Models

Fedora does not require that objects share the same object model. In fact, one could create a different object model for every object in the repository, but doing so would provide little benefit over many of the web sites the Library currently manages. By carefully designing object models and behavior definitions, one can leverage common functionality and delivery tools across large collections of similar objects. If the Library is to manage its rapidly expanding volume of digital content successfully, we have to consider approaches that simplify the management and maintenance of digital object creation, storage, and delivery. Defining an object model does not mean that all objects of the same media type have to fit a single model. The goal is to make each object model as generic and flexible as possible. There will undoubtedly be exceptions, but each exception should be examined carefully to see whether it really warrants a new object model. One of the key features of the Fedora architecture is that different objects can share the same behavior definition, or set of behaviors. A carefully designed behavior definition for a large class of objects can be shared across multiple object models, which benefits both the management and the delivery of digital objects.

Text Object Models

All electronic texts in the Central Digital Repository will be encoded as XML. It is desirable to have a single behavior definition that provides an abstract description of the basic behaviors for any electronic text. Such a behavior definition would need to be sufficiently generic to accommodate the wide variety of electronic text DTDs and schemas. Each type of electronic text would have its own implementation (i.e., behavior mechanism) of the General Text Behavior Definition, but all of these implementations would share the same behavior definition. Where the General Text Behavior Definition cannot provide the desired functionality, further behavior definitions can be attached to an object by adding disseminators. The General Text Behavior Definition defines eight basic behaviors for all electronic texts:

  1. getPreview – display a “preview” of the text in the form of a bibliographic citation, obtained from the descriptive metadata for the object.
  2. getTreeView(level) – display an XML DOM node tree representation of the text down to the specified level in the text.
  3. getChunk(idref) – get a chunk of XML specified by the idref; this behavior retrieves the specified XML fragment from the text.
  4. getChunks(XPath) – get multiple chunks of XML specified by the XPath expression; this behavior may retrieve multiple XML fragments from the text.
  5. getStaticView – display static HTML view of text; the static view would be a view just of the text itself.
  6. getDynamicView – display the text in an interactive HTML form; the dynamic view would include the full set of support tools available for the particular type of text.
  7. getPrintable(format) – download a version of the file in the specified format; at present, three formats would be allowed: .pdf, .pdb, and .lib.
  8. getDeliveryMaster – download the raw XML text of the delivery master.
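The split between one shared behavior definition and several type-specific behavior mechanisms can be sketched as an abstract interface plus one toy implementation. The method names follow the list above; everything else is an assumption for illustration: the class names, the metadata keys `creator` and `title`, the use of an `id` attribute for `getChunk`, and Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from html import escape
import xml.etree.ElementTree as ET


class GeneralTextBehavior(ABC):
    """The eight behaviors, named as in the list above."""

    @abstractmethod
    def getPreview(self) -> str: ...
    @abstractmethod
    def getTreeView(self, level: int) -> str: ...
    @abstractmethod
    def getChunk(self, idref: str) -> str: ...
    @abstractmethod
    def getChunks(self, xpath: str) -> list[str]: ...
    @abstractmethod
    def getStaticView(self) -> str: ...
    @abstractmethod
    def getDynamicView(self) -> str: ...
    @abstractmethod
    def getPrintable(self, format: str) -> bytes: ...
    @abstractmethod
    def getDeliveryMaster(self) -> str: ...


class ToyXmlMechanism(GeneralTextBehavior):
    """Hypothetical mechanism for one XML text type (illustration only)."""

    FORMATS = {"pdf", "pdb", "lib"}

    def __init__(self, xml_text: str, metadata: dict[str, str]):
        self._xml = xml_text
        self._root = ET.fromstring(xml_text)
        self._metadata = metadata

    def getPreview(self) -> str:
        # Citation built from descriptive metadata; the keys are assumed.
        return f"{self._metadata['creator']}. {self._metadata['title']}."

    def getTreeView(self, level: int) -> str:
        # Indented element names; the root element is level 0.
        def walk(el, depth):
            yield "  " * depth + el.tag
            if depth < level:
                for child in el:
                    yield from walk(child, depth + 1)
        return "\n".join(walk(self._root, 0))

    def getChunk(self, idref: str) -> str:
        # Assumes chunks are addressed by an "id" attribute.
        for el in self._root.iter():
            if el.get("id") == idref:
                return ET.tostring(el, encoding="unicode")
        raise KeyError(idref)

    def getChunks(self, xpath: str) -> list[str]:
        # ElementTree accepts only a limited XPath subset.
        return [ET.tostring(el, encoding="unicode")
                for el in self._root.findall(xpath)]

    def getStaticView(self) -> str:
        body = escape("".join(self._root.itertext()))
        return f"<html><body><pre>{body}</pre></body></html>"

    def getDynamicView(self) -> str:
        raise NotImplementedError("interactive view depends on the text type")

    def getPrintable(self, format: str) -> bytes:
        if format not in self.FORMATS:
            raise ValueError(f"unsupported format: {format}")
        raise NotImplementedError("format conversion is outside this sketch")

    def getDeliveryMaster(self) -> str:
        return self._xml
```

A second mechanism for a different DTD would subclass the same `GeneralTextBehavior`, so delivery tools written against the interface would not need to know which type of text they are serving.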

Each digital object will also contain descriptive and administrative metadata about the object as a whole and about each of its content streams (datastreams). The uvaMetadata disseminator will be available on every object and will provide the capability to retrieve descriptive and administrative metadata about the object and its content.

The General Text Object Model would look something like that depicted in Figure 2. The model contains five datastreams:

  1. Static XML version of text – points to the raw XML version of the text.
  2. Static XHTML version of text – points to an XHTML version of the text or to a placeholder indicating that a static version does not exist and will be dynamically generated.
  3. Static PDF version of text – points to a PDF version of the text or to a placeholder indicating that a static PDF version does not exist and will be dynamically generated.
  4. Static PDB version of text – points to a PDB version of the text or to a placeholder indicating that a static PDB version does not exist and will be dynamically generated.
  5. Static LIB version of text – points to a LIB version of the text or to a placeholder that indicates a static version does not exist and emails a request to have one generated externally.

Note that datastreams two through five may point to actual content or to a placeholder indicating that a static version does not currently exist for that format. This scheme makes it possible to provide static versions of high-demand formats, where demand may vary widely depending on the type of text, the time of year, special events, and other factors. The goal is to add content to an object only when necessary and to use placeholders as surrogates when the requested format can be generated dynamically. Generating a format dynamically is slower than serving a static version, but it conserves storage and reduces preparation time because static versions are generated only when user demand warrants. If user demand wanes over time, those static versions can be removed from the object to conserve disk space. The LIB format cannot be generated dynamically; it must be produced by an external process. In this case, the placeholder would inform the user that the LIB version is currently unavailable and could trigger an email or some other notification requesting that a LIB version be created and the LIB datastream in the object updated.
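The placeholder scheme amounts to a three-way decision for each requested format: serve the static version if one is stored, otherwise generate it dynamically if a generator exists, otherwise report that it is unavailable and send a notification. The sketch below illustrates that decision only. The names `FormatStream`, `resolve`, `generators`, and `notify`, and the use of `None` as the placeholder, are assumptions for the example and are not taken from the committee's design.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FormatStream:
    """One of datastreams two through five (XHTML, PDF, PDB, LIB)."""
    name: str
    static_location: Optional[str] = None   # None plays the role of the placeholder


def resolve(
    stream: FormatStream,
    generators: dict[str, Callable[[], str]],
    notify: Callable[[str], None],
) -> tuple[str, str]:
    """Return (mode, value) for a requested format.

    mode is "static", "dynamic", or "unavailable".
    """
    if stream.static_location is not None:
        # A static version is stored: the fastest path.
        return ("static", stream.static_location)

    generator = generators.get(stream.name)
    if generator is not None:
        # Placeholder, but the format can be generated on demand.
        return ("dynamic", generator())

    # Placeholder and no generator (the LIB case): request external creation.
    notify(stream.name)
    return ("unavailable", f"{stream.name} version is currently unavailable")
```

Removing a static version when demand wanes corresponds to setting `static_location` back to `None`; later requests then fall through to the dynamic path without any change to the caller.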

Figure 2. General Text Object Model.

Digital Initiatives
University of Virginia
PO Box 400112
Charlottesville, VA 22904-4112

Maintained by: dl@virginia.edu
Last Modified: Monday, June 02, 2008
© The Rector and Visitors of the University of Virginia