Are "Central Digital Repository", "The Repository", "Repo",
"Cenrepo", "Fedora" all the same thing?
Yes and no. All of the above terms basically refer to the same thing:
a digital library management and delivery system that can serve a research
library with broad, comprehensive electronic collections. "The Repository",
"Repo", and "Cenrepo" are all shorthand terms that we have used locally
at UVa. Our Repository will contain texts, images, and data from Library
collections and faculty projects. The resources are available for discovery
and use, and new uses of the collections can also be collected back into
the Repository.
Fedora is the particular open-source
digital object repository management system built by UVa and Cornell that
is powering our Central Digital Repository.
What is the difference between Fedora and other commercial or open-source
Digital Library systems?
Fedora is not a "digital library in a box"; it is a foundation
upon which many types of information management systems can be built. The
Fedora architecture (underlying system) is based on a system for defining
objects, which can include digital resources, metadata about the resources,
and links to software tools and services that have been configured to deliver
the content. Fedora can support many variations in content, record complex
hierarchical relationships between objects, and link those objects to whatever
types of discovery and user tools best suit the institution's users. Fedora's
flexibility allows development of a custom Digital Library system at UVa
that best meets its requirements.
Most available Digital Library systems are less flexible in scope or configurability.
"Digital Asset Management" systems manage individual resources
such as image or text files. "Content Management Systems" manage
the delivery of content over the web. "Institutional Repositories"
are designed to house research data and publications directly deposited
by faculty. Fedora could serve as the underlying architecture for any of
those types of systems. The Central Digital Repository will include aspects
of all three of those types of systems.
What is UVa's digital library implementation plan?
The implementation at UVa is being rolled out in three phases. At the
end of the project we will have a digital library made up of three major
collections for which a first version of a workflow has been established,
for which search services are operational, and which is as scalable and
sustainable as we can make it. The group of people working on Phase 2
is referred to as the "Fedora Implementation Team" or "Fedora Imps" for
short. Read more about Fedora Imps.
What was Phase 1 of the implementation?
Phase 1 was the first test of the implementation of the UVa digital library
functionality and was released to the UVa Library staff in summer 2003.
Two prototype search and delivery interfaces were presented for review and
comment:
- Texts in English, including:
- Westward Exploration
- Lewis and Clark
- Early American Fiction I
- Art and architecture collections, including:
- Jefferson Country
- Barcelona
- Smithsonian Catlin American Indian paintings
- A selection of Fowler Museum objects.
Input was received on the design, functionality and usability, and suggestions
for improvements and additional functionality were collected.
What was done with the feedback from Phase 1?
We received over 130 comments on the Phase 1 Prototype fitting into four
broad categories:
- User Interface
- General User Functionality
- Image-specific Functionality
- Text-Specific Functionality.
The Administrative Council, the User Services Council, and the Production and Technology
Services Council prioritized the list as follows:
Necessary for Phase 2:
- A minimum set of contents;
- Simple and advanced search across all contents;
- Browsing for and within collections;
- Image browsing;
- An improved image viewer applet;
- The ability to limit text searches within a set;
- Advanced text searching.
Desirable for Phase 2:
- Consistent navigation throughout;
- More detailed help;
- The inclusion of book illustrations in the image collections;
- Advanced image searches that can be limited to a single collection;
- Search functionality enhancements, such as handling of punctuation and phrases;
- An image layout that includes image titles;
- A single consistent layout for both art & architecture and page
images;
- And a "shopping cart" to save sets of objects.
The goal is for all of the updates and upgrades to be in place by Phase
3.
What is Phase 2?
Phase 2 will concentrate on getting the first version of three kinds of
texts with associated images into the digital library, and on testing end-user
tools for image manipulation. The collections are made up of sets of XML
files: prose texts marked up in TEI that have been created by DLPS, finding
aids marked up in EAD that were created by Special Collections, and files
marked up in GDMS that describe artworks and architectural sites. Each of
these collections has images associated with it. Phase 2 includes:
- Texts that DLPS completed during 2003 (known as Q3 and Q4) plus the
Westward Exploration and Lewis and Clark texts from Phase 1.
- UVa Special Collections Library finding aids.
- The four art and architecture collections from Phase 1, plus one or
more sets of images created for architectural history faculty.
Phase 2 has a target release date of end of August. It will be used by at least one Art History professor for the fall semester.
Why isn't Geospatial or Scientific Data included in Phase 2?
As a test of digital library functionality, content was selected for Phase
2 that was not delivered elsewhere on the Library's website and exhibited
minimum variation in encoding and delivery requirements. Digital images
and recently produced electronic texts best met those requirements. Datasets
are much more complex and variable, and so require standards and delivery
applications that are still under development.
Why aren't any legacy electronic texts included in Phase 2?
As a test of digital library functionality, content was also selected for
Phase 2 that met some very recent production standards or could easily be
brought up to those standards without too much time, resources, or human
intervention. This comprises a set of texts produced only in the last half
of 2003. Legacy electronic texts, which may first need to be brought up
to the newer standards, will be targeted for Phase 3, and regularly migrated
into the Repository thereafter.
Where can I learn more about UVa Repository Standards?
- Metadata Standards
- Text Standards (DTD and markup standards)
- Image Standards (Image master and delivery formats)
What is Phase 3?
The goals of Phase 3 will be to formalize workflows based on the experience
of Phase 2 and to review and adjust the metadata standards as necessary.
Phase 3 will also bring in legacy texts and additional newly produced text
and image sets. It will introduce JPEG2000 for our images in place of MrSID
and test the federation of the digital discovery search into Sirsi Rooms.
What do people mean by a "Fedora Object"?
A Fedora object is a representation of an item, such as:
- a book
- a finding aid
- an image representation of a site or a work of art
- a dataset
An object is represented as one or more XML
files that may contain links to other XML files or to images.
An object can also be "rules-based," such as a Collection Object,
where a search dynamically collates objects into a collection -- on-the-fly
-- based on certain strings of words being located in certain elements.
For example, we will have an "Art and Architecture" collection object
which will bring together all art and architecture images -- on demand
by the user -- based on the presence of the code, UVA-LIB-ArtArchit,
in the metadata.
On the big picture, this symbolizes one single Fedora object representing a TEI
book.
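The rules-based Collection Object described above can be illustrated with a minimal sketch. This is not Fedora's actual mechanism: the record layout, the `collection` element name, the object ids, and the `UVA-LIB-Text` code are all hypothetical. Only the `UVA-LIB-ArtArchit` code comes from the description above. The sketch simply shows a collection being assembled on demand by looking for a code in a metadata element.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records keyed by made-up object ids.
# The <descmeta> and <collection> element names are assumptions.
RECORDS = {
    "obj:1": "<descmeta><title>Monticello, west front</title>"
             "<collection>UVA-LIB-ArtArchit</collection></descmeta>",
    "obj:2": "<descmeta><title>Huckleberry Finn</title>"
             "<collection>UVA-LIB-Text</collection></descmeta>",
    "obj:3": "<descmeta><title>Barcelona street view</title>"
             "<collection>UVA-LIB-ArtArchit</collection></descmeta>",
}


def collect(records, element, code):
    """Return the sorted ids of objects whose `element` contains `code`."""
    members = []
    for object_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        # An object belongs to the collection if any matching element
        # carries exactly the requested code.
        if any((el.text or "").strip() == code for el in root.iter(element)):
            members.append(object_id)
    return sorted(members)


# The collection is not stored anywhere; it is computed when asked for.
art_and_architecture = collect(RECORDS, "collection", "UVA-LIB-ArtArchit")
print(art_and_architecture)
```

Because membership is computed from the metadata at request time, a newly ingested object carrying the code would appear in the collection without any list being edited by hand.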
What is a "Content Model"?
A content model describes a particular class of materials in the Repository
that all exhibit the same characteristics.
Users expect different types of functionality when working with images
than when working with texts or with datasets. Different combinations
of file types and functionalities require different programs to present
materials to users in a meaningful way.
A content model packages all like materials together.
For example, with TEI texts, we have so far defined three different content models:
- Fully transcribed texts without page images
- Fully transcribed texts with page images
- Page image-only texts without transcriptions
One set of programs needs to be written to deliver the texts
in case #1, a second set for case #2, and a third set for case #3. The
content model describes the package: the files that need to be delivered,
the functionality we want to achieve, and the programs that need to be
written.
The phrase "Object Model" is sometimes used interchangeably.
What is a "Datastream"?
Datastreams are the actual files that make up an object. These can
be XML text files, XML metadata files, image files,
or other media formats.

In this example of a Fedora object:
- The TEI text file is one datastream
- The Descriptive metadata file is a second datastream
- The Administrative metadata file is a third datastream
- The System metadata file is a fourth datastream
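As a rough illustration of the four-datastream example above, an object can be pictured as a container holding a list of files. The class names, identifiers, and file names below are hypothetical and are not Fedora's own data structures. The sketch only mirrors the "one object, several datastreams" idea.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str      # hypothetical label for the datastream
    mime_type: str  # what kind of file it is
    location: str   # hypothetical file name


@dataclass
class RepositoryObject:
    pid: str  # hypothetical object identifier
    datastreams: list = field(default_factory=list)

    def get(self, ds_id):
        """Return the datastream with the given id, or None if absent."""
        return next((d for d in self.datastreams if d.ds_id == ds_id), None)


# One object representing a TEI book, made up of four datastreams.
book = RepositoryObject(
    pid="demo:tei-book-1",
    datastreams=[
        Datastream("TEI", "text/xml", "book.tei.xml"),
        Datastream("DESC", "text/xml", "book.desc.xml"),
        Datastream("ADMIN", "text/xml", "book.admin.xml"),
        Datastream("SYSTEM", "text/xml", "book.system.xml"),
    ],
)
print([d.ds_id for d in book.datastreams])
```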
What is a "Behavior"?
A behavior is a particular functionality that the user can experience or
a particular presentation of an object to the user.
In programming parlance, the behaviors are usually described as "getSomething."
Examples include:
- getThumbnail (present a thumbnail image to a user)
- getPreview (generate and present a bibliographic citation based on an
object's descriptive metadata)
- Or something more elaborate like getPageTurner (present a full page-turning
text application to the user).
One of the most important goals is to define behaviors for large classes
of objects. We want all images to display exactly the same to the user regardless
of the different programming scripts that are actually necessary to display
a particular kind of image file. For example:
getThumbnail means the same thing for the user regardless of what kind
of image file is behind it and what kind of programming it takes to deliver
that image file to the user.
The majority of the behaviors that we have defined so far are used to render
(display) the collections to users.
What is a "Mechanism"?
A mechanism is a script or program that makes a behavior happen.
There may be a number of mechanisms that present identical outcomes to
the user, but are necessary because of the varying types and combinations
of file formats or variations in XML encoding. To continue with the example
from above (What is a "Behavior"?): getThumbnail means the same thing for
the user regardless of what kind of image file is behind it and what kind
of programming it takes to deliver that image file to the user.
- getThumbnail is the behavior we want to achieve (present a thumbnail
image to a user)
- one mechanism is the program that needs to be written to achieve getThumbnail
for a JPEG file
- another mechanism is the program that needs to be written to achieve
getThumbnail for a TIFF file
What is a "Disseminator"?
A disseminator is the combination of a behavior
(what functionality you want for a particular object)
and its mechanism (the program that needs to be
written to achieve that functionality).
- getThumbnail + the program that needs to be written to display a thumbnail
of a JPEG file -- together represent one single Disseminator
- getThumbnail + the program that needs to be written to display a thumbnail
of a TIFF file -- together represent a second Disseminator
- getPreview + the program that needs to be written to display a bibliographic
citation for a JPEG image file -- together represent a third Disseminator
- getPreview + the program that needs to be written to display a bibliographic
citation for a TEI text -- together represent a fourth Disseminator
- etc.
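The pairing of behaviors and mechanisms listed above can be sketched as a lookup table. This is only an illustration of the idea, not how Fedora implements disseminators: the function names and the returned strings are hypothetical stand-ins for real image-processing programs.

```python
# Hypothetical mechanisms: each is a program for one file format.
def thumbnail_from_jpeg(path):
    return f"thumbnail of JPEG {path}"


def thumbnail_from_tiff(path):
    return f"thumbnail of TIFF {path}"


# Each entry pairs one behavior with one mechanism: one disseminator.
DISSEMINATORS = {
    ("getThumbnail", "jpeg"): thumbnail_from_jpeg,
    ("getThumbnail", "tiff"): thumbnail_from_tiff,
}


def disseminate(behavior, file_format, path):
    """Run the mechanism registered for this behavior and file format."""
    mechanism = DISSEMINATORS[(behavior, file_format)]
    return mechanism(path)


# The caller asks for the same behavior either way; only the mechanism differs.
print(disseminate("getThumbnail", "jpeg", "monticello.jpg"))
print(disseminate("getThumbnail", "tiff", "monticello.tif"))
```

The point of the design is that the user-facing name (getThumbnail) stays constant while the table decides which program does the work.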
What type of Metadata will objects have?
As shown in this representation of a Fedora object, each object
in the Repository will have three types of metadata:
- System metadata: information about the creation and maintenance of the Fedora object
(the date it was created; who created it; who made modifications to
the file; if the file exists in multiple versions, the version of
this particular object, etc.)
- Administrative metadata: rights information (who can use this resource and how) and
technical information (is the image a JPEG or a GIF? is the text markup TEI or EAD?)
- Descriptive metadata: a bibliographic citation (i.e. a cataloging record)
Descriptive metadata can be created in the first instance in a variety
of standards, for example:
- MARC (through VIRGO or WorldCat)
- TEI (through a TEI header)
- EAD (through an EAD header)
- VRA Core (through IRIS, the Fine Arts cataloging system)
- GDMS (a locally created standard)
- FGDC (for government information)
- Any vendor-supplied format
These standards will then all be mapped to what we call DescMeta (the
UVa metadata standard) for cross collection searching.
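The idea of mapping several source standards onto one common standard can be sketched as a simple crosswalk. The common field names `title` and `creator` are hypothetical and are not taken from the actual DescMeta definition. The MARC tags 245 and 100 and the TEI header paths are used only as familiar examples, and the flat dictionary record layout is an assumption.

```python
# Hypothetical crosswalk: source standard -> {source field: common field}.
CROSSWALKS = {
    "MARC": {"245": "title", "100": "creator"},
    "TEI": {"titleStmt/title": "title", "titleStmt/author": "creator"},
}


def to_common(standard, source_record):
    """Map a flat source record onto the common field names.

    Fields with no entry in the crosswalk are dropped.
    """
    mapping = CROSSWALKS[standard]
    return {mapping[k]: v for k, v in source_record.items() if k in mapping}


marc = {"245": "Huckleberry Finn", "100": "Twain, Mark", "300": "366 p."}
tei = {"titleStmt/title": "Huckleberry Finn", "titleStmt/author": "Twain, Mark"}

# Both records end up with the same field names, so one search can cover both.
print(to_common("MARC", marc))
print(to_common("TEI", tei))
```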
For much more on metadata, see the Metadata Steering Group's website.
What is an "Element"?
An element is a tag in an XML file, together with the content it encloses.
An element is similar in concept to a field in a database record or to a MARC
tag. An element contains a specific kind of information, formatted in a
consistent way.
For example:
<title>Huckleberry Finn</title>
(title element)
or
<author>
<name>Twain, Mark</name>
<dateRange from="1835" to="1910">1835-1910</dateRange>
</author>
(author, name, dateRange elements)
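To show how a program reads the elements above, here is a short sketch using Python's standard XML library. The enclosing `record` element is an assumption added so that the two examples form a single well-formed document. The other element and attribute names are those shown above.

```python
import xml.etree.ElementTree as ET

# The <record> wrapper is hypothetical; the inner elements are from the examples.
record = """<record>
  <title>Huckleberry Finn</title>
  <author>
    <name>Twain, Mark</name>
    <dateRange from="1835" to="1910">1835-1910</dateRange>
  </author>
</record>"""

root = ET.fromstring(record)

# Element content is read by element name (or by a path of nested names).
title = root.findtext("title")
name = root.findtext("author/name")

# Attributes are read from the element itself.
date_range = root.find("author/dateRange")
born, died = date_range.get("from"), date_range.get("to")

print(title, name, born, died)
```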
When we talk about "Texts" or "Images", what do we mean?
A text is a complete XML file that marks up an electronic text or a finding
aid. A text is also an XML file that contains only metadata, for example a file
of metadata that describes an art and architecture image.
An image is any image resource file. An image can be:
- the child of an electronic text (a complete page image or a particular
chart or illustration)
- the child of a finding aid (a complete page image or a particular chart
or illustration)
- the child of an art and architecture XML metadata file (the XML file
contains the metadata, the image file is the actual JPEG or TIFF)
What does it mean to "Ingest" something into the Repository?
To ingest an object is to transform it into a Fedora-specific format and
import it into the Repository for management and delivery.
What is a "Fedora Batch"?
A batch is a process for ingesting a class of objects
into the Repository that corresponds to a particular content
model. For Phase 2 we are building the batch templates by hand but will
ingest objects in batches or sets. For Phase 3 or later there will be a
Fedora tool to automate the batch template building process, making it easier
to ingest new types of objects.
What is "Rooms"?
Rooms is a suite of applications from Sirsi intended to improve access
to e-journals and simplify finding the right databases and searching them.
Rooms frees the library and the user from many vendor-imposed limits on
searching and using search results, although it cannot penetrate the most
restrictive proprietary barriers. Rooms also lets us more easily create
and maintain web pages for resource access. Each component has a staff interface
that should reduce the technical knowledge required to manage our complicated
collection of resources and put that job in the hands of the format and
content specialists best qualified to make decisions. Rooms has three different
components:
- Resolver
Resolver uses the OpenURL standard to make it easier for our users to get
directly to full text.
A Knowledgebase maintains information on what journals the Library
has access to. In the administrative setup for a database, like Web
of Science, we say we have a resolver and want to use a particular
UVa image/logo for our OpenURL search. That logo then appears in all
Web of Science records. When users click on it, the database
sends a search to Resolver. If we have access to the online full text
of the article, Resolver takes them there. If we don't have access,
they go to a page that asks if they want to look it up, request ILL,
etc.
Resolver will work in a similar way with VIRGO (after a VIRGO upgrade
this spring). The UVa logo will appear automatically in VIRGO records
and clicking the logo will take users through Resolver to full-text.
Cataloging Services will no longer have to maintain hardcoded links
in individual bibliographic records.
- SingleSearch
This is metasearch, the current term for what has been called
federated search, broadcast search, and multiple-database search.
SingleSearch is not one search to bind them all. We could set up a
search that would go to most of our 250 or so databases, but the results
would be unmanageable for the user. What it does do is allow us
to select a group of databases appropriate for a particular information
need -- a subject area, a class, or something general like news services
or encyclopedias -- and search them simultaneously, providing the user
with a similarly formatted list of results that can be sorted and
worked with. Our selection of search targets is key to making SingleSearch
work.
VIRGO and the Repository can also be searched with SingleSearch.
- Rooms
Rooms is the overall name for the system, but it is also specifically
the name for the content management system (CMS). A CMS is a system
for creating web pages from page templates and a database of resources,
but Sirsi has turned the name a little and calls Rooms a context
management system to emphasize that searching and other functions
act differently depending on the context. A search in the Physics
room goes to different databases than a search in the Philosophy room,
and the user may well get different options in dealing with the results.
Where does Rooms fit into the Repository infrastructure?
The Repository can be federated, along with VIRGO, the Library's e-journals,
and the database collection, into various single searches. Subsets of the
Repository content can also be included in subject specific Rooms. For example,
the Repository's Art and Architecture collection can be federated into a
single search with art and architecture databases we license from commercial
vendors.