University of Virginia
University of Virginia Library
Digital Initiatives: Reports

Repository Text Functionality Group Report

October 10, 2002

Committee members: Leslie Johnston, Chair, Melinda Baumann, Matt Gibson, Karen Marshall, Chris Ruotolo, Scott Silet, Kendon Stubbs

Goal

Discussion and documentation of the required level of functionality for searching, display of results sets, and the generic stylesheets for texts delivered through the repository that have no overriding stylesheets.  These decisions will be informed by the current etext search and style implementation as well as any limitations of the search engine.

Issues

  • Simple search defaults
  • Advanced search categories (element and region limiters) and search type limiters (proximity, etc.)
  • Display of results sets in text-only searches and searches that return multiple format types (texts, images, datasets)
  • What is delivered when a result is selected

Recommendations/specification will be drawn up for the discovery of texts in a cross-format/collection simple search, a text-only search, and an advanced text-only search, how results should be returned, and how selected texts should be displayed.


Recommendations - Cross-Format and Cross-Collection Simple Search

The proposed functionality for the discovery of texts in the simple search across all formats and collections is as follows:

  • The user must be able to locate known items using a phrase, such as an author or a title.
  • The keyword searching should be a full-text search and not limited only to metadata, but this may not be possible in the short term and we may have no choice but to implement metadata-only searching.
  • The simple search must also be limitable by format, such as text or image.
  • The user must be able to specify how the results should be sorted, such as by relevance, format, maker, title, etc.
  • The default for results should be the whole item (such as a complete text, or site/object).
  • Once the results set has been returned, the user should be able to perform a narrowing search within those search results.  As with the initial search, the user should be able to limit by format, declare a sorting option, AND set whether the results should be the whole item OR a hit in context (a text region or individual images for a site/object) if this search is based on full-text search functionality (if this search is metadata only, then only a whole work can be returned).
  • Results for any search must be weighted; whether by number of hits per object or by a more complex algorithm is to be determined in the implementation.

There is no need for a separate simple text-only search if the general simple search is limitable by format.
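
As an illustration only, the simple search behavior recommended above (phrase matching, an optional format limiter, a user-selected sort, and weighting by number of hits per object) might be sketched as follows. The Item record, its field names, and the simple_search function are hypothetical and are not drawn from any existing repository code; the weighting shown is the simplest option mentioned above, and a real implementation may use a more complex algorithm.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record for one whole item in the repository.
    title: str
    maker: str
    format: str  # e.g. "text" or "image"
    body: str    # full text; empty if only metadata is searchable


def simple_search(items, phrase, fmt=None, sort_by="relevance"):
    """Return (item, hits) pairs for whole items that contain the phrase."""
    needle = phrase.lower()
    results = []
    for item in items:
        # Optional format limiter.
        if fmt is not None and item.format != fmt:
            continue
        haystack = " ".join([item.title, item.maker, item.body]).lower()
        hits = haystack.count(needle)  # weight = number of hits per object
        if hits:
            results.append((item, hits))
    if sort_by == "relevance":
        results.sort(key=lambda pair: -pair[1])
    else:
        # sort_by names a field of Item: "title", "maker", or "format".
        results.sort(key=lambda pair: getattr(pair[0], sort_by).lower())
    return results
```

Under these assumptions, a narrowing search is simply a second call over the items already returned, for example simple_search([item for item, _ in results], "melville").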


Recommendations - Browsing

Most searches for a complete digital work originate from outside search engines (Yahoo! and Google are the most common referrers to etext) or from Virgo (although we have no statistics on this use).  Etext supports access to complete works through its browse pages, which are dynamically generated by a "Make Browse" program created by Sue Munson.  The repository should support some form of dynamic browsing, such as generating full listings arranged by author, date, title, etc.
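
A minimal sketch of such a dynamically generated browse listing is given below. It is not the existing "Make Browse" program, whose internals are not described here; the make_browse function and the record fields ("author", "title", "date") are assumptions chosen for illustration.

```python
from collections import defaultdict


def make_browse(works, key="author"):
    """Group work records under sorted headings for the chosen field."""
    listing = defaultdict(list)
    for work in works:
        listing[work[key]].append(work)
    # Headings in sorted order; works under each heading sorted by title.
    return [
        (heading, sorted(group, key=lambda w: w["title"]))
        for heading, group in sorted(listing.items(), key=lambda kv: kv[0])
    ]
```

Calling make_browse(works, key="date") or make_browse(works, key="title") would produce the other arrangements from the same records, so no listing needs to be built by hand.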


Recommendations - Advanced Search

The advanced search must include the following features:

  • Keyword or phrase box.
  • Constraint by author name.
  • Constraint by title.
  • Date limiter.
  • Region limiter (varies by collection and markup).
  • Proximity settings (near, not near, followed by, not followed by, and choice of character or word distance [byte offset amount is to be determined], depending upon search engine possibilities).
  • Ability to perform compound searches.
  • Option for number of results per page.
  • Option to display results as grouped lists of all matches or by work with a notation of the number of matches within that work (link goes to a list of all matches for that work), with grouped by work as the default.
  • Support for performing either compound or simple searches in a single form.
  • Once the results set has been returned, the user should be able to perform a narrowing search within those search results.  As with the initial search, the user should be able to declare a sorting option, AND set whether the results should be the whole item OR a hit in context (a text region or individual images for a site/object).
  • Results for any search must be weighted; whether by number of hits per object or by a more complex algorithm is to be determined in the implementation.
  • Given that etexts can appear in multiple formats (styled html, Microsoft Ebook, Palm Reader, PDF [including print-on-demand]), the results set listings need to specify text format, and dynamically present the available options – download, display the result context, or display the table of contents for the work in full.  There is no specification as to whether versions should appear conflated as options for a single title or as individual hits.

Features noted as particularly desirable in the repository interface include dynamically populated drop-downs for constraint options (currently these drop-downs are built by hand for each collection); proximity searching; support for either compound or simple searching in a single form, as best exemplified by the search form for JTI; and results grouped by work by default.

There was also some discussion of including additional metadata in the returned results that relates to the context of the work: the type of work (fiction, non-fiction, manuscript), the time period of the work (e.g., Antebellum, Reconstruction), and the collection(s) that contain the work (EAF, Chadwyck-Healey).  It is understood that this information is not necessarily present consistently in the markup.
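
To make the proximity settings listed above concrete, the following sketch implements "near" and "followed by" using word distance. It is an assumption-laden illustration, not the behavior of any particular search engine: the function names are hypothetical, matching is on whole lowercased words, and a character or byte distance would require a different position index.

```python
import re


def word_positions(text, term):
    """Return the word indexes at which term occurs (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == term.lower()]


def near(text, a, b, distance):
    """True if a and b occur within `distance` words of each other, in either order."""
    pa, pb = word_positions(text, a), word_positions(text, b)
    return any(i != j and abs(i - j) <= distance for i in pa for j in pb)


def followed_by(text, a, b, distance):
    """True if b occurs after a, no more than `distance` words later."""
    pa, pb = word_positions(text, a), word_positions(text, b)
    return any(0 < j - i <= distance for i in pa for j in pb)
```

The "not near" and "not followed by" settings are then the negations of these two tests.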


Recommendations - Results Set Display

  • The results list must include a dynamically generated "jump to" list to navigate through the hit list.
  • Option to change the display result format to either grouped lists of all matches or by work with a notation of the number of matches within that work (link goes to a list of all matches for that work).
  • Matches within a work should be displayed as a keyword in context—the word hit surrounded by a pre-set (to be determined) byte offset.  When results are grouped by work, the results should be initially presented as the author and title of the work, with the number of keyword matches enumerated.  If the user clicks on the title of the work in that display, the next display shows the author and title, with all keyword in context results displayed on separate lines below the work.  (An excellent example of this interface is Lexis/Nexis).  If the user clicks on one of the keyword in context links, the user is taken to a display of a larger structural division of appropriate size containing the hit.  The user preference would likely be a page, but it is acknowledged that this may be difficult to implement.  After reaching the larger structural division, that page should contain the complete set of text navigation options as set out below, so the user can then move to other points in the text or move back to the results set.
  • Results for any search must be weighted; whether by number of hits per object or by a more complex algorithm is to be determined in the implementation.
  • Given that etexts can appear in multiple formats (styled html, Microsoft Ebook, Palm Reader, PDF [including print-on-demand]), the results set listings need to specify text format, and dynamically present the available options – download, display the result context, or display the table of contents for the work in full.  There is no specification as to whether versions should appear conflated as options for a single title or as individual hits.
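
The keyword-in-context display and the grouped-by-work listing described above could be prototyped along the following lines. This is a sketch under stated assumptions: the context window is measured in characters rather than bytes, the default width of 30 is arbitrary (the actual offset is still to be determined), and the record fields ("author", "title", "text") and function names are hypothetical.

```python
import re


def kwic(text, term, width=30):
    """Return each occurrence of term with `width` characters of context on each side."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [
        text[max(0, m.start() - width):m.end() + width]
        for m in pattern.finditer(text)
    ]


def group_by_work(works, term, width=30):
    """Return (author, title, match_count, snippets) rows, most matches first."""
    grouped = []
    for work in works:
        snippets = kwic(work["text"], term, width)
        if snippets:
            grouped.append((work["author"], work["title"], len(snippets), snippets))
    grouped.sort(key=lambda row: -row[2])
    return grouped
```

The first three fields of each row correspond to the initial grouped display (author, title, and number of matches); the snippets are what would be shown after the user clicks on the title.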


Recommendations - Display of Selected Text

Once a text has been selected from the results, some general recommendations for navigation and functionality are:

  • General site navigation with links to the DL site context (DL general info, initiatives, collections, services [centers], and LofT), Virgo, online reference help, and the main Library home page.
  • A Table of Contents button that, when moused over, gives the user the ability to move to any individual chapter in the work through a dynamically generated set of division links that corresponds to the table of contents.
  • A button that links to the Table of Contents and the entire work.
  • A button that links to the page related to the larger collection that contains the work.
  • A button that links to the page turner (page image viewer), but this button should appear only if page images are available.
  • Within the page turner, the ability to move backward and forward page by page, the ability to go to page "n," the dynamic table of contents that moves to the appropriate page image that starts that section, and a link to the full text file. 


Recommendations - Software Implementation

Given Tamino's inability to meet the requirements for the discovery and retrieval of electronic texts and the potential wait for the addition of this functionality, the group strongly recommends the purchase of a license for XPAT.

Until all collections are indexed by a single search engine that can support Unicode, exploit XML technologies such as XPath and XQuery, and provide the necessary search and retrieval functions for full text, the DL infrastructure may need to support multiple search engines simultaneously.  It will be the decision of the Digital Research and Development group whether we will continue to use Tamino, although it is acknowledged that a continuing relationship with the company will better ensure that our functional needs are implemented in Tamino in the future.

To meet the established goal of a single web-based point-of-access to the digital collections, it is assumed that local development of a federating and aggregating front-end for the multiple search engines will be necessary.  It will be the decision of the Digital Research and Development group how to best undertake this implementation.
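
One possible shape for such a federating front-end is sketched below, purely as an illustration. Each search engine is represented by a hypothetical callable that returns (item_id, score) pairs with positive scores; no actual XPAT or Tamino interface is assumed, and the per-engine score normalization is just one simple way to make rankings from different engines comparable.

```python
def federated_search(engines, query):
    """Query every engine and merge the results into one ranked list."""
    merged = []
    for name, search in engines.items():
        hits = search(query)  # assumed shape: [(item_id, score), ...]
        top = max((score for _, score in hits), default=0)
        for item_id, score in hits:
            # Scale each engine's scores to the range 0..1 before merging.
            normalized = score / top if top else 0.0
            merged.append((normalized, name, item_id))
    merged.sort(key=lambda row: -row[0])
    return merged
```

How the merged list is weighted and presented would remain a decision for the Digital Research and Development group.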


Recommendation - The Committee's Role in Implementation

The committee feels very strongly that the group should participate in the implementation process, with the role of reviewing the implementation as it progresses.  In this way, the group can assist in the decision-making process about necessary alterations to the interface caused by potential technical limitations.

Digital Initiatives
University of Virginia
PO Box 400112
Charlottesville, VA 22904-4112

Maintained by: dl@virginia.edu
Last Modified: Monday, June 02, 2008
© The Rector and Visitors of the University of Virginia