Committee members: Leslie Johnston (Chair), Melinda Baumann, Matt Gibson, Karen Marshall, Chris Ruotolo, Scott Silet, Kendon Stubbs
Goal
Discussion and documentation of the required level of functionality for searching, for the display of results sets, and for the generic stylesheets applied to texts delivered through the repository that have no overriding stylesheets. These decisions will be informed by the current etext search and style implementation, as well as by any limitations of the search engine.
Issues
- Advanced search categories (element and region
limiters) and search type limiters (proximity, etc.)
- Display of results sets in text-only searches and
searches that return multiple format types (texts, images, datasets)
- What is delivered when a result is selected
Recommendations and specifications will be drawn up for the discovery of texts in a cross-format/cross-collection simple search, a text-only search, and an advanced text-only search; for how results should be returned; and for how selected texts should be displayed.
Recommendations - Cross-Format and Cross-Collection Simple Search
The proposed functionality for the discovery of texts in the simple search across all formats and collections is as follows:
- The user must be able to locate known items using
a phrase, such as an author or a title.
- Keyword searching should be full-text rather than limited to metadata; however, this may not be possible in the short term, and we may have no choice but to implement metadata-only searching.
- The simple search must also be limitable by format,
such as text or image.
- The user must be able to specify how the results
should be sorted, such as by relevance, format, maker, title, etc.
- The default for results should be the whole item
(such as a complete text, or site/object).
- Once the results set has been returned, the user should be able to perform a narrowing search within those results. As with the initial search, the user should be able to limit by format, declare a sorting option, AND set whether the results should be the whole item OR a hit in context (a text region, or individual images for a site/object). Hits in context are possible only if this search is based on full-text search functionality; if the search is metadata-only, only a whole work can be returned.
- Results for any search must be weighted; whether
by number of hits per object or by a more complex algorithm is to be determined
in the implementation.
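To make the simple search requirements concrete, the following is a minimal Python sketch, not a specification of the repository's implementation. It assumes a hypothetical in-memory Item record and approximates relevance weighting by the number of hits per object, the simpler of the two options named above. The format limiter, the sort options, and the narrowing search within a results set are shown as plain filters over that list.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical repository record for one whole item (a text, or a site/object)."""
    item_id: str
    fmt: str  # format, e.g. "text" or "image"
    maker: str
    title: str
    full_text: str


def count_hits(item, phrase):
    """Case-insensitive count of non-overlapping occurrences of the phrase."""
    return item.full_text.lower().count(phrase.lower())


def simple_search(items, phrase, fmt=None, sort_by="relevance"):
    """Return (item, hits) pairs for whole items that contain the phrase.

    Relevance is approximated by hit count alone, as a placeholder for the
    weighting algorithm to be chosen in the implementation.
    """
    results = []
    for item in items:
        if fmt is not None and item.fmt != fmt:
            continue  # format limiter, e.g. text only
        hits = count_hits(item, phrase)
        if hits:
            results.append((item, hits))
    if sort_by == "relevance":
        results.sort(key=lambda r: r[1], reverse=True)
    else:
        # "format", "maker", or "title": sort alphabetically on that field
        field = "fmt" if sort_by == "format" else sort_by
        results.sort(key=lambda r: getattr(r[0], field).lower())
    return results


def narrow(results, phrase, **options):
    """Narrowing search: run a new search over the items in an earlier results set."""
    return simple_search([item for item, _ in results], phrase, **options)
```

Counting hits per object is the cheapest weighting to compute, which is why it is used here; any more complex algorithm would replace only the relevance sort key.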
There is no need for a separate simple text-only search if the general simple search is limitable by format.
Recommendations - Browsing
Most searches that lead users to a digital work in full take place through outside search engines (Yahoo! and Google are the most common referrers to etext) or via Virgo (although we have no statistics on this use). Etext supports access to works in full through its browse pages, which are dynamically generated by a "Make Browse" program created by Sue Munson. The repository should support some form of dynamic browsing, such as generating full listings arranged by author, date, title, etc.
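A dynamic browse listing of the kind described above could be generated along the following lines. This is a hypothetical Python sketch, not a description of the "Make Browse" program; the Work record and the grouping rules (initial letter for author and title, decade for date) are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical metadata record for one work."""
    author: str  # inverted form, e.g. "Twain, Mark"
    title: str
    date: int    # year of publication


def make_browse_listing(works, key="author"):
    """Group works under browse-page headings, sorted by the chosen key.

    Author and title listings are grouped by initial letter; date listings
    are grouped by decade. Ties are broken by title.
    """
    def sort_value(w):
        value = getattr(w, key)
        return (value.lower() if isinstance(value, str) else value, w.title.lower())

    groups = defaultdict(list)
    for w in sorted(works, key=sort_value):
        value = getattr(w, key)
        heading = f"{value // 10 * 10}s" if key == "date" else value[:1].upper()
        groups[heading].append(w)
    return dict(groups)
```

Because the listing is computed from the records on each request (or each rebuild), new works appear in the browse pages without any hand editing.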
Recommendations - Advanced Search
The advanced search must include the following features:
- Constraint by author name.
- Region limiter (varies by collection and markup).
- Proximity settings (near, not near, followed by, not followed by, and a choice of character or word distance [byte offset amount to be determined], depending upon search engine capabilities).
- Ability to perform compound searches.
- Option for number of results per page.
- Option to display results either as a list of all matches or grouped by work, with a notation of the number of matches within each work (linking to a list of all matches for that work); grouping by work should be the default.
- Support for performing either compound or simple searching in a single form.
- Once the results set has been
returned, the user should be able to perform a narrowing search within those
search results. As with the initial search, the user should be able to
declare a sorting option, AND set whether the results should be the whole
item OR a hit in context (a text region or individual images for a site/object).
- Results for any search must be weighted; whether
by number of hits per object or by a more complex algorithm is to be determined
in the implementation.
- Given that etexts can appear in multiple formats (styled HTML, Microsoft Ebook, Palm Reader, PDF [including print-on-demand]), the results set listings need to specify the text format and dynamically present the available options: download, display the result in context, or display the table of contents for the work in full. No decision has been made as to whether versions should appear conflated as options for a single title or as individual hits.
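The proximity settings can be illustrated with a small Python sketch that measures distance in words. The function names and operator strings are hypothetical; character distance, compound searches, and the operators actually supported by the chosen search engine are not modeled. The "not" forms are treated here as requiring the first term to occur without the second term nearby, which is one possible interpretation.

```python
import re


def word_positions(text, word):
    """0-based word offsets at which `word` occurs, ignoring case."""
    tokens = re.findall(r"\w+", text.lower())
    return [i for i, t in enumerate(tokens) if t == word.lower()]


def proximity(text, a, b, op, distance):
    """Evaluate a two-term proximity constraint, with distance measured in words.

    op is one of "near", "not near", "followed by", "not followed by".
    "near" matches b within `distance` words on either side of a;
    "followed by" requires b to come after a within `distance` words.
    The "not" forms require a to occur with no qualifying b.
    """
    pa, pb = word_positions(text, a), word_positions(text, b)
    if not pa:
        return False  # the first term must occur for any operator to match
    if op in ("near", "not near"):
        found = any(abs(i - j) <= distance for i in pa for j in pb)
    elif op in ("followed by", "not followed by"):
        found = any(0 < j - i <= distance for i in pa for j in pb)
    else:
        raise ValueError(f"unknown proximity operator: {op!r}")
    return not found if op.startswith("not ") else found
```

Usage: proximity(text, "river", "mill", "near", 3) is true when "mill" falls within three words of some occurrence of "river".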
Features that were noted as particularly desirable in the repository interface include: dynamically populated drop-downs for constraint options (currently these drop-downs are built by hand for each collection); proximity searching; support for either compound or simple searching in a single form, as best exemplified by the search form for JTI; and grouping of results by work as the default.
There was also some discussion of including additional metadata in the returned results that relates to the context of the work: the type of work (fiction, non-fiction, manuscript), the time period of the work (e.g., Antebellum, Reconstruction), and the collection(s) containing the work (EAF, Chadwyck-Healey). It is understood that this information is not necessarily present consistently in the markup.
Recommendations - Results Set Display
- The results list must include a dynamically generated
"jump to" list to navigate through the hit list.
- Option to change the results display format to either a list of all matches or a grouping by work, with a notation of the number of matches within each work (linking to a list of all matches for that work).
- Matches within a work should be displayed as keyword in context (KWIC): the hit word surrounded by a preset (to be determined) byte offset of text on either side. When results are grouped by work, they should initially be presented as the author and title of each work, with the number of keyword matches enumerated. If the user clicks on the title of a work in that display, the next display shows the author and title, with all keyword-in-context results displayed on separate lines below the work (an excellent example of this interface is Lexis/Nexis). If the user clicks on one of the keyword-in-context links, the user is taken to a display of a larger structural division of appropriate size containing the hit. Users would likely prefer a page, but it is acknowledged that this may be difficult to implement. The display of that larger structural division should contain the complete set of text navigation options set out below, so that the user can move to other points in the text or return to the results set.
- Results for any search must be weighted; whether
by number of hits per object or by a more complex algorithm is to be determined
in the implementation.
- Given that etexts can appear in multiple formats (styled HTML, Microsoft Ebook, Palm Reader, PDF [including print-on-demand]), the results set listings need to specify the text format and dynamically present the available options: download, display the result in context, or display the table of contents for the work in full. No decision has been made as to whether versions should appear conflated as options for a single title or as individual hits.
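The results display described above (a "jump to" list, keyword in context, and grouping by work with match counts) might be sketched as follows. This Python sketch is hypothetical: works are assumed to be (author, title, text) tuples, and context is measured in characters rather than bytes, which is equivalent only for plain ASCII text. Ordering works by match count stands in for whatever weighting algorithm is eventually chosen.

```python
import re


def kwic(text, term, width=30):
    """Keyword-in-context lines: each hit plus `width` characters on either side."""
    lines = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        lines.append(text[start:end])
    return lines


def group_by_work(works, term, width=30):
    """Grouped-by-work display: author, title, match count, and the KWIC lines.

    `works` is a list of (author, title, text) tuples. Works without matches
    are omitted; the rest are ordered by match count, most matches first.
    """
    grouped = []
    for author, title, text in works:
        hits = kwic(text, term, width)
        if hits:
            grouped.append({"author": author, "title": title,
                            "count": len(hits), "kwic": hits})
    grouped.sort(key=lambda g: g["count"], reverse=True)
    return grouped


def jump_to_list(n_results, per_page=20):
    """Labels for a "jump to" list over the hit list, e.g. "1-20", "21-40"."""
    return [f"{s + 1}-{min(s + per_page, n_results)}"
            for s in range(0, n_results, per_page)]
```

The first display would show only author, title, and count from each group; the "kwic" lines would be shown once the user clicks through to that work.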
Recommendations - Display of Selected Text
Once a text has been selected from the results, the general recommendations for navigation and functionality are:
- General site navigation with links to the DL site
context (DL general info, initiatives, collections, services [centers],
and LofT), Virgo, online reference help, and the main Library home page.
- A Table of Contents button that, when moused over, gives the user the ability to move to any individual chapter in the work through a dynamically generated set of division links that corresponds to the table of contents.
- A button that links to the Table of Contents and
the entire work.
- A button that links to the page related to the
larger collection that contains the work.
- A button that links to the page turner (page image
viewer), but this button should appear only if page images are available.
- Within the page turner: the ability to move backward and forward page by page; the ability to go to page "n"; a dynamic table of contents that moves to the page image that starts each section; and a link to the full-text file.
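The page turner behavior listed above can be modeled as a small state object. The following Python sketch is an illustration under stated assumptions, not a design: pages are a list of hypothetical page-image identifiers, and the table of contents records, for each section, the index of the page image that starts it.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PageTurner:
    """Hypothetical page-image viewer state for one work.

    pages: ordered page-image identifiers.
    toc: ordered (section title, index of the page image that starts it) pairs.
    """
    pages: list
    toc: list
    current: int = 0

    def forward(self):
        """Next page image, staying on the last page at the end of the work."""
        self.current = min(self.current + 1, len(self.pages) - 1)
        return self.pages[self.current]

    def backward(self):
        """Previous page image, staying on the first page at the start."""
        self.current = max(self.current - 1, 0)
        return self.pages[self.current]

    def go_to(self, n):
        """Go to page "n" (1-based, as a reader would type it)."""
        if not 1 <= n <= len(self.pages):
            raise ValueError(f"page {n} is out of range 1-{len(self.pages)}")
        self.current = n - 1
        return self.pages[self.current]

    def go_to_section(self, title):
        """Table of contents link: jump to the page image that starts the section."""
        for section, start in self.toc:
            if section == title:
                self.current = start
                return self.pages[start]
        raise KeyError(title)

    def current_section(self):
        """Title of the section containing the current page, or None before the first."""
        starts = [start for _, start in self.toc]
        i = bisect_right(starts, self.current) - 1
        return self.toc[i][0] if i >= 0 else None
```

Keeping section start pages in the table of contents lets the same data drive both the section links and the highlighting of the reader's current position.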
Recommendations - Software Implementation
Given Tamino's inability to meet the requirements for the discovery and retrieval of electronic texts, and the potential wait for the addition of this functionality, the group strongly recommends the purchase of a license for XPAT.
Until all collections are indexed by a single search engine that supports Unicode, supports exploitation of XML structure through standards such as XPath and XQuery, and provides the necessary search and retrieval functions for full text, the DL infrastructure may need to support multiple search engines simultaneously. It will be the decision of the Digital Research and Development group whether we continue to use Tamino, although it is acknowledged that a continuing relationship with the company will better ensure that our functional needs are implemented in Tamino in the future.
To meet the established goal of
a single web-based point-of-access to the digital collections, it is assumed
that local development of a federating and aggregating front-end for the multiple
search engines will be necessary. It will be the decision of the Digital
Research and Development group how to best undertake this implementation.
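A federating and aggregating front-end could take roughly the following shape. In this hypothetical Python sketch, each search engine is wrapped by an adapter (a callable returning (item_id, score) pairs); the engine names, the adapter interface, and the choice to rescale each engine's scores to 0..1 before merging are all assumptions, since raw scores from different engines are not directly comparable.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize(results):
    """Rescale one engine's scores to 0..1 so results from different engines can be merged."""
    if not results:
        return []
    top = max(score for _, score in results) or 1.0  # avoid dividing by zero
    return [(item_id, score / top) for item_id, score in results]


def federated_search(engines, query):
    """Send `query` to every engine in parallel and merge the results.

    `engines` maps a name (e.g. "tamino", "xpat") to a hypothetical adapter:
    a callable taking the query and returning a list of (item_id, score).
    Returns (item_id, score, source engine) triples, best score first.
    """
    with ThreadPoolExecutor(max_workers=len(engines) or 1) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        merged = {}
        for name, future in futures.items():
            for item_id, score in normalize(future.result()):
                # An item indexed by more than one engine keeps its best score.
                best = merged.get(item_id)
                if best is None or score > best[0]:
                    merged[item_id] = (score, name)
    ranked = sorted(merged.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(item_id, score, source) for item_id, (score, source) in ranked]
```

Rescaling by each engine's top score is only the simplest merging strategy; a production front-end would need a more careful approach to combining rankings, and would also need to handle engines that time out or fail.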
Recommendation - The Committee's Role in Implementation
The committee feels very strongly that the group should participate in the implementation process, with the role of reviewing the implementation as it progresses. In this way, the group can assist in decisions about necessary alterations to the interface caused by potential technical limitations.