Metadata Steering Group (MSG): Past Minutes
September 25, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Ann Whiteside
This was the first meeting of the newly established MSG!
Review of Charge:
- The charge was clarified: where the text of the charge reads "...
will be responsible for keeping up with changes and adjusting mappings and
the DescMeta schema to stay aligned with external standards ..." the
words "DescMeta schema" will be replaced by "metadata schema
for the UVa Digital Library" so as to include administrative/technical
metadata as well.
- The MSG defined its role as being the group that will officially weigh in
on all standards and mappings for all items being ingested into the Central
Repository; will address migration issues as standards evolve; will be the
provisional authority that will amend and enforce all DescMeta/AdminMeta
standards and best practices for ingestion into the DL; will develop recommendations
for where that authority will ultimately lie in the production environment;
will balance the tensions between the deeper needs of individual communities
and the merging together of these communities for the DL; and will be responsible
for Library-wide metadata education encouraging those communities to think
broadly and flexibly about metadata issues.
Role of the "experts"
- The charge identifies experts in the various international standards (TEI,
MARC, VRA, etc.) and various content domains (science, music, etc.). The
group will bring these people in as needed for consultation and/or work
on best practices, particular mappings, or migration issues. The experts
include:
- Beth Blanton-Kent (Science)
- Bradley Daigle (EAD)
- Edward Gaynor (EAD)
- Matt Gibson (TEI)
- Greg Murray (TEI/DLPS)
- Mary Prendergast (Music)
- Andrew Rouner (TEI)
- Christine Ruotolo (TEI)
- Judith Thomas (VRA/GDMS/Audio/Video)
- Jama Coartney (DLPS)
- Leslie Johnston (CenRepo)
- Ross Wayland (CenRepo)
Please email Erin with any appropriate names missing from this list. The
experts will all be invited to a meeting of the MSG in the next few weeks
to put all their issues/concerns on the table and help to define priorities
for the MSG. The people on this list are also encouraged to bring their
metadata issues to the MSG for discussion/consultation at any time as well
as to subscribe to lib-metadata.
Meeting times
- Meeting times were arranged.
Setting priorities
- There is an immediate need for both practical and philosophical work for
the MSG. The group will officially evaluate and approve the TEI mapping
proposal (available at http://www.lib.virginia.edu/digital/reports/teimap.html).
Edward, Bradley, and Erin have begun discussions on the EAD mapping;
work is beginning now and the proposal will be taken to the MSG for evaluation/approval
ASAP as well. On the more philosophical front, there is an immediate
need to make decisions regarding content and carrier. Are we describing
the intellectual content of the work in our descriptive metadata or are
we describing the electronic version that we "hold" in hand?
Next steps
- Thorny will present to the group a picture of how the metadata fits in
with the CenRepo architecture so that the MSG is all operating with a common
base understanding.
- The MSG will invite all of the experts to a meeting in the next few weeks
to put all their issues/concerns on the table, to help lay out and frame
the questions, and help to define priorities.
October 2, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Ann Whiteside
Picture of how the metadata fits in with the CenRepo architecture
- Thorny distributed a diagram representing the current vision of how the
metadata will fit into the CenRepo/Fedora architecture and discussion ensued.
The extraction of DescMeta from the native metadata schema (TEI, EAD, GDMS,
etc.) will be entirely automated. Nobody will touch the actual DescMeta
records.
- Much of the discussion, therefore, surrounded roles and responsibilities
and how we would resolve issues and problems in a distributed environment.
- Also, when a user searches the Digital Discovery Index, s/he is searching
that metadata only. How much metadata can we afford?
Agenda planning/discussion/goals for the Expert Meeting
- The MSG invited stakeholders in metadata decision-making for the Digital
Library to an open meeting on October 9th. We will encourage participants
to put their issues/concerns on the table, to help lay out and frame the
questions, and to help the MSG define its priorities.
- Invitees include those people identified previously as Library standards/content
experts; subscribers of lib-metadata; Martha.
- Conversation will likely be all over the place, but we will try to
sort thoughts into four main categories: 1) Infrastructure and Tools;
2) Standards for data tagging; 3) Standards for data content; 4) Workflow
issues
TEI mapping
- The MSG began considering the TEI mapping done previously by Dan McShane
and currently posted on the Digital Initiatives website. Questions
included:
- The TEI elements that are part of the series statement are all mapped to <relation>?
- Should the notes statement be mapped to <description>?
- Text Classification says it maps to Subject. The second sentence reads
"Terms denoting specific literary forms (prose, poetry, etc.) should
be mapped to the Form element contained in MediaType." Prose, poetry,
and other similar terms are forms, but in this instance aren't they
subject terms?
- Revision Description -item - maps to Agent type="contributor" form="persname".
Shouldn't this map to Item?
- Thorny suggested we flip the chart and map DescMeta-from-TEI rather
than TEI-to-DescMeta.
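The flip Thorny suggested can be pictured as inverting a lookup table. The following is only a minimal sketch: the TEI paths and the pairings in `tei_to_descmeta` are illustrative assumptions, not the contents of the posted mapping.

```python
from collections import defaultdict

# Hypothetical TEI-to-DescMeta rows; paths and pairings are illustrative only.
tei_to_descmeta = {
    "titleStmt/title": "title",
    "titleStmt/author": "agent",
    "seriesStmt/title": "relation",
    "notesStmt/note": "description",
    "revisionDesc/change/item": "agent",
}


def flip_mapping(source_to_target):
    """Group source elements under the target element they map to."""
    flipped = defaultdict(list)
    for source, target in source_to_target.items():
        flipped[target].append(source)
    # Sort for a stable, readable chart.
    return {target: sorted(sources) for target, sources in sorted(flipped.items())}


descmeta_from_tei = flip_mapping(tei_to_descmeta)
for element, sources in descmeta_from_tei.items():
    print(f"<{element}> <- {', '.join(sources)}")
```

Reading the chart from the DescMeta side makes it easy to see which DescMeta elements are fed by several TEI elements and which are not fed at all.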
October 9, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Greg Murray, Bradley Daigle, Edward Gaynor, Melinda Baumann, Leslie Johnston,
Ross Wayland, Beth Blanton-Kent, Andrew Rouner, Nadine Ellero, Ronda Grizzle,
Chris Ruotolo, Matt Gibson, Jama Coartney
Welcome & Introduction
- Erin welcomed the group and thanked them for coming. She distributed the
MSG charge and encouraged them to join lib-metadata. The MSG invited them
to this meeting because they are all stakeholders in metadata decision-making
for the Digital Library. The MSG encouraged the participants to put their
issues/concerns on the table, to help lay out and frame the questions, and
to help the MSG define its priorities.
Sketch of Search Services for the Digital Library
- Because Thorny was out sick, Erin distributed his diagram, representing
the current vision of how the metadata will fit into the CenRepo/Fedora
architecture. Highlights:
- The extraction of DescMeta from the native metadata schema (TEI, EAD,
GDMS, etc.) will be entirely automated. Nobody will touch the actual
DescMeta records. Creation and maintenance of metadata records will
occur entirely in the native schema. Catalogers/Metadata creators will
need expertise in each of those schema and will need authority to update
metadata in its home location. With new materials ingested into CenRepo,
the DescMeta record will be created as part of the ingest and distributed
to the Digital Discovery Index. Metadata will also simultaneously populate
indexes particular to the type of resource (i.e. Modern English, Art
& Architecture, Finding Aids, etc.). When metadata records need
updating, the work will be done in the native schema and re-disseminated
to the Digital Discovery Index and the native index.
- The DescMeta record will represent selective metadata mapped from
the native schema. When a user searches the Digital Discovery Index,
s/he is searching that metadata only. When the user searches an index
for the native schema (i.e. Modern English Index/TEI), s/he will be
searching the full metadata and/or the full text of the resource.
Roundtable discussion of metadata issues/concerns/priorities
- Beth introduced broad categories of issues to help focus the discussion.
Discussion jumped all over the place and Janis organized the points into their
appropriate categories. In these minutes, the discussion is summarized first
and then the points are categorized. Names are only mentioned when relevant
to a particular context or workflow.
- SUMMARY
- Melinda, Greg, & Jama: what about plates/pictures/figures in
books? & Beth Blanton-Kent: diagrams/graphs/tables? DLPS is not
currently taking advantage of the “FigDesc” tags. There
are workflow issues: how to do the tagging? how to describe the items
(form: photographs, drawings, etc.; subjects: dogs, fishhooks, etc.),
who has that content expertise?, who will do the description/the tagging?
The group believes that these resources should be discoverable in the
Digital Discovery Index alongside regular images.
- Bradley needs a policy decision for ingesting Special Collections
orphans. They have little metadata from the TIFF header and probably
an associated MARC record for the parent.
- Worthiness of items – are certain items more “worthy”
of having their plates/pictures/figures/diagrams/graphs/tables described?
Who decides? Does this need to be a policy decision or do we do this
by hand? In the traditional print world, Catalogers look at each book
and pull out relevant subject terms/name headings, etc., to be indexed
for the user. Is this do-able/scalable for the DL?
- Who is going to be doing the metadata? – student labor? training?
- Staffing -- humans will have to touch data at some point in the process.
- If Catalogers/Metadata creators are creating and updating records
in their native schema, they need:
- tools (i.e. Rob's GDMS tool) in each of the native areas to work
with.
- the authority to create/update/edit in each native schema.
- Who is going to make the tools to do this? Who has authority to edit
somebody else's TEI (etc., etc.) metadata? Who has the subject expertise?
- Volume/scalability problem – what do we have enough staff to
do?
- First time creation v. long-term maintenance? Enrichment?
- Versioning – Fedora will keep track of all versions of the metadata.
We will be able to clean up all the versions, but who has the authority
to do this? Only the repository manager? Who is the repository manager?
Can we do clean up or make decisions about when to clean up various
versions item-by-item? Or collection-by-collection?
- Enforcement of rules – we can enforce tagging rules programmatically,
but how do we enforce content rules? Bradley: do we need a Metadata
Honor Code? Enforcing content for a meaningful discovery index will
require a human element!
- Bottlenecks -- human intervention will necessarily cause bottlenecks
at some points in the process. What is acceptable?
- How much metadata can we afford?
- Metadata assessment should be part of the selection process. If the
collection we want to purchase does not contain adequate metadata, local
enhancement needs to be considered as part of the purchase cost.
- Collections/aggregations of objects – how do we represent this
in the metadata? There is a need to establish inter-relationships between
resources in the metadata (i.e. by LC call #, Subject, name authority,
etc.)
- Broad vs. specific subjects -- the needs of the humanities are obviously
different from the needs of the scientific communities. If one community
creates subject headings, another must have the authority to update
the metadata by adding subject terms to suit their needs. Where does
Health Sciences fit in here (MESH v. LCSH)? Will Health Sciences have
authority to add subject terms for DL records (in the native TEI, GDMS,
EAD, etc.)?
- What is the minimal set of elements required for meaningful discovery?
- Images are different than text in that the user is not generally looking
for a known item.
- Science is different in that the searching is often more granular.
- We should compare the art & architecture slides with the scientific
slides. How is the metadata different?
- Admin & technical metadata need to be fleshed out. Technical metadata
can be acquired programmatically; where is it maintained?
- Rights management must be a priority. (Right now, Fedora can do IP
restriction, more coming Jan. 2005). We need both to restrict materials
UVa has purchased and to have a place to note usage restrictions on
UVa Special Collections materials. Fedora rights management will be
“rules based”; we should follow that same path now.
- BY CATEGORY
- Infrastructure and Tools
- If Catalogers/Metadata creators are going to create and update
records in their native schema, they need:
- tools (i.e. Rob's GDMS tool) in each of the native areas to
work with.
- the authority to create/update/edit in each native schema
- Staffing -- humans will have to touch data at some point in the
process.
- Bottlenecks -- human intervention will necessarily cause bottlenecks
at some points in the process. What is acceptable?
- What about plates/pictures/figures/diagrams/graphs/tables in books?
If we make them discoverable like other images in the Digital Discovery
Index, who will do the identification/metadata creation?
- How much metadata can we afford?
- Metadata assessment should be part of the selection process. If
the collection we want to purchase does not contain adequate metadata,
local enhancement needs to be considered as part of the purchase
cost.
- Standards for data tagging
- What about plates/pictures/figures/diagrams/graphs/tables in books?
How do we make them discoverable like other images in the Digital
Discovery Index?
- What is the minimal set of elements required for meaningful discovery?
- Admin & technical metadata need to be fleshed out. Technical
metadata can be acquired programmatically; where is it maintained?
- Rights management must be a priority. (Right now, Fedora can do
IP restriction, more coming Jan. 2005). There are needs both to
restrict materials UVa has purchased and to have a place to note
use restrictions on UVa Special Collections materials. Fedora rights
management will be “rules based”; we should follow that
same path now.
- Standards for data content
- Collections -- there is a need to establish inter-relationships
between resources in the metadata (i.e. by LC call #, Subject, name
authority, etc.)
- Broad vs. specific subjects -- the needs of the humanities are
obviously different from the needs of the scientific communities.
If one community creates subject headings, another must have the
authority to update the metadata by adding subject terms to suit
their needs. Where does Health Sciences fit in here (MESH v. LCSH)?
Will Health Sciences have authority to add subject terms for DL
records (in the native TEI, GDMS, EAD, etc.)?
- Who has the subject expertise?
- Authority control
- Enforcement of rules – we can enforce tagging rules programmatically,
but how do we enforce content rules? Bradley: do we need a Metadata
Honor Code? Enforcing content for a meaningful discovery index will
require a human element!
- Workflow
- Policy decision on ingesting Special Collections orphans &
how much metadata
- Worthiness of items – are certain items more “worthy”
of having their plates/pictures/figures/diagrams/graphs/tables described?
Who decides? Does this need to be a policy decision or can we do
this by hand? In the traditional print world, Catalogers look at
each book and pull out relevant subject terms/name headings, etc.,
to be indexed for the user. Is this do-able/scalable for the DL?
- Who will be doing the metadata?
- Creation v. long term maintenance?
- Value added/enhancement?
- What about Fedora versioning? – how easy will it be in Fedora
to change/update metadata? Batch tools are also not currently available
in Fedora.
- How does Health Sciences fit in?
October 16, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Ann Whiteside
TEI mapping.
- The MSG began considering the TEI mapping. Erin and Thorny began
the morning by "flipping" the TEI mapping that was done a while
back and is currently posted on the Digital Initiatives website. Instead
of:
identifying TEI elements and where they map TO in DescMeta
we would:
identify DescMeta elements and where they map FROM in TEI.
With the map re-focused, we believe we'll be better able to articulate
a vision for what we want in DescMeta for the purpose of a meaningful discovery
index. Flipping the chart would also force better articulation of the
minimal requirements outlined last spring. What is minimal for meaningful
discovery? With the chart flipped, the MSG started reviewing DescMeta.
Highlights:
- The list of DescMeta elements on the Digital Initiatives website is
NOT correct (<covspace> should be <covplace>; <date>
should be <time>, <place> element should be added); the DTD
however and accompanying documentation are correct. Erin will talk
to Leslie about permissions for updating the website and will get this
corrected.
- Discussion of the model of fileDesc and sourceDesc. In TEI, the
idea is that information about the original source should live in sourceDesc
and information about the electronic version (electronic publisher, etc.)
should live in fileDesc. Thorny proposed, and the MSG agreed, that
DescMeta specify elements as belonging to the original item or belonging
to the surrogate. Original publisher/surrogate publisher; original
extent/surrogate extent, etc.; how to represent this will be further discussed
(i.e. type="surrogate"; type="original"; class="surrogate";
class="original"?)
- <agent> -- the minimal requirements outlined last spring by the
Digital Library Metadata Review and Planning Group required agent. Agent
is used all over the place (type="creator"; type="contributor";
type="compiler"). With TEI objects, what we really want
to require is at least one <agent type="creator">.
- <covplace> and <covtime> -- discussion about the feasibility
of mapping LCSH geographic fields to <covplace> and LCSH chronology
fields to <covtime>. Also, clarification: <covplace>
is place of coverage; <place> is place of publication.
October 21, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Ann Whiteside
Special Collections
- Erin, Beth, and Janis reported on a meeting with Edward and Melinda regarding
a Special Collections issue. Special Collections is asking DLPS to
digitize "random" documents out of a collection which currently
contains only collection level description in VIRGO. DLPS/Cataloging
workflow, up until this point, has had DLPS extracting VIRGO records for
their base TEI header which Janis then enhances. Only collection level
records exist in VIRGO for the SC materials, however, and SC is selecting
particular items for digitization. There is no item level description
in either VIRGO or in the EAD finding aid. The MSG spent quite a bit
of time on this discussion and resolved that more input from Melinda and
Edward will be needed. We do feel, however, the need to frame this
as a cost-benefit analysis. Metadata should be created where it is
best suited for the native resource -- in this case, the EAD finding aid. Data
can be extracted and manipulated with a fair amount of ease, but will require
a human to create it in the FIRST instance. The process needs to start
somewhere; we need to find the resources.
TEI mapping
- Thorny proposed that we add a new <surrogate> element to the DescMeta
DTD, which would be recursive and contain all other elements except for
itself. The top level element <desc> will change to <descmeta>. All
elements immediately below <descmeta> will describe the original print
resource. All elements under <surrogate> will be applicable to
the electronic surrogate only. Born-digital items are electronically
"original" and therefore will not have surrogate elements. The
MSG approved this proposal. Erin reported that Leslie is getting the
DTD & related documentation moved to the new Digital Initiatives site
and then Erin & Thorny will have permissions to update it. The
DTD will then be adjusted to reflect this change. Unless explicitly designated
as falling under <surrogate>, elements should be mapped to the highest
level (<descmeta>)
- <agent>--in TEI, sometimes names are structured (last names &
first names in separate fields), sometimes unstructured (last name, first
name, date). If the data is unstructured, we will grab the entire contents
and tolerate the presence of date data.
- <authority>--refers to where they got the content of the data. We
decided to ignore this for the TEI, as it is understood that the data comes
from the native schema.
- <covplace> and <covtime> -- We decided to test out the feasibility
of mapping LCSH geographic fields to <covplace> and LCSH chronology
fields to <covtime> by parsing the MARC coding. We'll have to
consider the results after programming.
- <culture>--not relevant for TEI
- <description>--we'll need content rules for notes. What qualifies
as an important note and what just causes clutter in a search? For
mapping, we'll grab what notes are available in the TEI, but we need to
revisit this as part of a future best practices discussion.
- <form>--we decided to remove <form> as a top level element,
because it is also available as an element of <mediatype>. It
was unclear why this had been assigned as a top level element and the MSG
agreed to discard it. <mediatype><form>prose</form></mediatype>
v. <form>prose</form> does not affect searchability. In
a broader discussion of the purpose of <form>, it was acknowledged
that this could be a highly useful element, but we're not sure how it has
been applied. How do you decide between assigning a work prose and
assigning it non-fiction? The values would need to be drawn from an
authority list and have very clear definitions. We will leave this
out of the current mapping but need to revisit.
- <identifier>--the top level identifier will be the Fedora PID. The
<surrogate> identifiers will include type=ISBN (although we don't
think the current texts have ISBNs) and type=UVa Title Control Number, which
we believe to be more stable than UVa Virgo ID (the item barcode). UVa
Title Controls may still disappear: if all items on a particular record
are withdrawn; if a bibliographic record is replaced; or if the bibliographic
record was created only for the purpose of DLPS extraction (where we don't
own the book being digitized); but we still believe them to be more stable
than barcodes. If we want to use this number down the road to link
back to SIRSI, we would just need to make sure we provide users a good,
clear error message if the Title Control number no longer exists.
- <mediatype> will be populated with type=text for all TEI objects.
- <mimetype> will be populated with "text/xml" for all TEI
objects.
- The MSG believes (hopes!) we need only one more meeting to finish the
TEI mapping and then it will be given to Perry to begin programming. After
we get a first-pass at programming, the MSG will invite the TEI experts
back to review the results.
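The approved <surrogate> structure can be sketched as follows. This is an illustration under assumptions, not the DTD itself: the function name `build_descmeta`, the tuple layout, and the identifier and title values are all hypothetical. Only the element names and the rules (original elements directly under <descmeta>, surrogate elements under <surrogate>, no surrogate for born-digital items, <surrogate> never containing itself) come from the discussion above.

```python
import xml.etree.ElementTree as ET


def build_descmeta(original, surrogate=None):
    """Build a <descmeta> element.

    `original` and `surrogate` are lists of (tag, attributes, text) tuples.
    Pass surrogate=None for a born-digital item, which has no <surrogate>.
    """
    root = ET.Element("descmeta")
    for tag, attrs, text in original:
        ET.SubElement(root, tag, attrs).text = text
    if surrogate is not None:
        sur = ET.SubElement(root, "surrogate")
        for tag, attrs, text in surrogate:
            if tag == "surrogate":
                raise ValueError("<surrogate> may not contain itself")
            ET.SubElement(sur, tag, attrs).text = text
    return root


# Hypothetical values, for illustration only.
original = [
    ("identifier", {"type": "UVaPID"}, "example:1234"),
    ("title", {"type": "primary"}, "An Example Text"),
]
surrogate = [
    ("identifier", {"type": "UVa Title Control Number"}, "hypothetical-0001"),
]

digitized = build_descmeta(original, surrogate)
born_digital = build_descmeta(original)
print(ET.tostring(digitized, encoding="unicode"))
print(ET.tostring(born_digital, encoding="unicode"))
```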
October 30, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples
TEI mapping:
- The MSG continued reviewing the TEI mapping. Highlights:
- <covtime>. The DTD currently requires a type= for <date>. The
MSG voted that this is unnecessary & will change the DTD.
- <mediatype>. The DTD currently allows <mediatype>
to be repeatable. The MSG voted that this element should not be
repeatable & will change the DTD.
- <place><geogname>. Places have different content
rules according to various standards, which will affect meaningful searchability. We
will consider adding an attribute to the DTD to allow for TGN codes to
normalize <place> data.
- <relation>. Leave off for now. Thorny presented the
ideas he and Ross are discussing for kinship metadata. Kinship
metadata will handle primary parent/child relationships. More complex
relationships will need to be hashed out later.
- <rights>. TEI <availability>copyright 2000 ...</availability>
will map to <rights type="copyright">. The MSG clarified
that the access info. will be coded in administrative metadata and will
drive the DescMeta <rights type="access">. The
DescMeta text will then render to the user. Use info., however,
is not coded in TEI separately from <availability> which is problematic
because it cannot, therefore, be mapped to AdminMeta. Sherry &
Erin will work on a proposal for where to encode this in the TEI. The
MSG will then present this (through Beth & PT Services Council)
to the TEI Experts.
- <subject>. Change: <subject><authority scheme
="LCSH"> to <subject scheme="LCSH">. The
MSG voted that a URL link to the source of the scheme is unnecessary,
voted to delete <authority>, & will change the DTD. The
MSG also decided to scheme everything; if nonexistent in TEI, the default
would be: <subject scheme="unknown">. We will need
a practice rule for naming schemes. There was discussion about
providing very high level subject access following a local UVa scheme. Thorny
said he had tried to no avail in previous projects to find an established
list of very high level subject categorization. Beth suggested
the list that Cataloging uses to categorize ejournals. These were
based on the academic departments and were developed working with subject
selectors to tie these to LC classification numbers. If the TEI contains
LC class numbers, this could be mapped. The list is available at:
http://www.lib.virginia.edu/cataloging/policies/drafts/ejcodes.htm
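The decision to "scheme everything" could look roughly like this in a mapping script. It is a sketch only: the helper name `make_subject` is hypothetical, and how the scheme is detected in the TEI is left out.

```python
import xml.etree.ElementTree as ET


def make_subject(term, scheme=None):
    """Create a <subject> element, defaulting the scheme to "unknown"."""
    subject = ET.Element("subject", {"scheme": scheme or "unknown"})
    subject.text = term
    return subject


with_scheme = make_subject("Religion", "LCSH")
without_scheme = make_subject("Dogs")  # no scheme recorded in the source
print(ET.tostring(with_scheme, encoding="unicode"))
print(ET.tostring(without_scheme, encoding="unicode"))
```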
DTD
- Thorny will give the MSG a DTD-reading tutorial at a follow-up meeting.
November 3, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Ann Whiteside
TEI mapping
- The MSG continued reviewing the TEI mapping. Highlights:
- Sherry asked about TEI v. TEI Lite. Thorny explained that DLPS
is using TEI Lite because it allows us tighter control than TEI. Beth
offered to buy a print copy of the DTD if that would be helpful to us.
[A clarification from Greg, at DLPS, after the minutes were distributed
on lib-metadata, on the TEI Lite issue: "DLPS does NOT use TEI
Lite (which is a very loose DTD -- not more tightly controlled than
TEI -- simply omits the more obscure tags). Instead, DLPS uses a customization
of TEI, specifically designed to provide tighter control over TEI documents.
Perhaps the confusion arises from the fact that our customization has
sometimes been called TEI Tight."]
- <subject>. Erin asked to review <subject>. Erin
& Beth went to a meeting on Friday where MODS was discussed. The
DescMeta coding is:
<subject scheme="LCSH" type="topic">Religion</subject>
<subject scheme="LCSH" type="geographic">United States</subject>
MODS coding is:
<subject><topic>Religion</topic><geographic>United States</geographic></subject>
Erin asked for a clarification on the difference and Thorny explained
that in the MODS example, "Religion" and "United States" are related
to each other by both being present inside <subject>. In the DescMeta
example, they are two different subject headings. Traditional MARC
cataloging has drawn semantic distinctions between those different
types of coding, and Erin, Janis, and Ann felt that it is important for
us to consider that (pre-coordinated v. post-coordinated headings). A
general discovery keyword search will return both records, but once
users have found a particular record and are looking for other like
records, they need the advanced complexity. If you are specifically
looking for "Religion IN THE United States", the current DescMeta
coding makes it difficult. The MSG voted to keep the structure as is
for the current mapping, but to consider a more complex structure down
the road (DescMeta 2.0!). Thorny also mentioned that MODS had been
considered in early DescMeta discussions but was rejected because it
is very textual in nature and wouldn't manage the image collections
well.
- TEI <edition> will map to DescMeta <description type="edition">
- <time><date>. Valid types include type="creation";
type="publication"; type="revision". The
MSG believes (for the moment) that revision can encompass textual reprints,
etc., as well as architectural additions.
- <surrogate><time><date>. The date the resource
was put on the first server. The date of ingest into Fedora will
be covered by the object's audit trail.
- <title>. type="primary" -- the MSG voted to always
explicitly tag type="primary". Also available will be
type="series"; type="alternate"; type="parallel";
type="sort". Initial articles will cause trouble for browse
lists. The mapping script should take type="primary",
strip out initial articles (from an attached stopword list), and put the result
into type="sort". We'll do this for the legacy data which,
for the current target collection, is all English and should be ok to
just run through a stopword list. Human judgement is really needed
here, however (i.e. Los Angeles, El Cid, Le Corbusier, etc.), so the
MSG will recommend a sort title proposal for future TEI markup.
We will review the minimal requirements by email.
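The sort-title step described under <title> might be sketched like this. The function name `make_sort_title` and the contents of the stopword list are assumptions; the actual list attached to the mapping script is not given in these minutes.

```python
# Assumed English stopword list; the real attached list may differ.
ARTICLES = ("a", "an", "the")


def make_sort_title(primary_title, articles=ARTICLES):
    """Strip one leading article so the title files correctly in browse lists."""
    words = primary_title.split()
    # Leave one-word titles alone so a title is never emptied.
    if len(words) > 1 and words[0].lower() in articles:
        words = words[1:]
    return " ".join(words)


print(make_sort_title("The Scarlet Letter"))
print(make_sort_title("Los Angeles"))
# Adding non-English articles shows why human judgement is still needed:
print(make_sort_title("El Cid", articles=("el", "los", "le")))
```

With an English-only list, "Los Angeles" is left intact, but as soon as Spanish or French articles are added, names such as "El Cid" lose their first word, which is the problem the MSG wants future markup to solve by hand.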
November 13, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Ann Whiteside
1) TEI mapping
- The group FINISHED reviewing the TEI mapping! We made a few small last
changes & clarifications regarding notes and examples on the spreadsheet.
The only significant change was a vote to add <name> as an element
under <agent> to make the string cleaner. Thorny will update the
DTD, Erin will update the file and send it to Perry to begin programming.
Adam Soroka, from Robertson Media, will actually be working for Perry on
this project. Erin will update the metadata website with the new mapping
and send out an announcement to lib-metadata. When the programming is complete,
we will invite the TEI folks back for a review of the results.
2) Minimal requirements
- The group reviewed & clarified the list of minimal requirements proposed
by the spring Digital Library Metadata Review and Planning Group and approved
by (then) IT Council.
- Highlights & Changes
- ID. The Fedora PID -- <surrogate><identifier>
- Title. <title type="primary"> -- one and only one (type="primary"
not repeatable)
- Agent. <agent type="creator"> -- at least one (for the EAD,
it will likely be one <agent type="repository"> -- we will require
one <agent> but the type can vary based on the collection). Populate
with "Unknown" if anonymous or otherwise unavailable.
- Mediatype. <mediatype> -- one and only one (not repeatable)
- Rights. <rights type="access"> -- the MSG will put together
a proposal for the TEI experts to code in access rights more explicitly
(in <availability>), but we know that all the texts going into
phase 1 & 2 are publicly accessible. Right now, therefore, we will
default to <rights type="access">Unrestricted</rights>
- Desc. The spring group intended this to be Description, but the MSG
does not believe <description> to be required for minimum. We
will drop this from the list.
- Date & Creation date -- These were 2 separate items on the spring
list. The Fedora ingest date will be automatic. The MSG believes we
should require <surrogate><time><date type="creation">
as the date the resource was put on the first server. This info. will
be useful for evaluating our digital collections over time. After a
lengthy discussion of which "original" date to require, we chose to
require at least one <time><date> of any date type (which
could be creation, publication, revision, whatever). For a collection
to be minimally acceptable for ingest, some original date info. must
be encoded. In extreme situations, folks can appeal to the MSG for populating
the field with "Unknown"; the MSG will decide if this is acceptable
on a case-by-case basis.
- Erin will write up a revised document that Beth will take back to
PT Services Council for approval.
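As a rough illustration, the minimal requirements listed above could be checked against a DescMeta record with a short script. This is a sketch under assumptions, not an agreed tool: the function name `check_minimal` and the sample records are hypothetical, the identifier check is left out, and <rights type="access"> is accepted at any level of the record.

```python
import xml.etree.ElementTree as ET


def check_minimal(record):
    """Return a list of unmet minimal requirements for one <descmeta> element."""
    problems = []
    if len(record.findall("title[@type='primary']")) != 1:
        problems.append("need exactly one <title type='primary'>")
    if not record.findall("agent"):
        problems.append("need at least one <agent>")
    if len(record.findall("mediatype")) != 1:
        problems.append("need exactly one <mediatype>")
    # Assumption: the access statement may sit at any level of the record.
    if not record.findall(".//rights[@type='access']"):
        problems.append("need <rights type='access'>")
    if not record.findall("time/date"):
        problems.append("need at least one original <time><date>")
    if not record.findall("surrogate/time/date[@type='creation']"):
        problems.append("need <surrogate><time><date type='creation'>")
    return problems


# Hypothetical sample records, for illustration only.
good = ET.fromstring("""
<descmeta>
  <title type="primary">An Example Text</title>
  <agent type="creator"><name>Unknown</name></agent>
  <mediatype>text</mediatype>
  <rights type="access">Unrestricted</rights>
  <time><date type="publication">1850</date></time>
  <surrogate>
    <time><date type="creation">2003</date></time>
  </surrogate>
</descmeta>
""")
bad = ET.fromstring(
    "<descmeta><title type='primary'>A</title><title type='primary'>B</title></descmeta>"
)

print(check_minimal(good))
for problem in check_minimal(bad):
    print(problem)
```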
3) DTD Tutorial
- Thorny provided the group a tutorial on reading a DTD. The group voted
to add <name> under agent (see above under TEI mapping).
4) Planning/Prioritization
- The MSG will look next at the administrative elements and then move on
to EAD.
November 20, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Ann Whiteside
1) Discussion of Perry's comments/questions
- The TEI mapping went to Perry on Tuesday. Adam Soroka will be working
with Perry and doing the actual programming. Perry had emailed a number
of questions that needed follow-up from the MSG:
a) Question: Shouldn't mimetype be recorded at the <surrogate>
level, not at the <descmeta> level? It seems to me that mimetype
only applies to the electronic surrogate. Action: MSG agreed that for
the TEI mapping, mimetype applies to <surrogate>
b) Question: Similarly, doesn't the rights info contained in fileDesc/publicationStmt/availability
(p.12) and the blanket access statement (p.13) belong to the surrogate?
These don't apply to the original which has its own set of copyright
and access data. Action: MSG agreed.
c) Question: The DTD now requires mediatype at both the descmeta and
surrogate levels. Wouldn't a surrogate normally inherit its mediatype
from the higher level? If so, then mediatype should be optional at the
surrogate level, but remain required at the descmeta level. Action:
the MSG indeed wants the mediatype to be required at both levels. There
are instances in which the mediatypes may actually be different (the
original is a building, the surrogate is an image of the building, etc.)
It is not always inherited. In the current TEI instances, they will
be the same, however, and Adam should populate "text" at both the descmeta
and the surrogate levels.
d) Question: Element and attribute pairs, i.e., <identifier type="UVaPID">,
<title type="primary">, <rights type="access">, <time
type="creation">, can't be required in a DTD. Either the DTD must
be changed, that is, requiring elements such as <uvapid>, <primarytitle>,
etc., or a secondary level of checking, i.e., a "quality assurance"
step, must be implemented. I'd recommend the first option. The advantage
of a more constraining DTD is that it catches errors earlier in the
metadata creation. We could discard the current, general DTD, make the
new DTD a customization of the old one, or make the new DTD mappable
to the old one. Action: the MSG decided not to change the DTD because
DescMeta will always be extracted from the native schema. No one will
be entering DescMeta records by hand and therefore the suggested enforcement
really needs to happen in the native schema. This will need to be taken
up separately. [Later clarification from Thorny: "The real point
here is that we do not have a complete list of values for those attributes
and we won't for some time, if ever. I was saying today in the meeting
that one approach would be to develop a specific DTD for each of the
crosswalks that could have specific lists of values, i.e. one for TEI,
one for EAD, etc. Then I said that, because all of this metadata extraction
will be done programmatically, the extraction programs are a point of
control on the attribute values anyway." Note: the MSG will invite
Perry to the next meeting for further discussion of this point and finalization
of the DTD]
e) Question: Is the term "thing" in the list of values allowed in the
type attribute for mediatype intended to be a catch-all for anything
that doesn't fit the other categories or is it intended to mean physical
object? If the former, I'd suggest changing the term to "other". If
the latter, then I'd suggest "physicalobject". "Thing" is too ambiguous
-- all the other values are things too. Action: "Thing" refers to a
physical object; it was the best term Dan McShane could come up with
at the time for such objects. "Entity" couldn't be used because it is
a reserved word. Physicalobject is now available in Dublin Core and
the MSG will adopt this term.
f) Question: With regard to creator, I think "anonymous" and "unknown"
are different things. To me "anonymous" means that there is an author,
we just don't know his name, while "unknown" is a substitute for "no
value was supplied here". They represent different levels of uncertainty.
Action: MSG agreed. If, at the point of creating/editing the TEI header,
someone wants to code something as <agent type="creator">Anonymous</agent>,
they should do so. But, if we are automatically populating a field because
the element is minimally required and the TEI doesn't have an <agent
type="creator"> value, then it should be populated with "Unknown"
because no value was supplied there.
Erin will update the mapping files. Erin also met with Perry today
to learn to use the program for generating the documentation from the
DTD. Erin will update those files as well.
2) Administrative metadata.
- Highlights:
- Digiprov. Will be taken care of by the Fedora audit trail
- Rights. Access is who can use the resource, Use is what you can do
with it. Use and access notes are grouped under <policy> so that
they can be applied in pairs, i.e. Access available to UVa users for
certain uses and access available to non-UVa users for other uses. For
now, the MSG recommends four different access terms: unres (unrestricted,
i.e. publicly available); uva (UVa only); viva (VIVA only); res (restricted,
i.e. to only authorized library staff). Display rendering to the public
will come from the <rights> element in DescMeta. AdminMeta will
enforce policies, except that currently the only restriction we can
put on Fedora is to put the whole CenRepo into an IP box and restrict
the entire database to UVa users. The MSG has on its agenda a plan to
work up a proposal to the TEI folks to code access and use. For the
phase 2 TEI collection, adminMeta can be populated with unres for all
materials. When we get to GDMS, we will need to deal with other variations.
The MSG agreed that DescMeta for restricted materials will populate
the discovery index. Users will be able to find out materials exist,
even if they cannot access them. For use restrictions, we need to think
more about use classes. We'll need credit lines, especially for GDMS.
Lending policies could be another type of use class. We can consider
this further for next meeting, but for the current implementation, <access>
and <use type="credit"> may be enough.
- Technical. We talked a bit about which of the already defined elements
for texts were really necessary. Is word_processor important for adminMeta?
Do we care what software was used to ocr the file or do we just care
that it was ocr'd? For the initial implementation, we will focus on
encoding and markup, which the MSG agreed were most important. We need
information on character encoding (Unicode, etc.), markup (XML, etc.),
and DTD (TEI, etc.). Thorny will talk with Perry and bring the MSG back
a proposal for encoding. The DLPS TEI's <encodingDesc>
references various entities. Erin will follow up with Greg to find out
exactly what they reference. These values would need to be considered
part of the content when the mapping is programmed; the program will
need to go out and find those referenced values.
Next steps
- More on administrative metadata.
December 4, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Perry Roland,
Thorny Staples, Ann Whiteside
1) Discussion with Perry and finalization of the DescMeta DTD
- Perry joined the MSG to talk through issues and finalize the DTD.
Highlights:
- Clarification of mediatype at both the descmeta and surrogate levels.
The MSG had said we wanted mediatype to be required at both levels.
Perry asked about inheritance: unless otherwise specified, can we assume
that mediatype at the <surrogate> level is inherited from mediatype
at <descmeta>? Do we explicitly state it at both levels
when it is the same? MSG agreed that <surrogate> can be
said to inherit mediatype from <descmeta> unless the value is
specifically different. Mediatype will therefore not be encoded
explicitly at both levels. It is required at <descmeta>
but optional at <surrogate>
- We can't enforce type attributes in a DTD. There is no way to
enforce <title type="primary"> or <identifier type="UVaPID">
without explicitly creating elements for them. We, therefore,
went through the list of minimally required elements and revised it
yet again, adding new elements for some required concepts:
- <surrogate><identifier type="UVaPID"> will become new
element <pid>. <identifier> will apply to coded
identifiers other than the UVaPID. <pid> will be available
at both <descmeta> and <surrogate> but the DTD can't
enforce requirement at <descmeta> only (for born digital materials
-- but since we have no born digital materials at the moment, we
will live with this now).
- <title type="primary"> will no longer be typed and will
become solely element <title>. All other titles will
be element <alttitle> and will be typed accordingly.
- <agent> ok as is: At least one agent of any type; May be
populated with "Unknown" if unavailable
- <mediatype> (see above)
- <rights type="access"> will become new element <accessrights>.
All other rights will be element <rights> and will be typed
accordingly.
- <surrogate><time><date type="creation"> will
become <surrogate><creationdate>. If unknown,
populate with date of ingest.
- <time><date> will change as a minimal requirement
simply to <time> for now. Is it better to require a
specific date/date range or would <time><timeinterval>
be ok to fulfill minimal requirement? This needs further discussion,
right now we will require only <time>
- Perry will update the DTD
- Erin will update the TEI mapping table and the website.
- Thorny discussed other initiatives, besides DTD, which would allow
us to enforce, not only coding, but practice. Schemas introduce
a secondary layer of parsing which is objectionable to some. The
Fedora Project is using Schematron. This needs further consideration.
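Since the DTD cannot enforce every item on the revised minimal list, a secondary "quality assurance" check is one option. The following is only a minimal sketch in Python, not the MSG's or Perry's actual tooling. It assumes a simplified record layout in which <title>, <agent>, <mediatype>, <accessrights>, and <time> are direct children of <descmeta>, <creationdate> is a child of <surrogate>, and <pid> may appear at either level; the real DTD may nest these differently, and the function name check_minimal is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_minimal(xml_text):
    """Return a list of problems; an empty list means the record passes."""
    root = ET.fromstring(xml_text)  # assumed root element: <descmeta>
    problems = []
    if len(root.findall("title")) != 1:
        problems.append("exactly one <title> is required")
    if not root.findall("agent"):
        problems.append("at least one <agent> is required")
    if len(root.findall("mediatype")) != 1:
        problems.append("exactly one <mediatype> is required at <descmeta>")
    if not root.findall("accessrights"):
        problems.append("<accessrights> is required")
    if not root.findall("time"):
        problems.append("at least one <time> is required")
    # <pid> may sit at either level, so search the whole record.
    if root.find(".//pid") is None:
        problems.append("<pid> is required")
    if root.find("surrogate/creationdate") is None:
        problems.append("<surrogate><creationdate> is required")
    return problems
```

A check of this kind would run after DTD validation and could be extended as the list of required elements changes.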
2) MSG/Metadata website
- Erin will add email addresses for MSG members and will follow up with
Leslie regarding questions on the digital initiatives template design.
3) MSG ownership
- A few questions came up from DLPS this week regarding mappings to GDMS
from other standards. The MSG will take ownership of GDMS and will
respond to those needs as they come up. Prioritization of MSG work
is going to become an increasingly difficult issue.
December 11, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside
1) Yet again, DescMeta-TEI follow-up
- Sherry asked for clarification of the decision to have <surrogate>
inherit <mediatype> from <descmeta>, rather than have it be
explicit at both levels. The MSG explained Perry's rationale of the
week before and stated that it won't affect discoverability. It is
required at <descmeta> but optional at <surrogate>. When
Erin was writing last week's minutes, she found a conflict. We had said
that we would require <mediatype> at <descmeta> but not at <surrogate>
and then we went on to say that the DTD can't enforce requirement of <pid>
only at the descmeta level. Are these two situations not analogous?
We still need some clarification of what we are able to require in the DTD,
as opposed to best practices. Perry and Adam are currently working
on the mapping and, at the moment, we can enforce best practices, so we
are tabling this issue right now. Needs to be revisited.
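The inheritance rule (required at <descmeta>, optional at <surrogate>, inherited unless specifically different) can be expressed in the extraction or indexing code rather than in the DTD. A minimal sketch in Python, assuming the mediatype value is carried in a type attribute and that <surrogate> is a direct child of <descmeta>; the function name surrogate_mediatype and the sample value "image" are hypothetical.

```python
import xml.etree.ElementTree as ET


def surrogate_mediatype(descmeta):
    """Return the surrogate's own mediatype if present, else the <descmeta> one."""
    own = descmeta.find("surrogate/mediatype")
    if own is not None and own.get("type"):
        return own.get("type")
    inherited = descmeta.find("mediatype")  # required at this level
    if inherited is None or not inherited.get("type"):
        raise ValueError("<mediatype> is required at <descmeta>")
    return inherited.get("type")
```

For the current TEI instances both levels would resolve to "text"; an explicit surrogate value would only be needed where the original and the surrogate differ.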
2) TEI headers and MARC serial records
DLPS has digitized 5 "random" volumes. Each is a separate file,
will be a separate Fedora Object, and needs a separate TEI header.
We can create alternate titles in the TEI to ease searchability, but we
don't want to populate the discovery index with each volume's metadata because
Public Services will object. They often ask Cataloging to combine
separate monograph records into one serial record:
They won't want this for a hit list:
- A catalogue of the officers and matriculates of the University of
Virginia, 1829
- Catalogue of the officers and students of the University of Virginia,
1840/1841
- Catalogue of the University of Virginia, 1831/1832
- University of Virginia 1906/1907 session. Annual announcements,
with a catalogue of the officers and students.
- University of Virginia catalogue of session 1865/1866 announcements
for session
when they are all members of the same serial publication. Janis brought
this to the MSG to get an opinion from Thorny about the system architecture
before proceeding. In Thorny's absence, she will hold off on creating
TEI files until we know how they will render to the user. The seven
volumes of Lewis & Clark is another example of this problem. We
need to have sufficient metadata to bring together all volumes of a set,
but the rendering of a hit list, with each volume having its own metadata,
is likely to be problematic. Also, how do we indicate to the user
that, although only 5 volumes have been digitized, there are another 75
sitting on the shelf in Special Collections?
3) Continued discussion on Administrative metadata & Admin-TEI mapping
- Highlights:
- <digiprov>. We had said we could ignore this. It
will be taken care of by the Fedora audit trail. The DTD currently
requires <digiprov>, does this mean something needs to be POPULATED
FROM the Fedora audit trail or can we change the DTD to not require
the element? Question tabled for Thorny.
- <adminrights><policy><access>. For now, populate
with "unres" for all TEI objects. All phase 2 texts are publicly
accessible. Can we enforce values in the DTD? (res, unres, viva,
uva)? Or is that best practice only? Question tabled for
Thorny.
- <adminrights><policy><use>. Ignore for the
current mapping, because there are no use restrictions on the phase
2 texts. The DTD currently requires <use>; is this necessary?
Action: change the DTD.
- <technical><text><encoding> and <technical><text><markup>.
This is the current formulation in the DTD. We need to get to
character encoding (unicode, ASCII), mimetype (xml), and markup schema
(TEI, EAD). How to best code this?
<technical>
<text>
<encoding>
<character>
<mimetype>
<markup>
or
<technical>
<encoding>
<text>
<character>
<markup>
- The DLPS context also points to the actual DTD: <!DOCTYPE
TEI.2 SYSTEM "http://text.lib.virginia.edu/bin/dtd/tei/uvalib_kb/tei2.dtd" [
Do we point to the DTD or just name it? How do we encode
this? Questions tabled for Thorny.
- We'll need a complete element set and a minimal list for AdminMeta
as well.
4) Website
- The website had said "... there are two distinct categories of information
... for descriptive and administrative (including technical purposes ...
these locally-defined element sets are collectively referred to as UVa DescMeta
..." This is confusing -- we were assuming the "Desc" in "DescMeta"
was meant to imply descriptive only. The MSG voted to change the text
and all associated references to "collectively referred to as UVa
Metadata."
- The UVa DescMeta Descriptive Elements become UVa Metadata
Descriptive Elements (UVa DescMeta)
- The UVa DescMeta Administrative Elements become UVa Metadata
Administrative Elements (UVa AdminMeta) etc., etc.
- Per previous conversation, Erin had added a note on each of the element
sets that said: "UVa DescMeta is currently under development by the
Metadata Steering Group at the University of Virginia Library. The element
set, the minimal requirements, and the DTDs are still considered in-progress.
The DTD will be released as UVa DescMeta version 1.0 when development and
prototyping are complete." Beth pointed out that the current DTD calls
itself version 1.01 (09/17/2001) and it was decided we could release the
revised DTD as version 2.0. Also, authorship on the current DTDs goes
to Dan McShane, Perry, and Thorny and the MSG should now be reflected there
as well. How much history is it necessary to keep?
- Erin will update the website
- Thorny will update the DTDs.
December 18, 2003
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake
1) AdminMeta-TEI follow-up.
- Erin and Thorny met earlier in the week to go through the MSG questions
from the previous meeting and Erin reported back to the group (Thorny was
not able to attend this meeting). Highlights:
2) TEI headers/MARC Serial records follow-up.
- Erin & Thorny also talked earlier in the week about this and Erin
reported back to the MSG. Thorny suggests that we create GDMS records to
link together the multiple TEIs. The top divDesc of the GDMS would map to
DescMeta and populate the discovery index rather than the individual TEI
headers. The discussion of which level of the GDMS populates the discovery
index is a discussion that needs to be had more seriously and with the image
experts. Another option is to try and link the files together within the
TEI headers themselves. Thorny suggested that Erin contact Chris Ruotolo
about this. Email to Chris is outstanding.
3) Best practices for Collection naming/Identification
- How do we identify something as being part of a particular collection?
We need an answer ASAP for Jack and the IRIS-GDMS mapping. MSG proposal
to Ann:
- Change GDMS element to <alttitle type="collection">
as in
<series><title>UVA-ARCH</title></series>
to
<alttitle type="collection">UVA-ARCH</alttitle>
- Alttitle is not currently an element in GDMS, but this would bring
it back in line with DescMeta. We propose this change to the GDMS. We
thought about recommending a <collection> element, but then we
would need <collection> in DescMeta ... using <alttitle>
seemed better. Question for Thorny: Do we need to reference the pid
of the collection object?
- There are 2 naming convention issues:
- The proper name of the collection to be displayed on all public pages
associated with the collection (i.e. the Collection Object)
- The abbreviated name for that collection to be used in the various
headers (GDMS, TEI, EAD, etc.) and pointing to the Collection Object
and mapped to DescMeta.
- Principles:
- The full "proper" name of the collection must be unique.
The abbreviation used must also be unique and we need a formula for
guaranteeing unique abbreviations (Erin will look into standards for
abbreviating titles, like the ISSN Center uses for abbreviating journal
titles -- something short enough to not be just wasting character space
and long enough to be humanly intelligible). [Update: ISSN center:
http://www.issn.org:8080/English/pub/products/lstwa/
based on ISO 4 standard: http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=3569]
- There should be a central authority (perhaps Cataloging) for assigning/approving
collection names in order to guarantee uniqueness. The central authority
would be responsible for pre-searching the database for uniqueness,
adding qualifiers if necessary ... (i.e. Barcelona Collection (1994)
and Barcelona Collection (2004)), collection title changes, creating
the abbreviations, etc.
- No semantic information about the nature of the collection needs to
be inherent in the abbreviation. The fact that it is a UVa created collection
v. a purchased collection would live in the collection object. Relationships
between collections would be represented in the collection objects.
I.e. the GDMS header does not need to represent that Barcelona is a
subset of Art and Architecture.
- Supercollections (Art & Architecture, Modern English, Finding
Aids) are those that will have their own search index. Supercollection
proper names should be prefaced by University of Virginia Library (i.e.
University of Virginia Library Art and Architecture Collection)
- There must be a way to limit a search within any particular collection.
- There was much discussion about what makes a collection; the different kinds
of collections (those inherent to the resources themselves, i.e. parts of
a series; UVa created collections, i.e. Barcelona; UVa compiled collections,
i.e. anything having to do with the Rotunda; purchased collections; etc.); how
many Supercollections we will ultimately have; etc. None of this was at
all resolved.
- More discussion on this the first meeting after the holidays.
January 6, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside, Thorny Staples
1) GDMS
- Thorny gave the group an introduction to GDMS. He went through the structure,
the fields, and the MSG viewed various examples out there on the web. Highlights:
- The top Div describes the "thing" as a whole that is being
represented by the GDMS file. It can contain other divs recursively
and the structure describes the relationships between the various objects
(i.e. The Rotunda: the exterior view: the interior view: a particular
interior room: a particular painting in that particular interior room).
- Divs contain type and label attributes. The label is a shorthand name
for the described resource and the type represents what type of div
the resource is describing (an architectural site, a structure, a space,
a feature, an object). Different div types can use different ontologies
for description.
- Divs contain divDesc's and res'. The divDesc provides the descriptive
metadata about the item being represented in the Div. The res points
to the image file, although it can contain its own descriptive data
as well. The res will include the PID of the image object and can also
contain a rescon to allow for accompanying narrative content and html
(i.e. a critical essay on the resource at hand).
- Divs also can contain divincs and resincs to allow you to reuse images
and their accompanying descriptive metadata in other parts of the tree.
For example, if Jefferson had moved a chair in Monticello from the dining
room to the front room, you can show the image in both places as you
trace through history. The divinc contains the referenced divDesc's
identifier and the resinc contains the referenced image's PID.
- The GDMS header contains information about the GDMS file itself (not
about the resource described): who created the file? when?, etc.
- GDMS was developed to be a widely used standard to describe materials
that have complex hierarchies and beg for a relational descriptive structure.
Because it was intended to be used more broadly than UVa, we have a
responsibility to keep it semantically neutral.
- After learning about GDMS more in-depth, the MSG decided we still needed
to look closely at the GDMS header in order to answer Jack's IRIS-GDMS mapping
questions. Thorny stated that the header elements had never really been
evaluated and needed some work by the group. Next meeting.
January 15, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside, Thorny Staples
1) GDMS Header
- The MSG resolved to avoid semantically loaded terms, and agreed to use
"set" rather than "collection" to describe the various
types of aggregated materials.
- Sets can consist of the following:
- Materials purchased from a vendor as concrete units
- Materials created by the Library (or faculty projects) as concrete
units
- Materials brought together by the Library as being usefully related
but not inherently related to each other
- Materials inherently related to each other by their bibliographic
nature are considered series (i.e. electronic texts bearing a series
statement)
- <setstmt> and <set> elements will be added to the GDMS.
Thorny will update the DTD to accommodate the following formulation:
<setstmt>
<set code="UVA-LIB-ArtArchit"/>
</setstmt>
- <setstmt> is optional: <setstmt>?
- If there is a <setstmt> there can be one or more <set>
elements: <set>+
- The set code only serves to link the GDMS image object back to the
GDMS collection object. See more below on set codes.
- <pubstmt> -- the elements in <pubstmt> should be pulled
out. Right now <title>, <agent>, and <series> are nested
within <pubstmt>. Thorny will update the DTD. The GDMS Head only
really needs <agent type="creator" form="corpname">
University of Virginia Library </agent> (as the creator of the
GDMS file) and <time><date> (for the date the GDMS file
was created). The GDMS file doesn't need a title of its own.
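The <setstmt>? and <set>+ content model can also be checked outside the DTD. A minimal sketch in Python, assuming <setstmt> is a direct child of the header element; the root element name <gdmshead> and the function name set_codes are hypothetical stand-ins, not names taken from the GDMS DTD.

```python
import xml.etree.ElementTree as ET


def set_codes(header):
    """Return the set codes in a GDMS header, enforcing <setstmt>? and <set>+."""
    stmts = header.findall("setstmt")
    if len(stmts) > 1:
        raise ValueError("at most one <setstmt> is allowed")
    if not stmts:
        return []  # <setstmt> is optional
    sets = stmts[0].findall("set")
    if not sets:
        raise ValueError("<setstmt> must contain one or more <set> elements")
    codes = [s.get("code") for s in sets]
    if any(not c for c in codes):
        raise ValueError("every <set> needs a code attribute")
    return codes
```

The returned codes are what would link the GDMS image object back to its collection object.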
2) Set code conventions
- Set codes will begin with UVA-LIB for any sets created or collected by
the Library. This includes faculty projects that have been selected and
collected by the Library. UVA-LIB will then also be used in the XML namespace,
which nicely represents a hierarchy for the university namespace.
- For vendor collections, set codes will begin with standard abbreviations
in all caps (i.e. SI-SAAM for the Smithsonian Institution, Smithsonian American
Art Museum)
- The MSG considered the ISO standard "Rules for the abbreviation
of title words and titles of publications" for the remainder of
the set code. The MSG agreed to pull abbreviations from the standardized
list and follow their abbreviation conventions without following their punctuation
or capitalization rules. As per previous discussion, there should be a central
authority for determining the "official" name of a set (regardless
of which of the set categories it falls into). Once the official name has
been determined, the code should be prefixed as above, followed by a hyphen,
followed by the standardized abbreviation with all words strung together
as a compound word. The first word of each abbreviated title should be capitalized.
All set codes must be unique. Given the above formulation:
- The Art and Architecture collection is: UVA-LIB-ArtArchit;
- The Barcelona collection is UVA-LIB-Barcelona;
- The Architecture of Jefferson Country is: UVA-LIB-ArchitJeffCtry;
- The Catlin collection is SI-SAAM-CatlinIndianPaint (from The Smithsonian
American Art Museum Catlin Indian Paintings Collection).
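The set code formula (prefix, hyphen, abbreviated title words strung together with each word capitalized) can be sketched as follows. This is an illustration in Python under stated assumptions: the abbreviation table is a hypothetical excerpt reconstructed from the four examples above rather than taken from the standardized list, the dropped-word list is assumed, and the official name is passed in without the word "collection". The function name build_set_code is hypothetical.

```python
# Hypothetical excerpt of an abbreviation table; real entries would come from
# the standardized list the MSG refers to.
ABBREVIATIONS = {
    "architecture": "Archit",
    "jefferson": "Jeff",
    "country": "Ctry",
    "paintings": "Paint",
}

# Assumed: articles and connectives are dropped from the code.
SKIPPED = {"a", "an", "and", "of", "the"}


def build_set_code(prefix, official_name):
    """Join prefix and abbreviated title words as PREFIX-WordWordWord."""
    words = [w for w in official_name.split() if w.lower() not in SKIPPED]
    parts = []
    for word in words:
        abbr = ABBREVIATIONS.get(word.lower(), word)
        parts.append(abbr[0].upper() + abbr[1:])
    return prefix + "-" + "".join(parts)
```

For example, build_set_code("UVA-LIB", "The Architecture of Jefferson Country") yields UVA-LIB-ArchitJeffCtry. Uniqueness is not checked here; that would remain the job of the central authority.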
3) Dates
- Jack and Ann are having issues with era conventions for the IRIS-GDMS
map. GDMS and DescMeta DTD's currently use ad,bc,cc,cd which is based on
FGDC (http://www.fgdc.gov/metadata/csdgm/organization.html).
- ad: A.D. Era to December 31, 9999 A.D.
- bc: B.C. Era to 9999 B.C.
- cc: B.C. Era before 9999 B.C.
- cd: A.D. Era after 9999 A.D.
- Most government information uses these conventions. The archeological
world and IRIS use CE and BCE. AACR uses only ad and bc. It seems that both
are perfectly good standards. Which do we use? Do we need to account for
the extremes of the date ranges? Do we want to be politically correct? Jack
needs an answer for his IRIS-GDMS work. Sherry will look more into FGDC's
use of this and Ann and Erin will continue research.
January 22, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside, Thorny Staples, Chris Ruotolo
1) TEI and Seriality (with Chris Ruotolo)
- The MSG invited Chris to the meeting to discuss how to handle serials
in TEI (and then by extension, DescMeta). This has come up now for
"A Catalogue of the officers and matriculates of the University of Virginia"
-- 83 volumes in Special Collections. DLPS has digitized 4.
Chris and Erin had met earlier to hash out some ideas. Out of that
meeting:
- There needs to be an element/attribute in the TEI saying this is a
set or serial, etc. Etext had used <title level=" "> but
the only valid options for the level attribute are: analytic title,
monographic title, journal title, series title, title of unpublished
material. This may not do it for us.
- There needs to be a unique identifier to pull all children of the
set together. UVA Title Control Number would do the trick.
- Chris and Erin had discussed the idea of using <sourcedesc>
to describe the serial in its "purest" entirety (matching the MARC record)
and <filedesc> to describe the particular volume in hand.
The stylesheet would then identify the existence of the set and collate
all the children but present only one sourcedesc to the user at the
time of searching.
Thorny referred to this proposal as an implicit rule, a secret the system
needs to know -- which creates a lot of system overhead. Thorny explained
explicit rules make for better systems modelling and it would be better
if there were a way to represent the serial as its own object. GDMS
seems appealing for this, but we wouldn't be able to dump all of the records
into the same index. Erin was also concerned about staffing implications
if we first must create TEI headers and then on top of that create GDMS
objects. The group talked about the option of having a separate TEI
file entirely that would describe the serial, but have no other file content
besides the header. Issues there include not wanting to have to update
that file each time a new volume is added and concern over having certain
TEI files in the database that are unlike the "normal" TEI files. The cataloging
representation pointed out that serials are always different, always require
a workflow different from the "normal" one, and always cause more overhead.
The question is where do we invest the overhead?
- Discussion of Lewis & Clark and monographic sets. Monographic
sets and serials contain many like qualities, but are also quite different.
- Discussion of using the series statements in sourcedesc. Action:
Janis will mock up some examples; Erin will do some more research about
serials at other TEI institutions [Update: Erin had no luck; Beth
emailed Jackie Shieh at Michigan. They have delivery solutions but
not TEI solutions]. Discussion tabled.
2) TEI/AdminMeta issues
- Discussion continued on how to code elements such as xml/DTD/TEI.2/etc.
in the AdminMeta. The MSG ran out of time and this discussion was
also tabled for next week.
January 29, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside, Thorny Staples
1) LofT review
- Beth is part of the Planning Team reviewing the 2001 LofT goals. The MSG
reviewed the three metadata goals (LofT 19, 20, & 21) to document
the status and make recommendations for follow-up.
2) Issues on Collection naming (for GDMS/IRIS mapping)
- Jack had a question about our set naming decisions. Clarification was
made that there is no hierarchical relationship implied in the <set>
elements. All hierarchy will be assumed by the collection objects. <set
code="UVA-LIB-ArtArchit"/> and <set code="UVA-LIB-Barcelona"/>
can co-exist in a GDMS header and there is no need for a "plain"
<set code="UVA-LIB"/>.
- Erin noted that we had not considered sets when we reviewed DescMeta and
we need a place there to note sets as well. Thorny will update the DescMeta
DTD to achieve:
<relation><set code=" "/></relation> [Update:
in the end, Thorny and Erin opted for a new <relationships> tag which
will contain <relation> and <set>].
- In TEI, we will endorse <filedesc><series> to describe sets,
as Etext has been doing. Erin will check with Melinda about populating the
DescMeta for the current scripting with <set code="ModEngl"/>
[Update: ok'd by Melinda].
3) Issues on dates (for GDMS/IRIS mapping)
- IRIS uses ce/bce (Common Era v. Before Common Era)
The GDMS DTD uses ad/bc/cc/cd.
Should we change the GDMS DTD or map IRIS' ce/bce to ad/bc?
ad/bc/cc/cd comes from FGDC (http://www.fgdc.gov/metadata/csdgm/organization.html)
and is defined as:
ad -- Era to December 31, 9999 A.D.
bc -- Era to 9999 B.C.
cc -- B.C. Era before 9999 B.C.
cd -- Era after 9999 A.D.
ce/bce are politically correct; cc and cd are important for archaeological
description. Sherry emailed FGDC to see how they made their choice but
had no response. Both are perfectly reasonable standards; the question
is which to choose. Janis noted that because AACR2 uses ad/bc, all bibliographic
data following AACR2 will have those conventions. The MSG decided that
consistency between data was crucial for discovery and usability and so
voted for mapping IRIS' ce/bce to ad/bc.
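The decision to map IRIS' ce/bce to ad/bc amounts to a two-entry lookup in the IRIS-GDMS mapping. A minimal sketch in Python, assuming the IRIS era arrives as a short text label; the names IRIS_TO_GDMS_ERA and map_era are hypothetical and not part of Jack's actual mapping work.

```python
# IRIS era labels mapped to the ad/bc values used by the GDMS DTD.
IRIS_TO_GDMS_ERA = {"ce": "ad", "bce": "bc"}


def map_era(iris_era):
    """Map an IRIS era label (ce/bce) to the ad/bc value used by the GDMS DTD."""
    key = iris_era.strip().lower()
    if key not in IRIS_TO_GDMS_ERA:
        raise ValueError(f"unexpected IRIS era: {iris_era!r}")
    return IRIS_TO_GDMS_ERA[key]
```

The cc and cd values stay available in the GDMS DTD for dates beyond the 9999 boundaries; nothing in IRIS' ce/bce maps to them directly.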
4) TEI/AdminMeta redux
- Again, short discussion of coding markup in the AdminMeta. Do we note
TEI version? Erin will check with Greg about what it means to be TEI.2 --
is "2" a version? what about "P3" v. "P4"?
[Update from Greg: "The .2 is an unfortunate carry-over from the
second major version of TEI, P2. In TEI P3 and P4, they did not continue
this tradition of changing the top-level element name to reflect the version.
We are using the current version, P4. In TEI P5, coming sometime in the
future, they will finally do away with the .2 and the top-level element
will become simply <TEI>" and the "P"'s stand for Public
Proposal.]. Discussion tabled.
5) Where do we go next?
- TEI & Seriality tabled for next week.
- Then bringing GDMS descriptive elements in line with DescMeta descriptive
elements. This should be short & easy!
February 5, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside, Thorny Staples
1) Workflow
- Erin brought a number of questions back from the Access-Content Planning
Team discussions on workflow and from Ann and the Image Group on whether
all images need to go into IRIS before they go into the Repo. Can
we go from vendor data to GDMS directly or do we go through IRIS?
- IRIS and other systems provide data management functionality and enrichment
functionality that the Repo will not offer.
- What about tracking information? How do we know what has been
done, needs to be done? What are the triggers for knowing that
something has moved from one step to another? How do we know something
is ready for cataloging? Or done cataloging and sent off to DLPS?
How do we know when something has stopped and is stuck at a particular
step?
- Erin believes that tracking is a kind of administrative metadata.
Whether or not the MSG "owns" that is another question. Thorny
argued that tracking metadata goes hand-in-hand with industrial production.
- Beth is concerned that we have a standard and finite number of metadata
tools. Are we at a point to say that all texts are TEI and all
will be extracted/enriched/enhanced using the current DLPS/Cataloging
tools [VIRGO reports, notetab, proofreader?]; all images will be created/enriched
in IRIS and extracted to GDMS? What about Jack's image collection
tool and the GDMS tool?
- What does it mean to be "integrated"? -- integrated delivery v. integrated
production?
2) TEI/AdminMeta
TEI/Seriality and GDMS/DescMeta tabled for next week.
February 12, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside, Thorny Staples, Chris Ruotolo, Greg Murray, Melinda Baumann,
Allison Sleeman
1) TEI and Seriality (with Chris Ruotolo, Greg Murray, Melinda Baumann, and
Allison Sleeman)
- The MSG invited Chris, Greg, Melinda, and Allison to the meeting to discuss
how to handle serials in TEI (and then by extension, DescMeta). This had
come up previously for Lewis and Clark and "A Catalogue of the officers
and matriculates of the University of Virginia." Now, DLPS has the
Cavalier Daily. Highlights & decisions:
- Data modelling. Thorny explained that, from a data modelling
perspective, it would be good and useful to represent a hierarchical
tree of relationships between a title and the issues of that title.
Each title would be an object and each of the individual issues child
objects. If the title of a publication changes, the existing objects
are still children of the old title object but now the old title can
be related to a new title object. If you discover one object,
you should be led to discover the other related objects.
- There was discussion about creating new objects to represent each
title change or whether to simply change the existing object.
In traditional cataloging, this is called successive v. latest entry
cataloging. Do you have a separate record for each title when
College Topics becomes the Cavalier Daily? Key advantages of creating
separate objects include: the ability to discover *only* College
Topics apart from the whole run and the ability to describe the nature
of each publication as opposed to characteristics of a particular issue
or the entire run. Each issue will still have its own TEI header,
but having separate objects for title changes will allow us to describe
each title as a whole, such as frequency changes or editorial changes.
- To represent the title object, Chris says there is a concept of an
"Independent Header" in TEI (http://www.tei-c.org/P4X/SH.html).
An independent header can exist as its own document independent
from the TEI text. The normal individual headers, therefore, can
describe the volume in-hand and the independent header can describe
the serial as a whole. The system we build will link the two together
by the document identifying itself as being part of a set and by
detecting the unique identifier for the parent and children resources
(the UVa Title Control Number).
- To accomplish this, we will locally extend the dtd to put a level
attribute on <idno> to mirror the level attribute currently on
<title>. We don't want to use level on <title> because
the values are specified: analytic title, monograph title, journal title,
series title, and title of unpublished materials. These values
are not effective for us to represent the differences, for example,
between a serial and a monographic set. The MSG will develop a
list of possible values. Greg will look into creating/extracting
Independent Headers. Level="m" is to be the default, explicit
in the DTD. If the level attribute is not present, "m" is the
default. Thus, all of the monographs done so far would not have to be
redone to add the "level" attribute. When dealing with serials,
Cataloging will change the value of level= (based on the to-be-devised
list of values).
- When we have a process up and running, we will pull Lewis and Clark
and any other sets already in the Repo and re-do them.
- When the objects get ingested into Fedora, the parent object will
know it is the parent of a set. Workflow will need to be developed
so that DescMeta records will populate the discovery index for only
the parent and not each of the children.
- The Fedora Imps group will need to account for a new content model
for these types of materials. Erin/Thorny/Melinda will take that
back for discussion there.
2) GDMS
- Thorny reported that he is reviewing the GDMS DTD and he will bring the
descriptive fields into line with DescMeta. He's spoken with the other
folks using the GDMS DTD and none of our changes presents problems for them.
Thorny will update the DTD and also pull <source> out of the Admin
DTD.
Next week: Image metadata.
February 19, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples
1) Develop a list of values for <idno level=""> as discussed
the previous week
- Following the decisions of the previous meeting on TEI/Seriality, the
MSG attempted to develop a list of values for the level= attribute. We decided
it was important to distinguish between:
Monographs
Monographic Sets
Serials
Periodicals
Newspapers
- A brief explanation of the distinction between Periodicals and Serials:
- All periodicals are serials but not all serials are periodicals!
- Serials, by definition, are published pretty regularly, have discrete
parts (i.e. can be shelved one next to each other), usually have numbering,
and are intended to be published indefinitely.
- Therefore, we can have print serials and serial videos and serial
CD-ROMs if the same title is published every year.
- Periodicals (aka journals) are a particular type of serial -- it is
probably self-evident what we mean by these. Newspapers are a particular
type of serial. But other types of serials that we might digitize include
something like *The Annual Review of Earth and Planetary Sciences* or
*The World almanac and book of facts* which LOOK like books (are
bound like books not like periodicals), but are still serial in nature.
- Things really get confusing when the first volume sometimes comes
in looking and acting like a monograph. There is no evidence that it
will ever be published again and it doesn't have a screaming-out-loud
date on the cover. But, over time other volumes do appear and then retrospectively
someone notices that we have 7 bibliographic records for the same title.
Can we treat it as a serial?
- The distinction between serial and periodical, from a user standpoint,
is important, because generally users looking for periodicals are looking
for a very specific kind of thing. They want to be able to pull out
the periodicals from all our serials, just like they would for newspapers.
- Monographic sets (i.e. encyclopedias, Lewis & Clark) display
many similar qualities to serials, but they are usually not intended
to be published indefinitely. And lastly, there are monographic series
where often each volume has its own individual title separate from that
of the series.
- After considerable discussion on the matter, it became quite evident that
by using <idno> in the way we discussed the previous week, we were
misusing the tag. We are not trying to say "this particular identification
number is a serial identification number" but rather that this "thing"
we have in front of us is a serial. Using <idno> for this would really
be telling the system to interpret this UVa Title Control Number in a way
different than other UVa Title Control Numbers. We would, again, be burying
secrets in the system, when explicit rules are much better for system modelling.
We also realized that we need a way for the individual headers to identify
themselves as "serial issues" and the Independent header to identify
itself as "the serial". If both headers contain <idno type="UVa
Title Control Number" level="serial">, how do we prevent
the independent headers from forever looking to a higher level for its parent
as well? The MSG realized we had a problem with last week's decisions and
discussion was again tabled.
2) Descriptive Image Metadata
- The MSG resolved that putting full "initial condition" metadata
into the children image objects (representing the object as it was first
ingested) was, in the long run, not meaningful. The MSG agreed that image
objects should only contain their parent pointer: <idno type="parent">.
This descriptive metadata for image objects will not populate the discovery
index and full descriptive metadata will be inherited from the parent on
demand.
Note: this decision was amended the following week because Fedora
requires a label. Rather than use a meaningless system-generated label,
the following was resolved:
- For page images (based on the existence of a pb tag), the label
will be: book title, page [value of n= (page number)].
- For figures (based on the existence of a figure tag), the label
will be: book title, [figure caption]
Some figures have extensive captions. For phase 2, we will grab the
entire caption. If this turns out to be unwieldy, we'll consider limiting
captions to a certain number of characters only for phase 3.
The technical image metadata needs to be worked out ASAP to move phase 2
forward. The MSG will invite Jama Coartney, Michael Tuite, and Leslie Johnston
for that discussion.
February 26, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Thorny Staples, Greg Murray
1) Storing PIDs in the TEI
- Assumption: Every object should have a PID (generated by FEDORA) and
its metadata. As it is now every electronic item does have an <idno>
tag within the Publication Statement. The <idno> element is repeatable.
This is the best place to put a UVa PID, typed in the idno element as
<idno type="UVApid">. A workflow will need to be set up to
generate the PIDs and include them in the object's metadata. This workflow
could be added to the current workflow of Greg's generation of a Master
metadata file. At the end of the current workflow, the batch generated UVAPIDS
could be automatically added to create a "final" master which
would go into the repository.
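A minimal sketch of the "add the PID at the end of the workflow" step, in Python with the standard library. The sample header, the existing idno type "ETC", and the PID value "uva-lib:1000" are made up for illustration, and the function does not validate the result against the TEI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed TEI header used only for this sketch.
SAMPLE_TEI = (
    "<TEI.2><teiHeader><fileDesc><publicationStmt>"
    "<publisher>University of Virginia Library</publisher>"
    '<idno type="ETC">Mod1234</idno>'
    "</publicationStmt></fileDesc></teiHeader></TEI.2>"
)


def add_uva_pid(tei_xml, pid):
    """Add (or update) <idno type="UVApid"> in the publication statement."""
    root = ET.fromstring(tei_xml)
    pub = root.find(".//publicationStmt")
    if pub is None:
        raise ValueError("no <publicationStmt> found")
    for idno in pub.findall("idno"):
        if idno.get("type") == "UVApid":
            idno.text = pid  # already present: update rather than duplicate
            break
    else:
        new_idno = ET.SubElement(pub, "idno", {"type": "UVApid"})
        new_idno.text = pid
    return ET.tostring(root, encoding="unicode")
```

Because <idno> is repeatable, the existing identifiers are left in place; running the function twice does not create a second UVApid entry.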
2) TEI Seriality
- It was decided to use TEI's Independent Header for series, serials, and
multivolume sets (and not use TEI's corpus for multivolume set). That said,
we discussed how to ID the "Thing." Per our discussion in our
last meeting, we wanted a tag to describe the "thing" as a volume
or a serial, etc., rather than telling the system how to interpret a particular
type of idno. The independent header is the parent (newspaper, periodical,
monographic set). The children are newspaper issue, periodical issue, monographic
volume, article. We decided that we needed something in the <sourceDesc>
to describe the thing and what type it is. The idno control number would
be what was used to connect the children to the parent header. To do the
"thing" typing, we decided to use <keywords scheme="uva-form">.
We came up with the following for valid uva-forms: newspaper, newspaper
issue, periodical, periodical issue, monographic set, monographic volume,
monograph, and article. (It was decided after the meeting to add periodical
volume, manuscript, serial and serial volume to this list.) Greg will make
sure our new classification scheme gets declared. Greg will identify "serial"
sets, re-run his program with the new uva-form scheme and create independent
headers for them.
- In practice a display program will gather all the objects with the same
UVA control number and then display them based on the UVA-form, gathering
the children and separating the parent.
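The gathering step could look something like the sketch below (Python). The record layout (dictionaries with "control_number" and "uva_form" keys) is an assumption made for illustration; only the parent and child forms named in the minutes are classified, and every other form is set aside as "other".

```python
from collections import defaultdict

# Forms the minutes assign to the independent header (parent) and to children.
PARENT_FORMS = {"newspaper", "periodical", "monographic set"}
CHILD_FORMS = {"newspaper issue", "periodical issue", "monographic volume", "article"}


def gather_sets(records):
    """Group records by control number and split parent from children."""
    groups = defaultdict(lambda: {"parent": None, "children": [], "other": []})
    for record in records:
        group = groups[record["control_number"]]
        form = record["uva_form"]
        if form in PARENT_FORMS:
            group["parent"] = record
        elif form in CHILD_FORMS:
            group["children"].append(record)
        else:
            # e.g. "monograph": not classified as parent or child in the minutes
            group["other"].append(record)
    return dict(groups)
```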
3) Fedora Imps Metadata Task List
- Erin handed out a "Phase 2 Task List" for Metadata. The committee
reviewed completed items and made suggestions. Dates were set for "Completing
the metadata development for EAD files" and for "Completing the
metadata development of GDMS files."
4) Descriptive Image Metadata
- It was decided in the February 19th meeting that image objects should
only contain their parent pointer: <idno type="parent">.
The descriptive metadata for image objects will not populate the discovery
index, and the full descriptive metadata will be inherited from the
parent on demand. After this decision, Ross reminded Thorny that all FEDORA
objects require a title (or label). So rather than use a meaningless system-generated
label, the following was resolved:
- For page images (based on the existence of a pb tag), the label will
be: book title, page [value of n= (page number)].
- For figures (based on the existence of a figure tag), the label will
be: book title, [fig caption]
Some figures have extensive captions. For phase 2, we will grab the
entire caption. If this turns out to be unwieldy, we'll consider limiting
captions to a certain number of characters only for phase 3.
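A small sketch of the label rule (Python). It assumes the square brackets in the rule above mark placeholders rather than literal characters; the function name and the optional max_caption_chars argument (standing in for the possible phase 3 limit) are hypothetical.

```python
def image_label(book_title, tag, n=None, caption=None, max_caption_chars=None):
    """Build a Fedora label for a page image (pb) or a figure image."""
    if tag == "pb":
        if n is None:
            raise ValueError("page images need the n= value of the pb tag")
        return f"{book_title}, page {n}"
    if tag == "figure":
        # Collapse line breaks and runs of spaces inside the caption.
        text = " ".join((caption or "").split())
        if max_caption_chars is not None:
            text = text[:max_caption_chars]
        return f"{book_title}, {text}"
    raise ValueError(f"unsupported tag: {tag!r}")
```

With max_caption_chars left unset the whole caption is used, matching the phase 2 decision.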
Discussion on technical image metadata needs will highlight the next MSG
meeting on March 4. The MSG will invite Jama Coartney, Michael Tuite and
Leslie Johnston for that discussion.
March 4, 2004
Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake,
Ann Whiteside, Jama Coartney, Michael Tuite and Leslie Johnston
- Leslie gave a "heads up" on OAI harvesting. For the American
Studies Grant items, metadata will need to be mapped to Descmeta and then
imported to CenRepo.
- This meeting was a special meeting to hash out image Technical metadata.
Jama, Michael, and Leslie were invited for their expertise. Our task was
to discuss what information is stored in the creation of images and what
information did we need (as a minimum) for UVa's Admin Metadata. Discussion:
- Michael described what was being done at RMC. The image collections
are dumped in "iview." The text information from "iview"
is collected and put into SQL or filemaker. The descriptive information
is added to the SQL or filemaker program. DLPS is using ImageMagick to
create TIFFs. TIFF header information can be extracted in text format.
- Jama distributed a TIFF header example. The information was a text
dump from "ImageMagick," the program used to create the TIFF.
- Discussion on how to get the UVa Fedora PID "connected"
to the image: 1) TIFF header modification to include the UVa Fedora
PID versus 2) TIFF log extraction with "log" modification
to include the UVa Fedora PID. The advantage of updating the TIFF header
in the image is that the complete header would be in every image. Disadvantage
is the time it would take to get a programmer and for them to write
a program to change the TIFF header. It was decided that modifying the
TIFF header was not a reality for the current phase. For future work,
we need further investigation in creating workflows to extract TIFF
headers, modifying them and then burning the "final" TIFF
on CD. Jama suggested that the software "iview" may be used
to add PIDs to TIFF headers. Jama would check to see if "iview"
or "ImageMagick" could do this.
- Since RMC and DLPS use different processes for their production, we
tried to see what they had in common and what we could use for Admin
Metadata. Iview, which RMC uses, and ImageMagick, which DLPS uses, dump
the same information in a text format, but the labels for the image
data elements are different. TIFF headers can be generated from both
DLPS and RMC.
- The next discussion was on which images would go into CenRepo and
have metadata attached. Until we get Guy's $500,000 storage system, CenRepo
will ingest smaller JPEG versions. The TIFFS will be stored on DVDs.
Question here was in regards to TIFFS offline versus JPEGs online. The
TIFF metadata would be online for discovery. If we have Admin metadata
for the master TIFFs, would we need Admin metadata for the JPEGs too?
Ross is to take these questions back to Thorny for discussion.
- Mapping Admin Metadata from TIFF header:
Using the TIFF header example, the group decided the minimal elements
for image metadata (using TIFF label descriptions): Format, Image width,
Image length, Resolution, Photometric Interpretation, Filesize, Compression,
Depth, and Bits/Sample
- The resultant Admin (technical) elements are:
<technical>
<image>
<filesize type="units">
<imageidentifier type="UVAfilename"> (filename with extension)
<format>
<mimetype> (include format at end, i.e., image/tiff)
<compression>
<colorspace>
<spatialmetrics>
<imagewidth type="units">
<imagelength type="units">
<sourceX type="units">
<sourceY type="units">
<bitspersample>
<samplesperpixel>
- Jama will further research the definitions for TIFF header elements,
Depth and Bits/Sample to make sure both are needed.
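As a rough aid for checking a record against the element list above, a sketch like the following could be used (Python). It assumes the record has already been extracted into a plain dictionary keyed by element name, and it treats <technical>, <image>, and <spatialmetrics> as containers rather than value-bearing elements; both are assumptions, since the minutes do not spell out the nesting.

```python
# Value-bearing elements from the list above (container elements omitted).
LISTED_ELEMENTS = (
    "filesize", "imageidentifier", "format", "mimetype", "compression",
    "colorspace", "imagewidth", "imagelength", "sourceX", "sourceY",
    "bitspersample", "samplesperpixel",
)


def absent_elements(record):
    """Return the listed element names that have no value in the record."""
    absent = []
    for name in LISTED_ELEMENTS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            absent.append(name)
    return absent
```

The function only reports what is absent; whether an absent element is acceptable for a given image format is a separate decision.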
March 18, 2004
Present: Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside
TEI to DescMeta Mapping Scripts
- Last week during a non-MSG meeting week, Janis, Erin and Sherry met to look over the preliminary results of Adam's mapping scripts. One of the problems was in how the MARC record was being mapped to TEI, subtitles not being mapped under <filedesc>. Erin sent the changes to Adam and Greg. Prior to our meeting, Greg had made his changes. Adam and Erin had been discussing the TEI to DescMeta changes via e-mail.
Technical Metadata for Images
- Erin handed out the elements for Image Technical Metadata that had been decided on in the last meeting (see March 4, 2004 minutes). During the week prior to the MSG meeting, Erin was tasked to create Technical metadata for an image taken by Thorny. The metadata that came with the image was an Exif file from the digital camera. The information in this file did not map to our Technical Metadata. Jama was going to run Thorny's picture through ImageMagick to get the required Technical Metadata information. In looking at this case of an image born from a digital camera, the MSG realized that we would need mappings for Digital Camera information. We will come back to this issue.
Unqualified Mapping of Dublin Core from DescMeta
Descriptive Image Metadata
- Redux, on decisions made at February 26th meeting. The decisions made at that meeting (requiring only <idno> and <title> for image objects) were amended to also include access rights or use. The MSG will look further into GDMS mappings to see how access rights could/would be harvested.
Discussions on Mapping from GDMS to DescMeta
- The MSG looked over two GDMS examples. Our first problem was deciding how far to map down into the divdesc and divs. The MSG decided that unless we could "see" the resultant search interface, it would be hard to decide what to map. We had trouble imagining what a user would need (would be looking for) on a search result page.
- Ann told us that what's in the GDMS file (to use for mapping) is decided when the image is cataloged. The decisions made in this initial phase determine what is entered into IRIS and thus what is mapped (or could be mapped) to GDMS. At our next meeting we will make an initial mapping, using the top divdesc, and see how it goes.
March 25, 2004
Present: Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Beth Picknally Camden, Thorny Staples
Review of Image Header to UVa Metadata Mapping
- Jama and Sherry met earlier in the week to discuss the fields from an ImageMagick dump for TIFFs and JPEGs. Jama decided that the ImageMagick field Type (true color, grayscale, bilevel, etc.) was important to capture and to add to the Image Technical Metadata. To accomplish this, we decided to use the attribute Type in the <colorspace> element. Of the image technical metadata elements, JPEG images do not contain the information for the elements <bitspersample>, <samplesperpixel>, or <colorspace>. But since JPEGs have the colorspace Type field, the element <colorspace> would be mandatory. We discussed whether we could (or should) have the <colorspace> element blank, use unknown, or use a fixed value. Erin was tasked to ask Jama if JPEGs have one fixed colorspace. The discussion with Jama revealed that a Type of True Color on a color JPEG doesn't give enough information about its color space, so color space information for JPEGs cannot be added automatically. Rather than leave the information uncollected, we decided to populate the <colorspace> content with unknown.
- Using the Image mappings decided upon in the March 4th and March 25th MSG meetings, Erin created UVa Metadata admin mappings for TIFFs (color, bitonal, and grayscale) and JPEGs (color and grayscale). The resultant ImageMagick dump fields were then used as mappings to UVa Admin Metadata.
- Workflows will be needed to process the TIFFS, create ImageMagick dumps and then map the fields to the Admin Metadata elements.
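The <colorspace> decision might be expressed as in the sketch below (Python). The lowercase attribute name "type" is an assumption modeled on the other attributes in the schema, and the sample values ("TrueColor", "Grayscale", "RGB") are placeholders, not values taken from an actual ImageMagick dump.

```python
import xml.etree.ElementTree as ET


def colorspace_element(image_type, colorspace=None):
    """Build <colorspace>, falling back to "unknown" when no value is available."""
    element = ET.Element("colorspace", {"type": image_type})
    # JPEG dumps carry the Type field but no usable color space value.
    element.text = colorspace or "unknown"
    return ET.tostring(element, encoding="unicode")
```

For a JPEG, calling colorspace_element("TrueColor") keeps the Type while recording the content as unknown.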
TEI Loose Ends
- Adam is getting along great with the TEI mappings. He has sent many e-mails to Erin with questions.
- Greg had a question as to where to put restrictions in the TEI files he was generating. Currently the access and copyright information is in a <p> tag (<p n="copyright">) inside the <availability> element. All text going into Phase 1 and 2 is publicly accessible (unrestricted), but Erin has asked Greg to think ahead on getting the variations of access coded in the TEI files. MSG decided on 4 types of access restrictions: public, restricted, viva, and uva. These cannot be coded in TEI using the type attribute on the <p> tag. The <availability> tag has the attribute status with defined values of free, unknown, and restricted. It was decided to use this attribute and to extend the DTD to include public, restricted, viva and uva as valid status values.
- Greg is also updating his DLPS TEI generation program to add the element <idno> to <seriesstmt> so that <idno> (under <seriesstmt>) can be mapped to DescMeta <set code=""> for collections. The problem here was that <seriesstmt> in TEI must have a <title> element. For the Modern English collection, the MSG decided to use "University of Virginia Library, Modern English Collection." Other titles will be created as needed when other collections are added.
Review GDMS-UVa Metadata Mappings
- In the March 18th meeting the MSG decided to map the contents from the top <divdesc>. Thorny amended this to include the titles and subjects from the lower <div> and <res> in the descmeta. The idea is to programmatically go down through all <div>s and <res>s, count them and put the results (and total count of each media type) in the <description> field with type="contents".
- We discussed workflow issues concerning putting the UVa PIDs back into GDMS. The GDMS DTD needs to be in line with DescMeta regarding PIDs as an element. Another point of discussion was whether the GDMS files met the minimal metadata requirements. It was deemed that the GDMS should have at least the minimal metadata requirements as well as a copyright statement (which should be in the GDMS header).
- As for GDMS mapping to Admin Technical metadata, the GDMS files need the encoding specified at the top of the xml file. DescMeta from GDMS mappings were reviewed and will be soon available on the DL web site.
- Once phase 2 is over we may rethink the GDMS as a collection and whether it may be best not to grab the top divdesc, but say this is a "collection about _______."
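A very rough sketch of the counting idea in Python follows. Everything about the GDMS structure here is an assumption made for illustration: the sample file, the use of a type attribute on <res> to carry the media type, and the wording of the summary text are not taken from the GDMS DTD or from any decision recorded in these minutes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, minimal GDMS-like document used only for this sketch.
SAMPLE_GDMS = (
    "<gdms><div><divdesc/>"
    '<res type="image"/><res type="image"/>'
    '<div><res type="video"/></div>'
    "</div></gdms>"
)


def contents_description(gdms_xml):
    """Count every <div> and <res> and summarize them in a description element."""
    root = ET.fromstring(gdms_xml)
    div_count = sum(1 for _ in root.iter("div"))
    media = Counter(res.get("type", "unknown") for res in root.iter("res"))
    parts = [f"{div_count} div"]
    parts += [f"{count} {kind}" for kind, count in sorted(media.items())]
    description = ET.Element("description", {"type": "contents"})
    description.text = "; ".join(parts)
    return ET.tostring(description, encoding="unicode")
```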
Discussion about EAD with Edward next week.
April 1, 2004
Present: Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Beth Picknally Camden, Ross Wayland, Edward Gaynor
Before tackling our scheduled topics, we discussed the level and possibility of updating metadata headers (in TEI, EAD, GDMS, DescMeta, etc.). Currently there is no workflow for updating these XML files once they are generated. Originally, each set of headers is based on the original MARC record. MARC records change and get updated. The ideal would be to have the metadata-generated files reflect these changes. Can this be done automatically, and if so, how? Left for a later discussion.
EAD Discussion
- We had a wonderful and lively discussion on the history and development of EAD files at UVa. Edward enlightened us on how EAD files and Special Collections MARC records are created. They are not using all of the MARC record information. He helped us understand how EAD, and Archival Collections, differ from TEI and GDMS (XML and its items). The EAD header information is about the EAD metadata file, not about the collection items (some of which do not have a digital form). The MARC record is about the collection. Also, to confuse matters, an EAD file could point to a TEI or GDMS image file. We tried to imagine the big picture of how EAD, TEI and GDMS ingested metadata would work within the Discovery Index. We are still thinking about this.
- We discussed 3 possibilities:
1) (re)map MARC collection records to EAD then map this new EAD file to DescMeta.
2) map MARC collection records to DescMeta
3) map EAD files to DescMeta
With scenario 1) the resultant EAD files would have the collection information in them and not match the existing EAD files. With scenario 2) the DescMeta file will have rich bibliographic and authority control information, but because the EAD file would have been created describing the Finding Aid, it would not have the same information as the DescMeta file would have. This would pose a problem when trying to search the EAD index (EAD files not having same information as discovered via discovery index). And finally with scenario 3) the DescMeta files would not have the rich bibliographic/authority control information that the MARC record has.
- It was decided that for Phase II the (existing) EAD header will be used to create DescMeta. Edward pointed out that there are over 4000 EAD files (guides) using various versions of EAD. To start, we will map the header fields <titleproper>, <subtitle>, and <creation><date>, plus the <scopecontent> information (of course, marked up differently in different versions of EAD). Erin will look at good EAD guides and bring back the stinkers to the MSG for decisions. On our "will look at later" list, we will look into mapping subject and author terms from the MARC records and adding them to the DescMeta files for Phase III.
Diacritic encoding for GDMS
- TEI handles diacritics as character entities. The original Barcelona GDMS files used Unicode. Melinda and Jama wanted guidelines on how to handle diacritics within GDMS. The Unicode in the Barcelona files was put in the GDMS file through a script written by Thorny; it did not come directly from MARC. Janis had noticed a problem with the MARC to TEI conversion program putting the character in and taking out the entity. Greg is reviewing his script.
- Ross elaborated that there are problems translating diacritics between different database programs. He said that the new version of Excel can handle Unicode. We decided to hold off on a decision until Jack tests Filemaker to see if it can handle Unicode. Ross recommended declaring an entity reference (in XML) which would map entities to Unicode. Ross will also talk with Thorny about his MARC to GDMS script and diacritics mappings.
- After conversation at the MSG and with Fedora Imps, it was recommended that DLPS be consistent with what the TEI folks are doing (using entity references rather than Unicode characters directly).
April 6, 2004
Present: Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Thorny Staples
MSG Web Pages
- The UVa Metadata web page has a new URL: http://www.lib.virginia.edu/digital/metadata/
From this link you can access MSG minutes, reports, Metadata mappings, and links to the UVa Metadata Descriptive Elements (DescMeta) and UVa Administrative Elements (AdminMeta). Redirections from the previous links should work.
Namespacing
- From a discussion in the Fedora IMPs meeting, it was decided that a namespace be defined so UVa could use our own DTD definitions within other metadata schemes. It was proposed and approved to define "uvalibmeta" as UVa's metadata namespace.
GDMS Mapping Redux
- We again discussed the philosophy of "what are we describing" in the DescMeta. At a previous MSG meeting on GDMS, we had decided to use <descmeta> only for GDMS objects and have the DescMeta describe the physical object at hand (i.e. the Rotunda). At an MSG meeting on EAD, we had decided to describe the EAD finding aid as the electronic object (as opposed to the collection the EAD finding aid then described). At this week's meeting we reversed course and decided to use the DescMeta <surrogate> structure for all three collections systematically (EAD, GDMS, TEI).
- For TEI: <descmeta> describes the print source, <surrogate> describes the electronic TEI file.
- For GDMS: <descmeta> describes the object at hand (i.e. the Rotunda), <surrogate> describes the electronic GDMS file.
- For EAD: <descmeta> describes the archival collection, <surrogate> the EAD guide of the finding aid.
All descriptions would, therefore (hopefully!), be consistent.
- After the meeting, Sherry thought through the EAD representation for DescMeta and Erin and Ann thought through the GDMS. These re-mappings will be discussed in the next meeting (April 13, 2004).
- Erin asked Thorny to clarify how to grab all of the "objects" (<res>) described in <resgrp> and include them in DescMeta's <description> element (see March 25 minutes). This will become a new Best Use Practice.
- Thorny will make corrections to the GDMS DTD <source> element. Mediatype is required in DescMeta. It should be included (and required) in GDMS.
EAD Sort Title Discussion
- In the TEI mappings, the MSG had decided to take the main title and automatically create a sort title (from a stoplist of articles) to facilitate sorting hit-lists. So:
<title type="main">A boy's world</title>
would create:
<title type="sort">boy's world</title>
to sort under "b" in a hit-list.
- The EAD already has a sort title element, so that there exist both:
<titleproper>A Guide to the Angelica Schuyler Church Papers</titleproper>
and
<subtitle id="sort">Church, Angelica Schuyler.</subtitle>
The MSG voted to use the EAD sort title. All "titleproper" titles in EAD start with "A Guide to", so if the "titleproper" was sorted, following the TEI model, all the EADs would sort under "g".
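The two sort-title rules could be sketched as follows (Python). The stoplist contents and the function names are assumptions; the minutes mention a stoplist of articles but do not enumerate it, and the <titlestmt> wrapper in the sample is included only so the fragment parses.

```python
import xml.etree.ElementTree as ET

# Assumed stoplist; the actual list of articles is not given in the minutes.
ARTICLE_STOPLIST = ("a", "an", "the")

SAMPLE_EAD_TITLES = (
    "<titlestmt>"
    "<titleproper>A Guide to the Angelica Schuyler Church Papers</titleproper>"
    '<subtitle id="sort">Church, Angelica Schuyler.</subtitle>'
    "</titlestmt>"
)


def tei_sort_title(main_title):
    """Drop a leading article so the title sorts under its first real word."""
    words = main_title.split()
    if len(words) > 1 and words[0].lower() in ARTICLE_STOPLIST:
        words = words[1:]
    return " ".join(words)


def ead_sort_title(titles_xml):
    """Return the text of <subtitle id="sort">, or None if it is absent."""
    root = ET.fromstring(titles_xml)
    for subtitle in root.iter("subtitle"):
        if subtitle.get("id") == "sort":
            return (subtitle.text or "").strip()
    return None
```

Applying tei_sort_title to the EAD <titleproper> would only strip the leading "A", which is why the EAD guides would all sort under "g" without the dedicated sort subtitle.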
April 13, 2004
Present: Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Beth Picknally Camden, Judy Thomas
Digital Video
- Judy Thomas was our guest. We had great discussions on digital video and metadata and RMC's current/future workflow on creating metadata. Judy showed us a diagram of how she perceives the current workflow from selecting a video to cataloging it (creating a MARC record and other metadata) and making it available to the Digital Library.
- In RMC, they create many physical "manifestations" of a digital video (MPEG3, MPEG4, DVD, CD, offline master, etc.). Circulation records (and MARC Cataloging records) will need to be created for the DVDs and CDs for inclusion in Virgo. We discussed workflow for creating different types of metadata (MARC records and video metadata, GDMS, DescMeta).
- As to metadata, Judy talked about RMC developing a tool (iris-like) to help create a database for describing the physical object. This tool would normalize the metadata workflow and the database content could be used to create metadata in other formats. We talked about how GDMS metadata could describe videos in all their versions.
EAD Mappings
- Erin passed out the latest DescMeta from EAD mappings. The <language> element does not have an attribute for the ISO code. Adam will be given the codes (according to ISO 639-2) for the 5 most popular languages. The mapping includes mapping <descmeta:subject> to <ead:controlaccess> even though the documents going into Phase I will not have <controlaccess>. There will be documents with <controlaccess> in them in Phase II.
- We discussed the ambiguous copyright notation. We will include the copyright information exactly as it appears in the <date type="publication"> element, unless it says: "There are no restrictions." or "Edit me". If so, <rights> will be populated with a standard note that says: "These materials may be under copyright. Please contact Special Collections of the University of Virginia Library for further information."
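The copyright rule above amounts to a small fallback, sketched here in Python. The function name rights_from_ead and the whitespace trimming are assumptions; only the two placeholder strings and the standard note come from the minutes.

```python
STANDARD_RIGHTS_NOTE = (
    "These materials may be under copyright. Please contact Special "
    "Collections of the University of Virginia Library for further information."
)
# Placeholder values named in the minutes that trigger the standard note.
PLACEHOLDER_VALUES = ("There are no restrictions.", "Edit me")


def rights_from_ead(copyright_text):
    """Return the text to place in <rights> for one EAD document."""
    # Assumption: surrounding whitespace is ignored when testing for a placeholder.
    if copyright_text.strip() in PLACEHOLDER_VALUES:
        return STANDARD_RIGHTS_NOTE
    # Otherwise the copyright information is carried over exactly as it appears.
    return copyright_text
```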
Sherry will review the EAD and GDMS mappings before the next meeting.
April 29, 2004
Present: Janis Kessler, Sherry Lake, Ann Whiteside, Beth Picknally Camden, Thorny Staples
UVa-Form Redux and remapping
- In the Feb. 26 meeting, (in the minutes under TEI Seriality), the MSG group decided to map the "thing," the item, using <keywords scheme="uva-form">, where uva-form was one of: newspaper, newspaper issue, periodical, periodical issue, monographic set, monographic volume, monograph, periodical volume, manuscript, serial, serial volume, and article. The MSG has decided that the best way to map the "form" of a "thing" is to not use <keywords> but use <mediatype> with <form>:
<mediatype type="text">
<form scheme="uva-form">monograph</form>
</mediatype>
Latest on TEI Independent Header
- It was decided that TEI's Independent Header would be used for series, serials, and multivolume sets (see Feb. 26 MSG Minutes). Upon further review, the Independent Header within TEI guidelines cannot have extensions. Thus we will need to create a uva-header-only section within the main TEIP4 declarations that includes our UVa extensions.
Fedora Object's getPreview behavior
- Ross wanted some advice on mappings (from DescMeta elements) for generating a "bibliographic citation." In Fedora, each type of object will have default behaviors. Text objects will have the getPreview behavior, which will display a hit list for the search results. The MSG decided that all text objects should display the following in a hit list:
- Title
- Date (followed by the "type" of date, creation, publication, etc.)
- Agent = if one exists (only mapping type="creator", and only displaying the first three if multiple creators exist)
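A rough Python sketch of the hit-list rule above follows. The function name, the input shape, the "Title:"/"Date:"/"Agent:" labels, and the separators are all assumptions for illustration; the minutes specify only which fields appear and the three-creator limit.

```python
def preview_lines(title, date, date_type, agents):
    """Build hit-list lines; agents is a list of (name, agent_type) in record order."""
    lines = ["Title: " + title, "Date: %s (%s)" % (date, date_type)]
    # Only agents of type "creator" are shown, and at most the first three.
    creators = [name for name, agent_type in agents if agent_type == "creator"]
    if creators:
        lines.append("Agent: " + "; ".join(creators[:3]))
    return lines


# Hypothetical record, for illustration only.
example = preview_lines(
    "A boy's world", "1900", "publication",
    [("Creator One", "creator"), ("Publisher X", "publisher"),
     ("Creator Two", "creator"), ("Creator Three", "creator"),
     ("Creator Four", "creator")],
)
```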
To Look into Later
- The MSG will add to its "down-the-road" list: to look at the MARC to TEI mappings and determine whether there are other MARC fields that should be mapped.
May 13, 2004
Present: Sherry Lake, Ann Whiteside, Beth Picknally Camden, Erin Stalberg
DTD Ownership
- Thorny will no longer be updating the Digital Library's DTDs (GDMS, DescMeta and AdminMeta). DLPS will be taking on this responsibility. It is still undecided who in DLPS will do this. There had been times in the past where there were multiple versions of DTDs on different web servers. Charging DLPS with the keeping and modification of the DTDs will help this situation.
Review getPreview Results
- We looked over draft output of getPreview. The MSG decided that only original dates would be displayed. Previous MSG minutes stated that all dates would be displayed.
Identify elements for getDescription
- getDescription is the behavior to pull full bibliographic citations for objects from DescMeta. getPreview creates a hit list and when a link is selected, the getDescription behavior displays the bibliographic citation for that object. All objects will have the following default behaviors: getPreview, getLabel, getDescription, getFullView, and getDefaultContent.
- The MSG discussed what fields from DescMeta to include in the getDescription output and how that data is to be displayed (i.e., "VIRGO-y" labels, with both surrogate and original, etc.).
- These are the descmeta fields to display in getDescription:
[Note: this list reflects the MSG minutes of that date. The final list is available at:
http://www.lib.virginia.edu/digital/metadata/dissemmap.html]
<descmeta><agent role=""> (not form or type) -- all instances of any role
<descmeta><covplace><geogname> -- all instances
<descmeta><covtime> -- all instances
<descmeta><culture> -- all instances
<descmeta><description type=""> -- all instances of any type
<descmeta><identifier type=""> -- all instances but only of type="ISBN" or type="ISSN"
<descmeta><language> -- all instances
<descmeta><mediatype><form> -- there should be only one
<descmeta><physdesc type=""> -- all instances of any type
<descmeta><pid> -- there will be only one
<descmeta><place><geogname> -- all instances
<descmeta><relationships><set code=""> -- all instances of any code
<descmeta><rights type=""> -- all instances of any type
<descmeta><style> -- all instances
<descmeta><subject scheme=""> -- all instances of any scheme
<descmeta><time type=""> -- all instances of any type
<descmeta><title type=""> -- all instances of any type (these would be best ordered now: type= main, sub, parallel, alternate, series)
<descmeta><surrogate><agent role=""> (not form or type) -- all instances of any role
<descmeta><surrogate><description type=""> -- all instances of any type
<descmeta><surrogate><identifier type=""> -- we're still thinking about this one!
<descmeta><surrogate><language> -- all instances
<descmeta><surrogate><physdesc type=""> -- all instances of any type
<descmeta><surrogate><place><geogname> -- all instances
<descmeta><surrogate><rights type=""> -- all instances of any type
<descmeta><surrogate><time type=""> -- all instances of any type
<descmeta><surrogate><title type=""> -- is it possible to ONLY grab surrogate titles that are different than the <descmeta><title>?
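Two of the notes in this list (the suggested title order, and the open question about surrogate titles) can be sketched in Python. This is one possible reading, not a recorded decision: the function names and the (type, text) input shape are assumptions, and "different" is taken to mean an exact text mismatch.

```python
TITLE_ORDER = ["main", "sub", "parallel", "alternate", "series"]


def order_titles(titles):
    """Sort (type, text) pairs into the suggested order; other types go last."""
    def rank(item):
        title_type = item[0]
        if title_type in TITLE_ORDER:
            return TITLE_ORDER.index(title_type)
        return len(TITLE_ORDER)
    # sorted() is stable, so titles of the same type keep their record order.
    return sorted(titles, key=rank)


def distinct_surrogate_titles(original_titles, surrogate_titles):
    """Keep only surrogate titles whose text does not match any original title."""
    original_texts = {text for _, text in original_titles}
    return [(t, text) for t, text in surrogate_titles if text not in original_texts]
```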
May 20, 2004
Present: Janis Kessler, Ann Whiteside, Beth Picknally Camden, Erin Stalberg
How much metadata is too much metadata?
- The group discussed vendor metadata and whether we should map all fields to DescMeta just because they exist and are mappable or whether we really need all that might come from a vendor. This will probably be somewhat case-by-case, but for the Fowler data immediately under consideration we decided not to map some of the gift information to DescMeta. The data will be clearly identified as Fowler data and include the Fowler accession numbers. A user needing more specifics regarding provenance for an object should probably trace the item back to the Fowler Collection and do further research from there.
Order of fields
- In working on the Catlin and Fowler mappings, Erin had mistakenly not paid attention to assuring the field order matched the DTD and Thorny's records wouldn't parse. This brought up the question of field order and whether the MSG should decree an order (akin to AACR2 field order) and adjust the DTDs to conform. Among other things, it would be handy for humans looking through DescMeta and GDMS records to know where to expect certain data. This discussion morphed into a larger discussion about how often we'll be able to change DTDs in the future and the running of lengthy metadata scripts and the scalability of a system that requires that existing data always conform. The group tabled field order until after Phase 2 when there will undoubtedly be other DTD changes required and proposes a joint MSG/Fedora Imps meeting at that time to discuss a plan for updating DTDs (quarterly? semiannually? annually?)
GDMS res titles
- We had asked Adam to map DescMeta <description type="contents"> as follows:
Count the number of GDMS <res>'s for a particular <resgrp> and map as follows:
[value of resgroup label=]: [res count]; each <title> (if <title> is not present, use <description>) concatenated together, separated by "--" and ended with a period.
for the following result: Digital Images: 4; Cast iron capital from annex -- Cast iron annex capital -- Cast iron capital from Robt Mills annex -- 1856 lithograph.
The GDMS files that Thorny ran for Catlin before he left for vacation did not have <res> titles or descriptions which was presenting a problem for Adam. The MSG decided to confirm that Thorny's script wasn't intentional and planned to ask that we add res <titles>. [Erin talked to Thorny after he returned and he believes in a situation like Catlin where the GDMS object has a one-to-one relationship between the divdesc and the res, having the res title repeat (if it's not going to be explicitly different) will cause duplication (and confusion) in display. After agreement on that point, Erin sent the following message to Adam: if there is only one <res> in a particular <resgrp> and it has NO <title> or <description> just do the first part of <descmeta><description type="contents"> where you count the res'. So, you'll stop at: <description type="contents">Digital Images: 1</description>. Otherwise follow the directions on the original mapping. So, if the data has a res title that is different than the divdesc title, i.e. "full view" that will display. Otherwise only the image count will display].
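The contents mapping, including the later exception for a <res> with no title or description, can be sketched in Python as follows. The function name and the dictionary input shape are assumptions; the minutes cover only the single-<res> case of the exception, so treating several untitled <res> elements the same way is also an assumption.

```python
def contents_description(label, res_list):
    """Build the text of <description type="contents"> for one <resgrp>.

    res_list holds one dict per <res>, with optional 'title' and
    'description' keys.
    """
    head = "%s: %d" % (label, len(res_list))
    # Use <title> when present, otherwise fall back to <description>.
    names = [res.get("title") or res.get("description") for res in res_list]
    names = [name for name in names if name]
    if not names:
        # Nothing to list: stop after the count (assumed for multiple <res> too).
        return head
    return head + "; " + " -- ".join(names) + "."


example = contents_description("Digital Images", [
    {"title": "Cast iron capital from annex"},
    {"title": "Cast iron annex capital"},
    {"title": "Cast iron capital from Robt Mills annex"},
    {"title": "1856 lithograph"},
])
```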
May 27, 2004
Present: Sherry Lake, Ann Whiteside, Beth Picknally Camden, Erin Stalberg, Thorny Staples, Janis Kessler
Naming Conventions for TEI Independent Header Files
- Greg had a question as to what to name the Independent Header Files for multivolume, multi-set materials. The Independent Header must be named in such a way that the children would know about it so the TEI files and their Independent Headers would "connect." We discussed how the TEI files are named and the directory structure of files on CenRepo. This seemed like a pre-FEDORA workflow (how to get the Independent Header files named and connected to their children).
- NOTE: After the meeting Erin met with Leslie and it was decided that since each independent header and each individual header already has:
<fileDesc>
<sourceDesc>
<idno type="UVa Title Control Number">o00422606</idno>
which links them together, it was their thought to name the independent files with this title control number: o00422606.xml
- Leslie will create a "sets" folder in /ReadyRepo/text for all the independent header files to live. Then she and Ross will talk about an ingestion script which will read through that directory first, do the pid substitution, and then find all the other files that have the same title control number for pid substitution on the individual files.
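The naming idea in the NOTE above can be sketched in Python with the standard library's ElementTree. The function name and the sample header are assumptions; the sample abbreviates the nesting in the same way the minutes do, and the ingestion script itself is not shown because the minutes do not describe it in detail.

```python
import xml.etree.ElementTree as ET


def header_file_name(tei_header_xml):
    """Return '<title control number>.xml', or None if no such <idno> exists."""
    root = ET.fromstring(tei_header_xml)
    for idno in root.iter("idno"):
        if idno.get("type") == "UVa Title Control Number" and idno.text:
            return idno.text.strip() + ".xml"
    return None


# Abbreviated header for illustration; real headers contain more elements.
SAMPLE_HEADER = """<teiHeader><fileDesc><sourceDesc>
<idno type="UVa Title Control Number">o00422606</idno>
</sourceDesc></fileDesc></teiHeader>"""

print(header_file_name(SAMPLE_HEADER))  # o00422606.xml
```

Because the independent header and each individual header carry the same number, the same lookup could in principle be used to find the files that belong together.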
Cavalier Daily
- Greg had sent Erin a few questions about Cav Daily markup (and an example TEI file) for the MSG group to discuss. The microfilm version of the Cav Daily is what was digitized so Greg used the MARC microfilm version to create TEI headers. But it is the MARC record of the print version that has the more complete descriptive information. Thus it was decided that Greg would pull information from the print record, not the microfilm record, so the intellectual integrity of the original will be captured.
- Looking over the example TEI that Greg had sent, the MSG did not like the volume and issue # as a "type" value on the <title> element. Sherry will look over the TEI documentation to see if there is a better place to put volume and serial information in the TEI header. Erin will find out what values are legal for the "type" attribute on title. The results will be discussed at the next MSG meeting.
- Other modifications for the Cav Daily TEIhdr were: no capitalization on the data elements (YEAR, NUMBER, WEDNESDAY, SEPTEMBER 11, 1968); and for the volume numbering (79th, 80th year, v. 81, etc.), use what it says on the piece even though different years may have used different types of volume numbering.
- From all the discussions on independent headers, we had decided that BOTH the fileDesc and the sourceDesc should describe the issue in-hand. Originally we had discussed the idea of having fileDesc describe the issue and sourceDesc the serial (the series). But, when we came around to independent headers, this original idea was discarded. The individual header would describe the issue in-hand for fileDesc AND sourceDesc and the independent header should describe the serial.
<title> "type" Attribute
- MSG discussed further the types of titles that appear in MARC. In MARC alternate titles are spine, cover, parallel, portion (and a few others) all under one MARC tag. But the titles uniform, former, main, translation are different titles each with their own MARC tag. The MSG talked about adding a new attribute "alttype". Again we were not sure if the types of titles were controlled since other non-title types were currently used in the <title> element (type=date, type=gmd, type=issue, type=volume). Erin will investigate and bring back to the next MSG meeting.
June 3, 2004
Present: Sherry Lake, Ann Whiteside, Beth Picknally Camden, Erin Stalberg, Thorny Staples, Janis Kessler
Cavalier Daily
- Per the decisions made in the last meeting, Greg will re-run all the DLPS TEI headers. DLPS has a demo for Reunions Weekend so they do not want to make changes, requiring time and research, with the headers, for now. The elements <title type="volume"> and <title type="issue"> will remain and the MSG will review alternatives later.
Alternate Titles
- Even though the Cavalier Daily files are "done" for now, the MSG discussed how to best handle Alternative Titles and the "type" definitions for the <title> element. The "type" definition list is one created by Greg, not a TEI pre-defined list. Currently the DLPS TEI DTD allows these values for the "type" attribute on the element <title>:
abbreviated | alternate | date | desc | former | gmd | issue | main |
parallel | part | related | resp | sub | uniform | volume
- Our first discussion was whether or not the types of "alternate titles" (cover, spine, caption, parallel, portion, etc.) should be added to this list or another type such as "altType" be added. But there was more to the title "type" problem than just how to add these alternate titles. Some of the values for title type are not types of titles (date, gmd, issue, volume), but subfields for the MARC title fields (210, 245, 246, 247, 740, etc.).
- The MSG posed the question as to who (MSG or TEI folks) was responsible for making decisions on proper use of TEI. Thorny will discuss this with TEI folks and come back in a couple of weeks.
June 17, 2004
Present: Sherry Lake, Ann Whiteside, Beth Picknally Camden, Erin Stalberg, Thorny Staples, Janis Kessler, Leslie Johnston
Meeting of Experts on Metadata for Digital Library Collections
- Beth was invited to participate in this discussion (presented by the GPO) as a representative from a PCC institution. The GPO is working on an initiative with the federal depository library community to digitize the entire legacy collection of U.S. government documents, estimated to be about 2.2 million items (excluding microfiche). This was the second in a series of meetings with experts in digital preservation to assist GPO in defining specifications to be used for this digitization initiative. Beth was an observer at the meeting. She reported that standards for scanning had been decided upon, but not metadata standards. You can read more about this project at: http://www.gpoaccess.gov/about/reports/preservation.html
Labeling Bibliographic Displays
- The MSG tackled the display for the disseminator for getFullView. This disseminator will use the object's native metadata (GDMS, TEI, and EAD). EAD already has its tags mapped and displayed in its current collection display. The MSG decided upon display labels for the GDMS elements. As we went through the labels, there were quite a few exceptions and element-specific rules. It was decided that Leslie and Erin would get together and work on a document describing the rules for generating labels for the DL interface.
NOTE: This document was drafted and worked on throughout the next week. Once the final kinks are worked out it will appear as a link in the Disseminators section on this page: http://www.lib.virginia.edu/digital/metadata/mappings.html.
June 23, 2004
Present: Ann Whiteside, Erin Stalberg, Thorny Staples, Janis Kessler, Leslie Johnston
Labeling Bibliographic Displays
- The discussion of labels for GDMS getFull continued from the previous week. Erin had mocked up examples (from metadata for Catlin, Fowler, and the IRIS-GDMS dump) based on the rules defined the week before and the group worked through the issues and problems. A few of these were ultimately changes to the IRIS-GDMS script which were requested [and completed] by Jack Kelley:
then the user display would be this:
place (country): China
place (site): Hong Kong
rather than:
place (country): China
place (site): Hong Kong,CHN
July 8, 2004
IRIS to GDMS
- During the week prior to this meeting, the IRIS to GDMS conversion was completed. The parsing ran smoothly and included all changes from previous MSG decisions. In looking at the results, Erin discovered an extra comma "," for some files. Erin was going to ask Jama if this could be fixed prior to the files' ingestion in CenRepo. Later that day, Erin reported that Jama had already fixed the problem.
DL Update
- Thorny gave an update on the Digital Library ingestion of materials: Ross has TEI disseminators to do, Jefferson Country and Fowler collections are almost ready to go in CenRepo, the Catlin collection is already in CenRepo. In a few weeks a test group (which will include the Fedora Imps, DL production, DL R&D, and the MSG) will be testing the user interface. In this initial test, there will not be any cross collection searching.
VTLS
- Thorny also briefed us on VTLS and their use of FEDORA as their database management system. The ARROW Project (Australian Research Repositories Online to the World), a project in part sponsored by the National Library of Australia, has announced that it has entered into an agreement to develop VITAL (a VTLS product) and FEDORA as the solution to address ARROW's research repository needs.
July 22, 2004
Random post-ALA philosophizing
- The MSG philosophized on "description" versus "access." In the traditional print world, cataloging objects with detailed descriptions (summaries, abstracts, subject terms) is necessary in order for the patron to make a good decision that this is the item they want, since the item is often in a different location than the patron. In an electronic world, is a "fuller" description really needed when the object is right in front of them? If description is not necessary, then are we creating metadata records more for access than for description? This was a true discussion; nothing was decided.
Follow-up issues
- Erin handed out a list of Follow-up Issues that we have been putting on hold until we had time. We went over this list to prioritize the items and to see which ones no longer were needed.
July 29, 2004
NINES project
- This was a special meeting for the MSG members to meet Duane Gran of the NINES project. NINES is a group of distinguished scholars and humanities computing experts engaged in building a "networked interface for nineteenth-century electronic scholarship." This interface is to be an online research and publishing environment for integrated, peer-reviewed editorial and critical work in nineteenth-century studies, both British and American. It is planned that UVa Library publish this electronic project. Duane introduced the project and described the types of metadata they plan to use in the project. Their plan is to use METS for the aggregation of their documents (which vary in form from HTML to TEI). They are developing their own NINES wrapper, an XML schema which extends TEI and METS to account for scholarly collaboration and the critical distinction between documents and works, while aggregating mixed content. They will be trying to use the METS "StructMap" concept to make a tree for the document history. They are just now thinking about descriptive metadata and how to either create a unique one, use MODS, or something else.
- We suggested to Duane that the 9s look at DescMeta and/or GDMS. Thorny briefly went over what DescMeta is. Duane did have specific questions on content format, i.e., author names, dates and places. After the meeting Erin sent Duane detailed information on Metadata Content Standards with examples and web links to authority name and subject lists.
August 5, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler
Review of Faculty Guidelines for Creating Metadata
- Leslie Johnston asked the Metadata Steering Group to help review two
documents. The first was a document on internal production standards and
the MSG decided the metadata section would just point to the MSG pages.
The second was the metadata piece of a larger document on Faculty
Digitization Guidelines to help faculty digitize and create metadata for
their projects in conformance to Library standards.
- The MSG looked over the current Faculty Metadata Guidelines. It was decided
that this document needed a broader scope to be used by all Faculty, not
just the Art & Architecture community. The group went over the guidelines
and discussed the need for wordier descriptions. It was decided that we
would add descriptive notes (on creating the content of the fields) and
examples. There was further philosophical discussion on how to explain to
users the difference between Content Type and Resource Type.
August 12, 2004
Present: Erin Stalberg, Thorny Staples, Janis Kessler, Beth Picknally Camden, Ann Whiteside
Repo status update
- Thorny gave an update on the Repo development status including revised dates for possible testing.
Continuing Review of Faculty Guidelines for Creating Metadata
- The MSG continued to work through the Faculty document for Leslie, making changes to definitions and wording and adding examples.
- Erin agreed to take a few questions to Mike Furlough, including discussion of who would be the contact for all the "For further assistance, contact ..." references.
- The latest version of the guidelines can be found here: http://www.lib.virginia.edu/digital/metadata/communityquickchart.html
August 18, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler, Beth Picknally Camden, Ann Whiteside
Continuing Review of Faculty Guidelines for Creating Metadata
- Preceding this meeting, the MSG reviewed drafts of the Quick Reference
chart and the detailed descriptions.
- In the original guidelines, the introductory paragraphs directed the
Faculty to see their Subject Librarian for advice and help in creating
metadata files. The MSG thought that the Metadata Specialist (Sherry Lake
at the moment) should be the one to contact first. She could start to help
them and refer them to the appropriate subject librarian. After Erin and
Mike Furlough talked, it was decided to refer Faculty to an e-mail list "lib-metadata-help@virginia.edu" which is populated with the MSG members.
Later if needed, Subject Librarians may be added to this e-mail list. Mike
also suggested that the word "Faculty" be taken out of the title for a more
general title "UVa Community."
- The MSG went over the quick chart and the newly created notes section
making corrections and adding clarifications.
The latest version of the guidelines can be found here: http://www.lib.virginia.edu/digital/metadata/communityquickchart.html
September 2, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler, Beth Picknally Camden, Ann Whiteside
Continuing Review of Community Guidelines for Creating Metadata
- The title for the metadata creation guidelines, formerly Faculty Digitization Guidelines, has been changed to UVa Community Digitization Guidelines. The MSG reviewed this document one last time. There was a discrepancy between the guideline notes for Physical Description "type" (saying it was not required) and the DTD specification (which had "type" required). The notes were changed to make the faculty tell us what type their data represents. "Type" is required, but the DTD definition was changed to allow any value for Physical Description "type."
- We discussed the 4 types of access rights described in the notes section for "Access Rights." It was decided to clarify "restricted" meant Restricted to Library staff for management only and to add a line that if there are additional restrictions (other than those on this chart) to contact lib-metadata-help.
- Erin will create PDF files of the community quick chart and guidelines. Thorny will show the chart and guidelines to the Tibetan and Himalayan Digital Library and to the Nines communities.
- The latest version of the guidelines can be found here:
http://www.lib.virginia.edu/digital/metadata/communityquickchart.html
Review of Labels for Art and Architecture Displays
- Each member of the MSG tested the art and architecture collections in the digital library. The week before this meeting, bug reports and user comments on the first pass of the GDMS repo interface were sent to Leslie. Thorny's group continues to tweak things to get it working. For the MSG meeting we discussed the labeling of the displays. Many differences we saw were attributed to the differences in GDMS creation. Subjects from AAT are not generally capitalized, while those from IRIS are. Also in the Catlin collection, a thesaurus is listed as a "type" under subject, while the display rules look for "scheme".
- It was noted that the image/title display boxes sometimes displayed an alternate title. It was pointed out that Doug's display rules take the first title, which may not be the "main" title. The display program should use the "main" title, wherever it appears in the DescMeta record. This has subsequently been fixed.
- The "Get Information" display was too long, so the MSG decided that if any of the fields Type, Style, Culture, Medium, or Techniques had more than one value, the values would be combined on one line separated by a semicolon ";". This too has subsequently been fixed.
- We ended the meeting discussing the overall look and feel of the Web Interface.
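The two display fixes in this section (prefer the "main" title wherever it appears, and combine repeated values on one line) can be sketched in Python. This is not Doug's display code; the function names, the (type, text) input shape, the fallback to the first title, and the space after the semicolon are all assumptions.

```python
def display_title(titles):
    """Pick the title to show; titles is a list of (type, text) in record order."""
    for title_type, text in titles:
        if title_type == "main":
            return text
    # Assumed fallback when no "main" title exists: use the first title, if any.
    return titles[0][1] if titles else ""


def combine_values(values):
    """Join repeated field values (Type, Style, Culture, ...) onto one line."""
    return "; ".join(values)
```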
September 23, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler, Beth
Picknally Camden, Ann Whiteside, Perry Roland, Greg Murray
TEI <titleStmt> and "type"
- Currently the automatic scripts that transform MARC records into TEI
records put the volume, date, and issue as a "type" value in the <title>
element. At the May 27th meeting, the MSG wanted to see if there was a
better place to put this type of information within the TEI header. Sherry
found the <biblScope> element in the TEI, but its use is limited in TEI. At
this meeting, we discussed modifying the TEI DTD to be able to use <biblScope> within <fileDesc> and <biblFull>, or using an <idno> element.
There were concerns about modifying the DTD and still being able to share our
TEI files. After some discussion, the MSG decided to have Greg modify the
TEI DTD. Here is an example of the "old" way and the new way:
OLD:
<fileDesc>
<titleStmt>
<title n="245|a" type="main">The Cavalier Daily</title>
<title type="volume"><num value="79">79th Year</num></title>
<title type="issue"><num value="1">Number 1</num></title>
<title type="date"><date value="1968-09-11">Wednesday, September 11,
1968</date></title>
<title type="gmd">[electronic resource]</title>
...
NEW:
<fileDesc>
<titleStmt>
<title n="245|a" type="main">The Cavalier Daily</title>
...
</titleStmt>
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="1">Number 1</num></biblScope>
<biblScope type="date"><date value="1968-09-11">Wednesday, September 11,
1968</date></biblScope>
<extent>123 kilobytes</extent>
...
<sourceDesc>
<biblFull>
<titleStmt>
<title n="245|a" type="main">The Cavalier daily.</title>
<title n="247" type="former">College topics, (1890-May 1, 1948.)</title>
</titleStmt>
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="1">Number 1</num></biblScope>
<biblScope type="date"><date value="1968-09-11">Wednesday, September 11,
1968</date></biblScope>
<extent n="300"> v. :; ill. ;; 58 cm.</extent>
...
</biblFull>
</sourceDesc>
</fileDesc>
How to display Volume, Issue Information
- With the above decision, the next discussion was how to map and display
volume and issue information. Thorny reminded us that the content of the
metadata file is not necessarily what gets displayed. A Q/A process would
be needed to make sure that the content (and attribute information) in the
elements are correct. For example, if the stylesheet is expecting numbers
in the <num value=""> element for properly sorting items, then a human will
need to check (or maybe to add) the proper value.
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="1">Number 1</num></biblScope>
<biblScope type="date"><date value="1968-09-11">Wednesday, September 11,
1968</date></biblScope></bibl>
- Thus, in addition to the change in DTD (from the previous discussion), Greg
will add a Q/A check in his program to double check that for the Cavalier
Daily, volume, issue, and date are included. Greg will also re-do other
multi-volume sets. Janis will be the Q/A person, to make sure the <num> element information is correct. Also for these multi-volume sets,
independent headers will need to be created. Janis will Q/A these too. Greg
asked where to put the Independent Headers; Erin will check with Leslie on
where.
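A Q/A check of the kind described above might look like the following Python sketch. It is not Greg's program: the function name, the problem messages, and the choice to require a purely numeric value attribute on <num> are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def check_bibl_scope(xml_fragment):
    """Return a list of problems found in the <biblScope> elements of a header."""
    root = ET.fromstring(xml_fragment)
    problems = []
    found = set()
    for scope in root.iter("biblScope"):
        scope_type = scope.get("type")
        found.add(scope_type)
        if scope_type in ("volume", "issue"):
            num = scope.find("num")
            value = num.get("value") if num is not None else None
            # The stylesheet is assumed to sort on a plain number here.
            if value is None or not value.isdigit():
                problems.append("%s: <num value> missing or not numeric" % scope_type)
    for required in ("volume", "issue", "date"):
        if required not in found:
            problems.append("%s: <biblScope> missing" % required)
    return problems


SAMPLE = """<fileDesc>
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="1">Number 1</num></biblScope>
<biblScope type="date"><date value="1968-09-11">Wednesday, September 11, 1968</date></biblScope>
</fileDesc>"""
```

A check like this can only flag missing or non-numeric values; a person still has to confirm that the number matches what is printed on the piece.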
SEL-Mountain Lake Biology Specimens
- Coming up (no date set), the MSG has been asked to look at the cataloging
of biological specimen images. Currently the image data is being entered
(cataloged) into a database called Biota. The MSG had many questions about
the cataloging database (Biota), about who and what is being cataloged and
what do the Mountain Lake scientists want out of the cataloging and digital
imaging of the specimens. A big question was, are these images bound for
the Digital Library?
- A bigger discussion about cataloging image collections started. So far
there are at least 3 image database collection systems: IRIS (Art &
Architecture); Filemaker (Holsinger); and the Tibet cultural database.
What needs to be done is to have a holistic look at images on a general
scale. Sherry will be the MSG's contact person for the SEL-Mountain Lake
image project and she will attend their meetings (the first of which has
yet to be scheduled).
October 7, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler, Beth Picknally Camden, Ann Whiteside
Art Department Artemis Database and "Unknown"
- The Art Department uses an Access Database for its Artemis collection. In this database, there are many creators that are "anonymous." The Art Department has asked that they be able to search for "unknown" and they wanted the word explicitly in the record so that they could search explicitly for it. To do this, in the mapping into IRIS they would convert their "anonymous" creators to "unknown". This started an intense discussion on "unknown" versus "anonymous" and what these two words mean in different communities. The Fine Arts Library had been using anonymous when truly anonymous and leaving the field blank when unknown. It was decided to map (into IRIS) the Artemis database "anonymous" to "unknown" as asked. But for those records, where the creator is "anonymous", in Art & Architecture collections in IRIS, "anonymous" will stay.
- The MSG then went back to the original TEI to DescMeta mapping. The mapping documentation says that if there is no "creator" (author or editor) then populate the <agent type="creator"> with "Unknown". The "no-creator" mapping to "Unknown" was not done in the IRIS to GDMS mapping (and the GDMS to DescMeta mapping took as-is the top GDMS divDesc). It was decided that changing the IRIS database to make the creator-agent "Unknown" was better than fixing the IRIS to GDMS mapping program to do this. Ann will go back to Jack to see how easy it is to globally update the IRIS database to replace "blank" creators with "Unknown".
- As for the TEI to DescMeta mapping, it was decided to remove the mapping of "no-creator" to "Unknown" and instead make sure that "Unknown" is encoded in the TEI Headers.
Followup Issues
- Relationships: Ed Lay asked that, for the Jefferson Country images, the book in which these images appear be credited. This brought up many questions: where credits should show (on the search results or on the large image display); how to display image copyrights versus book copyrights and how to credit the photographer; and how to show explicit credit lines versus the "general" copyright statement. In the current phase of the DL, copyright and viewing are limited to UVa only. An interim decision was to make sure book credits were displayed for Jefferson Country, but to leave the broader discussion of credit and copyright display issues to a later date.
TEI mapping of <title type="sort">
- Trying to create sort titles for the DescMeta could present problems for the mapping style sheet (actually the program that figures out what article to strip from a title). Greg asked Erin if it would be better to edit his MARC-to-TEI Header program so <title type="sort"> lines were generated from the MARC title non-filing indicators. The mapping/style sheet program would then simply map the TEI Header <title type="sort"> to DescMeta’s <title type="sort">. As part of the workflow, Janis would double check Greg’s TEI program and edit the <title type="sort"> if needed.
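The approach described above can be sketched in Python. This is only an illustration, not Greg's MARC-to-TEI Header program: the function names are hypothetical, and it assumes the title text and the non-filing indicator (the second indicator of MARC field 245, a digit from 0 to 9) have already been read from the record.

```python
from xml.sax.saxutils import escape


def sort_title(title: str, nonfiling: int) -> str:
    """Drop the leading non-filing characters (e.g. an article and its space)."""
    if not 0 <= nonfiling <= 9:
        raise ValueError("MARC non-filing indicator must be between 0 and 9")
    return title[nonfiling:]


def tei_sort_title(title: str, nonfiling: int) -> str:
    """Build a TEI Header <title type="sort"> line (hypothetical helper)."""
    return '<title type="sort">%s</title>' % escape(sort_title(title, nonfiling))


# "The " is four characters, so the indicator for this title would be 4.
print(tei_sort_title("The Cavalier daily.", 4))
```

Because the indicator is set by the cataloger, the program never has to guess which word is an article; a manual check of the generated line is still useful when the indicator itself is wrong.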
Place to put TGN codes
- For thesauri that have identifier codes, such as TGN, we need a place to put them. It was decided to add the element <identifier> to <covplace> and <place>. The TGN identifier codes would be mapped as:
<place>
<geogname scheme="TGN">New York City</geogname>
<geogname scheme="UVa">Big Apple</geogname>
<identifier scheme="TGN">7007567</identifier>
</place>
- The element <identifier> (with the "scheme" attribute added to it) will be added to <place> and <covplace> as "*", optional and repeatable. Later the MSG will look at creating entity files with description and registration of the short codes (TGN, etc.) used in the scheme attribute.
- Thorny will look at the implications of making "scheme" a global attribute versus giving each element the sub-element <authority>.
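As a rough illustration of the <place> structure agreed above, the following Python sketch builds the element with the standard library. The function name and the input shape (lists of scheme/value pairs) are assumptions made for the example and are not part of any UVa tool.

```python
import xml.etree.ElementTree as ET


def build_place(names, identifiers):
    """Build a <place> element from (scheme, value) pairs (hypothetical helper)."""
    place = ET.Element("place")
    for scheme, name in names:
        ET.SubElement(place, "geogname", scheme=scheme).text = name
    # <identifier> is optional and repeatable, so zero or more pairs are accepted.
    for scheme, code in identifiers:
        ET.SubElement(place, "identifier", scheme=scheme).text = code
    return place


place = build_place(
    [("TGN", "New York City"), ("UVa", "Big Apple")],
    [("TGN", "7007567")],
)
print(ET.tostring(place, encoding="unicode"))
```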
October 14, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler, Beth Picknally Camden
Multi-volume sets, Newspapers
- We discussed how to display discovery results for multi-volume sets and newspapers. Would it be useful to display a series name or the individual titles (volumes) of a series? For a newspaper, having one link that says "Cavalier Daily" isn’t as helpful as a listing of each volume and issue. Thorny will assemble a group to look at the content model for Newspaper objects in Fedora. This group should also consider the similarities and differences between serials and series.
Series Titles
- The MSG agreed to have the TEI DTD modified to allow <biblScope> within <seriesStmt>, so that a volume or version number within a series can be captured. For consistency, the decision from the 9/23 meeting to allow <biblScope> within <fileDesc> and <biblFull> was changed: <biblScope> will instead be allowed within <titleStmt>. It would look something like this:
<fileDesc>
<titleStmt>
<title n="245|a" type="main">The Cavalier Daily</title>
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="1">Number 1</num></biblScope>
<biblScope type="date"><date value="1968-09-11">Wednesday, September 11,
1968</date></biblScope>
...
</titleStmt>
<extent>123 kilobytes</extent>
...
<sourceDesc>
<biblFull>
<titleStmt>
<title n="245|a" type="main">The Cavalier daily.</title>
<title n="247" type="former">College topics, (1890-May 1, 1948.)</title>
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="1">Number 1</num></biblScope>
<biblScope type="date"><date value="1968-09-11">Wednesday, September 11,
1968</date></biblScope>
</titleStmt>
<extent n="300"> v. :; ill. ;; 58 cm.</extent>
...
</biblFull>
</sourceDesc>
</fileDesc>
AND for Series
<sourceDesc>
<biblFull>
...
<seriesStmt n="440">
<title n="440|a">Methods in enzymology</title>
<idno n="440|x" type="ISSN">0076-6879</idno>
<biblScope n="440|v" type="volume">
<num value="385-386">v.385-386</num>
</biblScope>
</seriesStmt>
...
</biblFull>
</sourceDesc>
- Once it was decided where to put the series information, the MSG discussed which MARC series fields should be captured: 440, or 490 and 830. Erin briefed the MSG members on the differences between these 3 MARC series fields. For TEI, tags 440 and 830 are captured within a <seriesStmt n="tag"> element. If there is a 490 field with a first indicator of "0", it will be mapped to the <notesStmt><note> element.
- Last week’s MSG meeting (10/7) decided to generate <title type="sort"> lines based on MARC title field indicators (the <title type="sort"> would then map directly to DescMeta <title type="sort">). The MSG forgot to also include generating TEI series <title type="sort"> lines based on the indicators for fields 440 or 830.
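The series mapping rules above can be summarized in a short Python sketch. It is an illustration only: the function name is hypothetical, the n="tag|a" attribute on <title> follows the 440 sample above and is assumed to apply to 830 as well, and the handling of a 490 field whose first indicator is not "0" (returning nothing) is an assumption, since the minutes do not state it.

```python
from typing import Optional
from xml.sax.saxutils import escape


def map_series_field(tag: str, ind1: str, title: str) -> Optional[str]:
    """Return the TEI fragment for one MARC series field, or None if unmapped."""
    text = escape(title)
    if tag in ("440", "830"):
        # Series fields 440 and 830 go into <seriesStmt n="tag">.
        return '<seriesStmt n="%s"><title n="%s|a">%s</title></seriesStmt>' % (
            tag, tag, text)
    if tag == "490" and ind1 == "0":
        # A 490 with first indicator "0" becomes a note.
        return "<notesStmt><note>%s</note></notesStmt>" % text
    # Assumption: anything else is left unmapped in this sketch.
    return None


print(map_series_field("440", " ", "Methods in enzymology"))
print(map_series_field("490", "0", "Methods in enzymology"))
```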
For Future Thought
- There has been some discussion as to what is going into the Discovery Index, what SIRSI’s role is in the Digital Library, and whether there will be 1 federated search index for both (DL Repository and SIRSI). The MSG will reflect on and discuss these points at a later time.
November 4, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler, Beth Picknally Camden, Ann Whiteside, Judy Thomas
Image Cataloging
- Martha's Managers have asked the MSG to look at all types of image collections at UVa in order to define requirements for image cataloging tools.
- Ann and Judy are co-chairs of the Image Group and briefed the rest of the MSG on their group's discussions. The Image Group looked at the tools that all Library units were using for digital images, audio, and video. IRIS development has been slow, and UVa may need to assess new image cataloging software. Ann took to the IRIS consortium a wish list of features that would make IRIS more appealing and usable for other Library units.
- As a first task, the MSG will identify and review all current UVa image collections (Art and Architecture, Tibet, African, RMDS - Holsinger, Jackson Davis, UVa Prints) and potential collections (science specimens, scanned maps, satellite images, astronomy, religion). The MSG will invite experts on specific image collections to future meetings. After the individual collection meetings conclude, the MSG will invite all image collection experts for a larger discussion.
- The MSG will be looking for short-term (3-year) solutions that will not need much development. We will be looking at metadata needs that can be met in the short term and that will carry over to the long term.
- At the next MSG meeting, Sherry and Thorny will take the MSG through the Holsinger collection online and Judy will review the Tibetan collection.
December 2, 2004
Present: Sherry Lake, Erin Stalberg, Thorny Staples, Janis Kessler, Beth
Picknally Camden, Ann Whiteside, Judy Thomas, Bob Thomas
Reviewing the Holsinger Image Collection
- Thorny and Sherry demonstrated the Holsinger Image Collection. The descriptive information for each image is kept in a Filemaker database. Each image is an individual work. Some descriptive information is put into EAD. EAD is being used because of the images' archival nature. The MSG discussed different approaches, such as describing this one database with one large collection-level finding aid (1 complex EAD file) or with many smaller EAD files, possibly based on one of the subjects (called genre in the database). The description data could be enriched to represent relationships between single images.
- One concern with this database is the lack of quality control. The collection could also benefit from data enrichment (global subjects, etc.). It was agreed that the contents of the database could be imported to IRIS, but not without cleanup and normalization.
- We then discussed how archival images could be added to the Art & Architecture Database. Would it be best to have multiple instances of IRIS, one for each image collection, or one big IRIS database with different indexes? Then we discussed how federated searching would work with one big database versus multiple databases. If there were to be one large database, a new "Image Series" field would be needed to distinguish the different collections.
December 9, 2004
Present: Sherry Lake, Erin Stalberg, Beth Picknally Camden, Ann Whiteside
Set Codes for Lewis and Clark
- The Lewis and Clark TEI records already have one series defined: UVA-LIB-ModEngl which maps to DescMeta’s <set code>. New series "Lewis and Clark" and "The Westward Exploration" need to be added, but in TEI the <seriesStmt> is not repeatable. The MSG agreed to have Greg modify UVa’s version of TEI to allow <seriesStmt> to be repeatable.
getChildDescMeta Display
- The labels for the "getChildDescMeta" display had not been reviewed by the MSG. The information for this disseminator is to come from GDMS metadata (even though its name contains "DescMeta"). The MSG had already identified the elements and their screen display for the disseminator getDescription (which gets its info from DescMeta). This was not being used. So the MSG decided to review the labels for getDescription and add any other elements that would be necessary for getChildDescMeta.
- We then discussed the relationship between getChildDescMeta and the collector tool. If the user "collects" images, they should get more information than what shows up about the child. After the meeting Erin clarified things with Ross and Leslie. The collector tool feeds the website creation. Any top <divdesc> information that needs to be exported to the websites needs to be in the getChildDescMeta dissemination. People can use the collector tool without using the DL, so once users create a collection they'll need all the metadata, not just the information about the child. Since it does seem that the collector tool and getChildDescMeta need full metadata, we adapted the getFull disseminator labels with the following changes: deleted <language> and <identifier>, added <res><agent>, <res><subject>, <res><time>, <res><rights>, <res><adminrights><policy>.
December 17, 2004
Present: Sherry Lake, Erin Stalberg, Beth Picknally Camden, Ann Whiteside, Thorny Staples, Bob Thomas, Janis Kessler
Conceptual Relationships
- The MSG discussed how to record "conceptual" relationships for the Tibetan images, such as Ethnicity, etc. A religious sect may have a relationship with a site, building, etc., but it's not really a culture. Thorny has proposed to David Germano to use <corpgroup> (in GDMS) until the MSG comes up with something better. The MSG consensus was that a term was needed for these relationships, but that <corpgroup> was a little too narrow, since there are other kinds of conceptual relationships, affiliations, or associated relationships. For relationships we discussed the "sect" that persons in a photo belong to, the "institution" that owns or occupies a building (if the building is known by that name), and the "affiliation" of persons with a particular organization. Whatever term is used, it will require <time> inside it, for instances when something or someone is associated with the work for only a certain period of time.
- Bob suggested that Barbara Tillett, from the Library of Congress, has come up with some categories of relationships. Bob Thomas will do research and follow up.
- In addition to <time> being needed with this "new" term, it was discussed whether <subject> also needed it.
EAF Metadata
- Erin announced that for the EAF TEI texts, a new subject scheme, "uva-etc", was being used. This new scheme will be used for the genre and gender keyword terms used in EAF files.
|