University of Virginia
University of Virginia Library
Digital Initiatives: Metadata

Minutes of the Metadata Steering Group (MSG)


 

2003
  September 25
  October 2 | October 9 (Open Meeting) | October 16 | October 21 | October 30
  November 3 | November 13 | November 20
  December 4 | December 11 | December 18
 
2004
  January 6 | January 15 | January 22 | January 29
  February 5  |  February 12  |  February 19  |  February 26
  March 4  |  March 18  |  March 25
  April 1  |  April 6  |  April 13  |  April 29
  May 13  |  May 20  |  May 27
  June 3  |  June 17  |  June 23
  July 8  |  July 22  |  July 29
  August 5  |  August 12  |  August 18
  September 2  |  September 23
  October 7  |  October 14
  November 4
  December 2  |  December 9  |  December 17
 

 

September 25, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Ann Whiteside

This was the first meeting of the newly established MSG!

Review of Charge:

  • The charge was clarified: where the text of the charge reads "... will be responsible for keeping up with changes and adjusting mappings and the DescMeta schema to stay aligned with external standards ..." the words "DescMeta schema" will be replaced by "metadata schema for the UVa Digital Library" so as to include administrative/technical metadata as well.
  • The MSG defined its role as being the group that will officially weigh in on all standards and mappings for all items being ingested into the Central Repository; will address migration issues as standards evolve; will be the provisional authority that will amend and enforce all DescMeta/AdminMeta standards and best practices for ingestion into the DL; will develop recommendations for where that authority will ultimately lie in the production environment; will balance the tensions between the deeper needs of individual communities and the merging together of these communities for the DL; and will be responsible for Library-wide metadata education encouraging those communities to think broadly and flexibly about metadata issues.

Role of the "experts"

  • The charge identifies experts in the various international standards (TEI, MARC, VRA, etc.) and various content domains (science, music, etc.). The group will bring these people in as needed for consultation and/or work on best practices, particular mappings, or migration issues. The experts include:
    • Beth Blanton-Kent (Science)
    • Bradley Daigle (EAD)
    • Edward Gaynor (EAD)
    • Matt Gibson (TEI)
    • Greg Murray (TEI/DLPS)
    • Mary Prendergast (Music)
    • Andrew Rouner (TEI)
    • Christine Ruotolo (TEI)
    • Judith Thomas (VRA/GDMS/Audio/Video)
    • Jama Coartney (DLPS)
    • Leslie Johnston (CenRepo)
    • Ross Wayland (CenRepo)

Please email Erin with any appropriate names missing from this list. The experts will all be invited to a meeting of the MSG in the next few weeks to put all their issues/concerns on the table and help to define priorities for the MSG. The people on this list are also encouraged to bring their metadata issues to the MSG for discussion/consultation at any time as well as to subscribe to lib-metadata.

Meeting times

  • Meeting times were arranged.

Setting priorities

  • There is an immediate need for both practical and philosophical work for the MSG. The group will officially evaluate and approve the TEI mapping proposal (available at http://www.lib.virginia.edu/digital/reports/teimap.html). Edward, Bradley, and Erin have begun discussions on the EAD mapping; work is beginning now and the proposal will be taken to the MSG for evaluation/approval ASAP as well. On the more philosophical front, there is an immediate need to make decisions regarding content and carrier. Are we describing the intellectual content of the work in our descriptive metadata or are we describing the electronic version that we "hold" in hand?

Next steps

  • Thorny will present to the group a picture of how the metadata fits in with the CenRepo architecture so that the MSG is all operating with a common base understanding.
  • The MSG will invite all of the experts to a meeting in the next few weeks to put all their issues/concerns on the table, to help lay out and frame the questions, and help to define priorities.

 

October 2, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Ann Whiteside

Picture of how the metadata fits in with the CenRepo architecture

  • Thorny distributed a diagram representing the current vision of how the metadata will fit into the CenRepo/Fedora architecture and discussion ensued. The extraction of DescMeta from the native metadata schema (TEI, EAD, GDMS, etc.) will be entirely automated. Nobody will touch the actual DescMeta records.
  • Much of the discussion, therefore, surrounded roles and responsibilities and how we would resolve issues and problems in a distributed environment.
  • Also, when a user searches the Digital Discovery Index, s/he is searching that metadata only. How much metadata can we afford?

Agenda planning/discussion/goals for the Expert Meeting

  • The MSG invited stakeholders in metadata decision-making for the Digital Library to an open meeting on October 9th. We will encourage participants to put their issues/concerns on the table, to help lay out and frame the questions, and to help the MSG define its priorities.
    • Invitees include those people identified previously as Library standards/content experts; subscribers of lib-metadata; Martha.
    • Conversation will likely be all over the place, but we will try to sort thoughts into four main categories: 1) Infrastructure and Tools; 2) Standards for data tagging; 3) Standards for data content; 4) Workflow issues

TEI mapping

  • The MSG began considering the TEI mapping done previously by Dan McShane and currently posted on the Digital Initiatives website. Questions included:
    • Are the TEI elements that are part of the series statement all mapped to <relation>?
    • Should the notes statement be mapped to <description>?
    • Text Classification says it maps to Subject. The second sentence reads "Terms denoting specific literary forms (prose, poetry, etc.) should be mapped to the Form element contained in MediaType." Prose, poetry, and other similar terms are forms, but in this instance aren't they subject terms?
    • Revision Description -item - maps to Agent type="contributor" form="persname". Shouldn't this map to Item?
    • Thorny suggested we flip the chart and map DescMeta-from-TEI rather than TEI-to-DescMeta.

 

October 9, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Greg Murray, Bradley Daigle, Edward Gaynor, Melinda Baumann, Leslie Johnston, Ross Wayland, Beth Blanton-Kent, Andrew Rouner, Nadine Ellero, Ronda Grizzle, Chris Ruotolo, Matt Gibson, Jama Coartney

Welcome & Introduction

  • Erin welcomed the group and thanked them for coming. She distributed the MSG charge and encouraged them to join lib-metadata. The MSG invited them to this meeting because they are all stakeholders in metadata decision-making for the Digital Library. The MSG encouraged the participants to put their issues/concerns on the table, to help lay out and frame the questions, and to help the MSG define its priorities.

Sketch of Search Services for the Digital Library

  • Because Thorny was out sick, Erin distributed his diagram, representing the current vision of how the metadata will fit into the CenRepo/Fedora architecture. Highlights:
    • The extraction of DescMeta from the native metadata schemas (TEI, EAD, GDMS, etc.) will be entirely automated. Nobody will touch the actual DescMeta records. Creation and maintenance of metadata records will occur entirely in the native schema. Catalogers/Metadata creators will need expertise in each of those schemas and will need authority to update metadata in its home location. With new materials ingested into CenRepo, the DescMeta record will be created as part of the ingest and distributed to the Digital Discovery Index. Metadata will also simultaneously populate indexes particular to the type of resource (e.g. Modern English, Art & Architecture, Finding Aids, etc.). When metadata records need updating, the work will be done in the native schema and re-disseminated to the Digital Discovery Index and the native index.
    • The DescMeta record will represent selective metadata mapped from the native schema. When a user searches the Digital Discovery Index, s/he is searching that metadata only. When the user searches an index for the native schema (i.e. Modern English Index/TEI), s/he will be searching the full metadata and/or the full text of the resource.

Roundtable discussion of metadata issues/concerns/priorities

  • Beth introduced broad categories of issues to help focus the discussion. Discussion jumped all over the place, and Janis organized the points into their appropriate categories. In these minutes, the discussion is summarized first and then the points are categorized. Names are only mentioned when relevant to a particular context or workflow.
  • SUMMARY
    • Melinda, Greg, & Jama: what about plates/pictures/figures in books? & Beth Blanton-Kent: diagrams/graphs/tables? DLPS is not currently taking advantage of the “FigDesc” tags. There are workflow issues: How to do the tagging? How to describe the items (form: photographs, drawings, etc.; subjects: dogs, fishhooks, etc.)? Who has that content expertise? Who will do the description/the tagging? The group believes that these resources should be discoverable in the Digital Discovery Index alongside regular images.
    • Bradley needs a policy decision for ingesting Special Collections orphans. They have little metadata from the TIFF header and probably an associated MARC record for the parent.
    • Worthiness of items – are certain items more “worthy” of having their plates/pictures/figures/diagrams/graphs/tables described? Who decides? Does this need to be a policy decision or do we do this by hand? In the traditional print world, Catalogers look at each book and pull out relevant subject terms/name headings, etc., to be indexed for the user. Is this do-able/scalable for the DL?
    • Who is going to be doing the metadata? – student labor? training?
    • Staffing -- humans will have to touch data at some point in the process.
    • If Catalogers/Metadata creators are creating and updating records in their native schema, they need:
      • tools (i.e. Rob's GDMS tool) in each of the native areas to work with.
      • the authority to create/update/edit in each native schema.
    • Who is going to make the tools to do this? Who has authority to edit somebody else's TEI (etc., etc.) metadata? Who has the subject expertise?
    • Volume/scalability problem – what do we have enough staff to do?
    • First time creation v. long-term maintenance? Enrichment?
    • Versioning – Fedora will keep track of all versions of the metadata. We will be able to clean up all the versions, but who has the authority to do this? Only the repository manager? Who is the repository manager? Can we do clean up or make decisions about when to clean up various versions item-by-item? Or collection-by-collection?
    • Enforcement of rules – we can enforce tagging rules programmatically, but how do we enforce content rules? Bradley: do we need a Metadata Honor Code? Enforcing content for a meaningful discovery index will require a human element!
    • Bottlenecks -- human intervention will necessarily cause bottlenecks at some points in the process. What is acceptable?
    • How much metadata can we afford?
    • Metadata assessment should be part of the selection process. If the collection we want to purchase does not contain adequate metadata, local enhancement needs to be considered as part of the purchase cost.
    • Collections/aggregations of objects – how do we represent this in the metadata? There is a need to establish inter-relationships between resources in the metadata (i.e. by LC call #, Subject, name authority, etc.)
    • Broad vs. specific subjects -- the needs of the humanities are obviously different from the needs of the scientific communities. If one community creates subject headings, another must have the authority to update the metadata by adding subject terms to suit its needs. Where does Health Sciences fit in here (MeSH v. LCSH)? Will Health Sciences have authority to add subject terms for DL records (in the native TEI, GDMS, EAD, etc.)?
    • What is the minimal set of elements required for meaningful discovery?
    • Images are different than text in that the user is not generally looking for a known item.
    • Science is different in that the searching is often more granular.
    • We should compare the art & architecture slides with the scientific slides. How is the metadata different?
    • Admin & technical metadata need to be fleshed out. Technical metadata can be acquired programmatically, but where is it maintained?
    • Rights management must be a priority. (Right now, Fedora can do IP restriction; more is coming Jan. 2005.) We need both to restrict materials UVa has purchased and to have a place to note usage restrictions on UVa Special Collections materials. Fedora rights management will be “rules based”; we should follow that same path now.
  • BY CATEGORY
    • Infrastructure and Tools
      • If Catalogers/Metadata creators are going to create and update records in their native schema, they need:
        • tools (i.e. Rob's GDMS tool) in each of the native areas to work with.
        • the authority to create/update/edit in each native schema
      • Staffing -- humans will have to touch data at some point in the process.
      • Bottlenecks -- human intervention will necessarily cause bottlenecks at some points in the process. What is acceptable?
      • What about plates/pictures/figures/diagrams/graphs/tables in books? If we make them discoverable like other images in the Digital Discovery Index, who will do the identification/metadata creation?
      • How much metadata can we afford?
      • Metadata assessment should be part of the selection process. If the collection we want to purchase does not contain adequate metadata, local enhancement needs to be considered as part of the purchase cost.
    • Standards for data tagging
      • What about plates/pictures/figures/diagrams/graphs/tables in books? How do we make them discoverable like other images in the Digital Discovery Index?
      • What is the minimal set of elements required for meaningful discovery?
      • Admin & technical metadata need to be fleshed out. Technical metadata can be acquired programmatically, but where is it maintained?
      • Rights management must be a priority. (Right now, Fedora can do IP restriction; more is coming Jan. 2005.) There are needs both to restrict materials UVa has purchased and to have a place to note use restrictions on UVa Special Collections materials. Fedora rights management will be “rules based”; we should follow that same path now.
    • Standards for data content
      • Collections -- there is a need to establish inter-relationships between resources in the metadata (i.e. by LC call #, Subject, name authority, etc.)
      • Broad vs. specific subjects -- the needs of the humanities are obviously different from the needs of the scientific communities. If one community creates subject headings, another must have the authority to update the metadata by adding subject terms to suit its needs. Where does Health Sciences fit in here (MeSH v. LCSH)? Will Health Sciences have authority to add subject terms for DL records (in the native TEI, GDMS, EAD, etc.)?
      • Who has the subject expertise?
      • Authority control
      • Enforcement of rules – we can enforce tagging rules programmatically, but how do we enforce content rules? Bradley: do we need a Metadata Honor Code? Enforcing content for a meaningful discovery index will require a human element!
    • Workflow
      • Policy decision on ingesting Special Collections orphans & how much metadata
      • Worthiness of items – are certain items more “worthy” of having their plates/pictures/figures/diagrams/graphs/tables described? Who decides? Does this need to be a policy decision or can we do this by hand? In the traditional print world, Catalogers look at each book and pull out relevant subject terms/name headings, etc., to be indexed for the user. Is this do-able/scalable for the DL?
      • Who will be doing the metadata?
      • Creation v. long term maintenance?
      • Value added/enhancement?
      • What about Fedora versioning? – how easy will it be in Fedora to change/update metadata? Batch tools are also not currently available in Fedora.
      • How does Health Sciences fit in?

 

October 16, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Ann Whiteside

TEI mapping.

  • The MSG began considering the TEI mapping. Erin and Thorny began the morning by "flipping" the TEI mapping that was done a while back and is currently posted on the Digital Initiatives website. Instead of:

    identifying TEI elements and where they map TO in DescMeta

we would:

    identify DescMeta elements and where they map FROM in TEI.

With the map re-focused, we believe we'll be better able to articulate a vision for what we want in DescMeta for the purpose of a meaningful discovery index. Flipping the chart would also force better articulation of the minimal requirements outlined last spring. What is minimal for meaningful discovery? With the chart flipped, the MSG started reviewing DescMeta.
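As a rough illustration of what "flipping" the chart means, the sketch below inverts a TEI-to-DescMeta lookup into a DescMeta-from-TEI lookup. The TEI paths and target elements are illustrative assumptions only, not the mapping the MSG reviewed, and `flip_mapping` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative TEI-to-DescMeta pairs. These paths and targets are assumptions
# made for the example, not the approved mapping.
TEI_TO_DESCMETA = {
    "titleStmt/title": "title",
    "titleStmt/author": "agent",
    "seriesStmt/title": "relation",
    "notesStmt/note": "description",
    "editionStmt/edition": "description",
}


def flip_mapping(tei_to_descmeta):
    """Group TEI source paths under the DescMeta element they feed."""
    flipped = defaultdict(list)
    for tei_path, descmeta_element in tei_to_descmeta.items():
        flipped[descmeta_element].append(tei_path)
    # Sort the sources so the flipped chart reads the same on every run.
    return {element: sorted(paths) for element, paths in flipped.items()}


DESCMETA_FROM_TEI = flip_mapping(TEI_TO_DESCMETA)
for element, sources in DESCMETA_FROM_TEI.items():
    print(element, "<-", ", ".join(sources))
```

Read this way, each DescMeta element shows at a glance how many TEI sources feed it, which makes it easier to ask whether the element is really needed for discovery.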

Highlights:

  • The list of DescMeta elements on the Digital Initiatives website is NOT correct (<covspace> should be <covplace>; <date> should be <time>, <place> element should be added); the DTD however and accompanying documentation are correct. Erin will talk to Leslie about permissions for updating the website and will get this corrected.
  • Discussion of the model of fileDesc and sourceDesc. In TEI, the idea is that information about the original source should live in sourceDesc and information about the electronic version (electronic publisher, etc.) should live in fileDesc. Thorny proposed, and the MSG agreed, that DescMeta specify elements as belonging to the original item or belonging to the surrogate. Original publisher/surrogate publisher; original extent/surrogate extent, etc.; how to represent this will be further discussed (i.e. type="surrogate"; type="original"; class="surrogate"; class="original"?)
  • <agent> -- the minimal requirements outlined last spring by the Digital Library Metadata Review and Planning Group required agent. Agent is used all over the place (type="creator"; type="contributor"; type="compiler"). With TEI objects, what we really want to require is at least one <agent type="creator">.
  • <covplace> and <covtime> -- discussion about the feasibility of mapping LCSH geographic fields to <covplace> and LCSH chronology fields to <covtime>. Also, clarification: <covplace> is place of coverage; <place> is place of publication.
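One way the LCSH geographic and chronological data could be separated, assuming the headings are available with their MARC subfield coding: in MARC 21 subject fields, subfield $z carries a geographic subdivision, $y a chronological subdivision, and field 651 $a a geographic name. The sketch below is a feasibility illustration only; the in-memory representation (a list of code/value pairs) and the function name are assumptions, and no MARC library is used.

```python
def coverage_from_lcsh(tag, subfields):
    """Split one MARC subject field into candidate <covplace>/<covtime> values.

    `tag` is the MARC field tag as a string (e.g. "650").
    `subfields` is a list of (code, value) pairs, e.g. [("a", "Religion")].
    """
    places, times = [], []
    for code, value in subfields:
        value = value.strip()
        # $z is a geographic subdivision; 651 $a is a geographic name.
        if code == "z" or (tag == "651" and code == "a"):
            places.append(value)
        # $y is a chronological subdivision.
        elif code == "y":
            times.append(value)
    return {"covplace": places, "covtime": times}


heading = [("a", "Slavery"), ("z", "Virginia"), ("y", "19th century")]
print(coverage_from_lcsh("650", heading))
```

Topical terms ($a in a 650 field) are deliberately ignored here; they would continue to map to <subject>.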

 

October 21, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Ann Whiteside

Special Collections

  • Erin, Beth, and Janis reported on a meeting with Edward and Melinda regarding a Special Collections issue. Special Collections is asking DLPS to digitize "random" documents out of a collection which currently contains only collection level description in VIRGO. DLPS/Cataloging workflow, up until this point, has had DLPS extracting VIRGO records for their base TEI header which Janis then enhances. Only collection level records exist in VIRGO for the SC materials, however, and SC is selecting particular items for digitization. There is no item level description in either VIRGO or in the EAD finding aid. The MSG spent quite a bit of time on this discussion and resolved that more input from Melinda and Edward will be needed. We do feel, however, the need to frame this as a cost-benefit analysis. Metadata should be created where it is best suited for the native resource -- in this case, the EAD finding aid. Data can be extracted and manipulated with a fair amount of ease, but will require a human to create it in the FIRST instance. The process needs to start somewhere; we need to find the resources.

TEI mapping

  • Thorny proposed that we add a new <surrogate> element to the DescMeta DTD, which would be recursive and contain all other elements except for itself. The top level element <desc> will change to <descmeta>. All elements immediately below <descmeta> will describe the original print resource. All elements under <surrogate> will be applicable to the electronic surrogate only. Born-digital items are electronically "original" and therefore will not have surrogate elements. The MSG approved this proposal. Erin reported that Leslie is getting the DTD & related documentation moved to the new Digital Initiatives site and then Erin & Thorny will have permissions to update it. The DTD will then be adjusted to reflect this change. Unless explicitly designated as falling under <surrogate>, elements should be mapped to the highest level (<descmeta>).
  • <agent>--in TEI, sometimes names are structured (last names & first names in separate fields), sometimes unstructured (last name, first name, date). If the data is unstructured, we will grab the entire contents and tolerate the presence of date data.
  • <authority>--refers to where they got the content of the data. We decided to ignore this for the TEI, as it is understood that the data comes from the native schema.
  • <covplace> and <covtime> -- We decided to test out the feasibility of mapping LCSH geographic fields to <covplace> and LCSH chronology fields to <covtime> by parsing the MARC coding. We'll have to consider the results after programming.
  • <culture>--not relevant for TEI
  • <description>--we'll need content rules for notes. What qualifies as an important note and what just causes clutter in a search? For mapping, we'll grab what notes are available in the TEI, but we need to revisit this as part of a future best-practices discussion.
  • <form>--we decided to remove <form> as a top level element, because it is also available as an element of <mediatype>. It was unclear why this had been assigned as a top level element and the MSG agreed to discard it. <mediatype><form>prose</form></mediatype> v. <form>prose</form> does not affect searchability. In a broader discussion of the purpose of <form>, it was acknowledged that this could be a highly useful element, but we're not sure how it has been applied. How do you decide between assigning a work prose and assigning it non-fiction? The values would need to be drawn from an authority list and have very clear definitions. We will leave this out of the current mapping but need to revisit.
  • <identifier>--the top level identifier will be the Fedora PID. The <surrogate> identifiers will include type=ISBN (although we don't think the current texts have ISBNs) and type=UVa Title Control Number, which we believe to be more stable than UVa Virgo ID (the item barcode). UVa Title Controls may still disappear: if all items on a particular record are withdrawn; if a bibliographic record is replaced; or if the bibliographic record was created only for the purpose of DLPS extraction (where we don't own the book being digitized); but we still believe them to be more stable than barcodes. If we want to use this number down the road to link back to SIRSI, we would just need to make sure we provide users a good, clear error message if the Title Control number no longer exists.
  • <mediatype> will be populated with type=text for all TEI objects.
  • <mimetype> will be populated with "text/xml" for all TEI objects.
  • The MSG believes (hopes!) we need only one more meeting to finish the TEI mapping and then it will be given to Perry to begin programming. After we get a first-pass at programming, the MSG will invite the TEI experts back to review the results.
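To make the original/surrogate split concrete, here is a minimal sketch of the record shape described above: elements directly under <descmeta> describe the original, and a <surrogate> block appears only when there is an electronic surrogate. The <publisher> element name, the sample PID, and the `build_descmeta` helper are assumptions for illustration; the DTD itself is not reproduced in these minutes.

```python
import xml.etree.ElementTree as ET


def build_descmeta(pid, title, publisher, surrogate_publisher=None):
    """Build a skeletal DescMeta record.

    <publisher> is an assumed element name; the other values follow the
    defaults decided for TEI objects.
    """
    root = ET.Element("descmeta")
    ET.SubElement(root, "identifier").text = pid        # Fedora PID at the top level
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "publisher").text = publisher   # top-level elements describe the original
    ET.SubElement(root, "mediatype", {"type": "text"})  # type=text for all TEI objects
    ET.SubElement(root, "mimetype").text = "text/xml"   # text/xml for all TEI objects
    if surrogate_publisher is not None:
        # Only digitized items get a <surrogate>; born-digital items do not.
        surrogate = ET.SubElement(root, "surrogate")
        ET.SubElement(surrogate, "publisher").text = surrogate_publisher
    return root


record = build_descmeta(
    "example:0000",  # hypothetical PID
    "Example Title",
    "Example Press",
    surrogate_publisher="University of Virginia Library",
)
print(ET.tostring(record, encoding="unicode"))
```

Calling `build_descmeta` without `surrogate_publisher` yields a record with no <surrogate> element, which is how a born-digital item would look under this proposal.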


October 30, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples

TEI mapping:

  • The MSG continued reviewing the TEI mapping. Highlights:
    • <covtime>. The DTD currently requires a type= for <date>. The MSG voted that this is unnecessary & will change the DTD.
    • <mediatype>. The DTD currently allows <mediatype> to be repeatable. The MSG voted that this element should not be repeatable & will change the DTD.
    • <place><geogname>. Places have different content rules according to various standards, which will affect meaningful searchability. We will consider adding an attribute to the DTD to allow for TGN codes to normalize <place> data.
    • <relation>. Leave off for now. Thorny presented the ideas he and Ross are discussing for kinship metadata. Kinship metadata will handle primary parent/child relationships. More complex relationships will need to be hashed out later.
    • <rights>. TEI <availability>copyright 2000 ...</availability> will map to <rights type="copyright">. The MSG clarified that the access info. will be coded in administrative metadata and will drive the DescMeta <rights type="access">. The DescMeta text will then render to the user. Use info., however, is not coded in TEI separately from <availability>, which is problematic because it cannot, therefore, be mapped to AdminMeta. Sherry & Erin will work on a proposal for where to encode this in the TEI. The MSG will then present this (through Beth & PT Services Council) to the TEI Experts.
    • <subject>. Change: <subject><authority scheme ="LCSH"> to <subject scheme="LCSH">. The MSG voted that a URL link to the source of the scheme is unnecessary, voted to delete <authority>, & will change the DTD. The MSG also decided to scheme everything; if nonexistent in TEI, the default would be: <subject scheme="unknown">. We will need a practice rule for naming schemes. There was discussion about providing very high level subject access following a local UVa scheme. Thorny said he had tried to no avail in previous projects to find an established list of very high level subject categorization. Beth suggested the list that Cataloging uses to categorize ejournals. These were based on the academic departments and were developed working with subject selectors to tie these to LC classification numbers. If the TEI contains LC class numbers, this could be mapped. The list is available at: http://www.lib.virginia.edu/cataloging/policies/drafts/ejcodes.htm
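The "scheme everything" decision could be applied along the following lines. This is a sketch under assumptions: it takes a TEI <keywords> element whose scheme attribute names the vocabulary and whose <term> children hold the headings, and it writes the flattened <subject scheme="..."> form, falling back to scheme="unknown". The function name and sample input are hypothetical.

```python
import xml.etree.ElementTree as ET


def map_subjects(tei_keywords_xml):
    """Turn one TEI <keywords> block into a list of DescMeta <subject> elements."""
    keywords = ET.fromstring(tei_keywords_xml)
    # Default to "unknown" when the TEI does not name a scheme.
    scheme = keywords.get("scheme") or "unknown"
    subjects = []
    for term in keywords.iter("term"):
        subject = ET.Element("subject", {"scheme": scheme})
        subject.text = (term.text or "").strip()
        subjects.append(subject)
    return subjects


sample = '<keywords scheme="LCSH"><term>Religion</term><term>United States</term></keywords>'
for subject in map_subjects(sample):
    print(ET.tostring(subject, encoding="unicode"))
```

A practice rule for naming schemes would still be needed so that, for example, "LCSH" and "lcsh" do not end up as two different scheme values.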

DTD

  • Thorny will give the MSG a DTD-reading tutorial at a follow-up meeting.


November 3, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Ann Whiteside

TEI mapping

  • The MSG continued reviewing the TEI mapping. Highlights:
    • Sherry asked about TEI v. TEI Lite. Thorny explained that DLPS is using TEI Lite because it allows us tighter control than TEI. Beth offered to buy a print copy of the DTD if that would be helpful to us. [A clarification from Greg, at DLPS, after the minutes were distributed on lib-metadata, on the TEI Lite issue: "DLPS does NOT use TEI Lite (which is a very loose DTD -- not more tightly controlled than TEI -- simply omits the more obscure tags). Instead, DLPS uses a customization of TEI, specifically designed to provide tighter control over TEI documents. Perhaps the confusion arises from the fact that our customization has sometimes been called TEI Tight."]
    • <subject>. Erin asked to review <subject>. Erin & Beth went to a meeting on Friday where MODS was discussed. The DescMeta coding is: <subject scheme="LCSH" type="topic">Religion</subject> <subject scheme="LCSH" type="geographic">United States</subject>. The MODS coding is: <subject><topic>Religion</topic><geographic>United States</geographic></subject>. Erin asked for a clarification on the difference, and Thorny explained that in the MODS example, "Religion" and "United States" are related to each other by both being present inside <subject>. In the DescMeta example, they are two different subject headings. Traditional MARC cataloging has drawn semantic distinctions between those different types of coding, and Erin, Janis, and Ann felt that it is important for us to consider that (pre-coordinated v. post-coordinated headings). A general discovery keyword search will retrieve both records, but once you have found a particular record and are looking for other like records, users need the advanced complexity. If you are specifically looking for "Religion IN THE United States", the current DescMeta coding makes it difficult. The MSG voted to keep the structure as is for the current mapping, but to consider a more complex structure down the road (DescMeta 2.0!). Thorny also mentioned that MODS had been considered in early DescMeta discussions but was rejected because it is very textual in nature and wouldn't manage the image collections well.
    • TEI <edition> will map to DescMeta <description type="edition">
    • <time><date>. Valid types include type="creation" ; type="publication" ; type="revision" The MSG believes (for the moment) that revision can encompass textual reprints, etc., as well as architectural additions.
    • <surrogate><time><date>. The date the resource was put on the first server. The date of ingest into Fedora will be covered by the object's audit trail.
    • <title>. type="primary" -- the MSG voted to always explicitly tag type="primary". Also available will be type="series" ; type="alternate" ; type="parallel" ; type="sort" -- Initial articles will cause trouble for browse lists. The mapping script should take type="primary", strip out articles (from an attached stopword list), and put the result into type="sort". We'll do this for the legacy data which, for the current target collection, is all English and should be ok to just run through a stopword list. Human judgement is really needed here however (i.e. Los Angeles, El Cid, Le Corbusier, etc.), so the MSG will recommend a sort title proposal for future TEI markup.
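The sort-title step for the legacy English data might look like the sketch below. The stopword list and the `sort_title` name are assumptions; the actual attached stopword list is not given in these minutes.

```python
# Assumed English initial articles; the real stopword list is not specified.
STOPWORDS = ("a", "an", "the")


def sort_title(primary_title, stopwords=STOPWORDS):
    """Drop one leading article so the title files under its first significant word."""
    words = primary_title.split()
    # Leave one-word titles alone so a title is never reduced to nothing.
    if len(words) > 1 and words[0].lower() in stopwords:
        words = words[1:]
    # Joining on single spaces also normalizes runs of whitespace.
    return " ".join(words)


for title in ("The Scarlet Letter", "A Tale of Two Cities", "Los Angeles"):
    print(title, "->", sort_title(title))
```

With an English-only list, "Los Angeles" passes through untouched. Adding Spanish or French articles to the list would wrongly strip "Los", "El", or "Le" from names such as Los Angeles, El Cid, and Le Corbusier, which is the reason human judgement is wanted for future markup.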

We will review the minimal requirements by email.

 

November 13, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Ann Whiteside

1) TEI mapping

  • The group FINISHED reviewing the TEI mapping! We made a few small last changes & clarifications regarding notes and examples on the spreadsheet. The only significant change was a vote to add <name> as an element under <agent> to make the string cleaner. Thorny will update the DTD; Erin will update the file and send it to Perry to begin programming. Adam Soroka, from Robertson Media, will actually be working for Perry on this project. Erin will update the metadata website with the new mapping and send out an announcement to lib-metadata. When the programming is complete, we will invite the TEI folks back for a review of the results.

2) Minimal requirements

  • The group reviewed & clarified the list of minimal requirements proposed by the spring Digital Library Metadata Review and Planning Group and approved by (then) IT Council.
  • Highlights & Changes
    • ID. The Fedora PID -- <surrogate><identifier>
    • Title. <title type="primary"> -- one and only one (type="primary" not repeatable)
    • Agent. <agent type="creator"> -- at least one (for the EAD, it will likely be one <agent type="repository"> -- we will require one <agent> but the type can vary based on the collection). Populate with "Unknown" if anonymous or otherwise unavailable.
    • Mediatype. <mediatype> -- one and only one (not repeatable)
    • Rights. <rights type="access"> -- the MSG will put together a proposal for the TEI experts to code in access rights more explicitly (in <availability>), but we know that all the texts going into phase 1 & 2 are publicly accessible. Right now, therefore, we will default to <rights type="access">Unrestricted</rights>.
    • Desc. The spring group intended this to be Description, but the MSG does not believe <description> to be required for minimum. We will drop this from the list.
    • Date & Creation date -- These were 2 separate items on the spring list. The Fedora ingest date will be automatic. The MSG believes we should require <surrogate><time><date type="creation"> as the date the resource was put on the first server. This info. will be useful for evaluating our digital collections over time. After a lengthy discussion of which "original" date to require, we chose to require at least one <time><date> of any date type (which could be creation, publication, revision, whatever). For a collection to be minimally acceptable for ingest, some original date info. must be encoded. In extreme situations, folks can appeal to the MSG for populating the field with "Unknown"; the MSG will decide if this is acceptable on a case-by-case basis.
    • Erin will write up a revised document that Beth will take back to PT Services Council for approval.
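As a rough illustration of the list above, a quality-assurance check for these minimal requirements might look like the sketch below. The record layout (every element sitting directly under <descmeta>), the sample values, and the names RULES and check_minimal are all assumptions; the minutes name the elements but not their exact nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element nesting and values are assumed for illustration.
SAMPLE = """<descmeta>
  <title type="primary">Catalogue of the University of Virginia, 1831/1832</title>
  <agent type="creator">Unknown</agent>
  <mediatype>text</mediatype>
  <rights type="access">Unrestricted</rights>
  <time><date type="publication">1832</date></time>
  <surrogate>
    <identifier>example:1</identifier>
    <time><date type="creation">2003-11-13</date></time>
  </surrogate>
</descmeta>"""

# (ElementTree path, minimum count, maximum count or None for unbounded)
RULES = [
    ("surrogate/identifier", 1, None),
    ("title[@type='primary']", 1, 1),
    ("agent", 1, None),
    ("mediatype", 1, 1),
    ("rights[@type='access']", 1, None),
    ("time/date", 1, None),
    ("surrogate/time/date[@type='creation']", 1, None),
]


def check_minimal(descmeta_xml):
    """Return one message per minimal requirement that the record fails."""
    root = ET.fromstring(descmeta_xml)
    problems = []
    for path, low, high in RULES:
        count = len(root.findall(path))
        if count < low or (high is not None and count > high):
            problems.append(f"{path}: found {count}")
    return problems


print(check_minimal(SAMPLE))
```

A record that passes returns an empty list; anything else could be routed back to the collection owner or, for dates, to the MSG for a case-by-case decision.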


3) DTD Tutorial

  • Thorny provided the group a tutorial on reading a DTD. The group voted to add <name> under agent (see above under TEI mapping).


4) Planning/Prioritization

  • The MSG will look next at the administrative elements and then move on to EAD.

 

November 20, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Ann Whiteside

1) Discussion of Perry's comments/questions

  • The TEI mapping went to Perry on Tuesday. Adam Soroka will be working with Perry and doing the actual programming. Perry had emailed a number of questions that needed follow-up from the MSG:

    a) Question: Shouldn't mimetype be recorded at the <surrogate> level, not at the <descmeta> level? It seems to me that mimetype only applies to the electronic surrogate. Action: MSG agreed that for the TEI mapping, mimetype applies to <surrogate>

    b) Question: Similarly, doesn't the rights info contained in fileDesc/publicationStmt/availability (p.12) and the blanket access statement (p.13) belong to the surrogate? These don't apply to the original which has its own set of copyright and access data. Action: MSG agreed.

    c) Question: The DTD now requires mediatype at both the descmeta and surrogate levels. Wouldn't a surrogate normally inherit its mediatype from the higher level? If so, then mediatype should be optional at the surrogate level, but remain required at the descmeta level. Action: the MSG indeed wants the mediatype to be required at both levels. There are instances in which the mediatypes may actually be different (the original is a building, the surrogate is an image of the building, etc.) It is not always inherited. In the current TEI instances, they will be the same, however, and Adam should populate "text" at both the descmeta and the surrogate levels.

    d) Question: Element and attribute pairs, i.e., <identifier type="UVaPID">, <title type="primary">, <rights type="access">, <time type="creation">, can't be required in a DTD. Either the DTD must be changed, that is, requiring elements such as <uvapid>, <primarytitle>, etc., or a secondary level of checking, i.e., a "quality assurance" step, must be implemented. I'd recommend the first option. The advantage of a more constraining DTD is that it catches errors earlier in the metadata creation. We could discard the current, general DTD, make the new DTD a customization of the old one, or make the new DTD mappable to the old one. Action: the MSG decided not to change the DTD because DescMeta will always be extracted from the native schema. No one will be entering DescMeta records by hand and therefore the suggested enforcement really needs to happen in the native schema. This will need to be taken up separately. [Later clarification from Thorny: "The real point here is that we do not have a complete list of values for those attributes and we won't for some time, if ever. I was saying today in the meeting that one approach would be to develop a specific DTD for each of the crosswalks that could have specific lists of values, i.e. one for TEI, one for EAD, etc. Then I said that, because all of this metadata extraction will be done programmatically, the extraction programs are a point of control on the attribute values anyway."  Note: the MSG will invite Perry to the next meeting for further discussion of this point and finalization of the DTD]

    e) Question: Is the term "thing" in the list of values allowed in the type attribute for mediatype intended to be a catch-all for anything that doesn't fit the other categories or is it intended to mean physical object? If the former, I'd suggest changing the term to "other". If the latter, then I'd suggest "physicalobject". "Thing" is too ambiguous -- all the other values are things too. Action: "Thing" refers to a physical object; it was the best term Dan McShane could come up with at the time for such objects. "Entity" couldn't be used because it is a reserved word. Physicalobject is now available in Dublin Core and the MSG will adopt this term.

    f) Question: With regard to creator, I think "anonymous" and "unknown" are different things. To me "anonymous" means that there is an author, we just don't know his name, while "unknown" is a substitute for "no value was supplied here". They represent different levels of uncertainty. Action: MSG agreed. If, at the point of creating/editing the TEI header, someone wants to code something as <agent type="creator">Anonymous</agent>, they should do so. But, if we are automatically populating a field because the element is minimally required and the TEI doesn't have an <agent type="creator"> value, then it should be populated with "Unknown" because no value was supplied there.

    Erin will update the mapping files. Erin also met with Perry today to learn to use the program for generating the documentation from the DTD. Erin will update those files as well.
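The "Anonymous" versus "Unknown" decision in (f) could be sketched as below. This is an illustration only: the function name is hypothetical, and it assumes <agent> sits directly under <descmeta>. An encoder-supplied value is left alone; "Unknown" is added only when no creator was supplied at all.

```python
import xml.etree.ElementTree as ET


def ensure_creator(descmeta):
    """Add <agent type="creator">Unknown</agent> only if no creator exists."""
    if descmeta.find("agent[@type='creator']") is None:
        agent = ET.SubElement(descmeta, "agent", {"type": "creator"})
        agent.text = "Unknown"
    return descmeta


# An encoder deliberately recorded "Anonymous": the value is kept as-is.
coded = ET.fromstring('<descmeta><agent type="creator">Anonymous</agent></descmeta>')
# Nothing was supplied: the minimally required element is filled with "Unknown".
empty = ET.fromstring("<descmeta/>")

print(ensure_creator(coded).findtext("agent"))
print(ensure_creator(empty).findtext("agent"))
```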

2) Administrative metadata.

  • Highlights:
    • Digiprov. Will be taken care of by the Fedora audit trail
    • Rights. Access is who can use the resource, Use is what you can do with it. Use and access notes are grouped under <policy> so that they can be applied in pairs, i.e. Access available to UVa users for certain uses and access available to non-UVa users for other uses. For now, the MSG recommends four different access terms: unres (unrestricted, i.e. publicly available); uva (UVa only); viva (VIVA only); res (restricted, i.e. to only authorized library staff). Display rendering to the public will come from the <rights> element in DescMeta. AdminMeta will enforce policies, except that currently the only restriction we can put on Fedora is to put the whole CenRepo into an IP box and restrict the entire database to UVa users. The MSG has on its agenda a plan to work up a proposal to the TEI folks to code access and use. For the phase 2 TEI collection, adminMeta can be populated with unres for all materials. When we get to GDMS, we will need to deal with other variations. The MSG agreed that DescMeta for restricted materials will populate the discovery index. Users will be able to find out materials exist, even if they cannot access them. For use restrictions, we need to think more about use classes. We'll need credit lines, especially for GDMS. Lending policies could be another type of use class. We can consider this further for next meeting, but for the current implementation, <access> and <use type="credit"> may be enough.
    • Technical. We talked a bit about which of the already defined elements for texts were really necessary. Is word_processor important for adminMeta? Do we care what software was used to ocr the file or do we just care that it was ocr'd? For the initial implementation, we will focus on encoding and markup, which the MSG agreed were most important. We need information on character encoding (Unicode, etc.), markup (XML, etc.), and DTD (TEI, etc.). Thorny will talk with Perry and bring the MSG back a proposal for encoding. The DLPS TEI's <encodingDesc> references various entities; Erin will follow up with Greg to find out exactly what they reference. These values would need to be considered part of the content: when the mapping is programmed, the program will need to go out and find those referenced values.

Next steps

  • More on administrative metadata.

 

December 4, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Perry Roland, Thorny Staples, Ann Whiteside

1) Discussion with Perry and finalization of the DescMeta DTD

  • Perry joined the MSG to talk through issues and finalize the DTD.  Highlights:
    • Clarification of mediatype at both the descmeta and surrogate levels. The MSG had said we wanted mediatype to be required at both levels.  Perry asked about inheritance, unless otherwise specified can we assume that mediatype at the <surrogate> level is inherited from mediatype at <descmeta>?  Do we explicitly state it at both levels when it is the same?  MSG agreed that <surrogate> can be said to inherit mediatype from <descmeta> unless the value is specifically different.  Mediatype will therefore not be encoded explicitly at both levels.  It is required at <descmeta> but optional at <surrogate>
    • We can't enforce type attributes in a DTD.  There is no way to enforce <title type="primary"> or <identifier type="UVaPID"> without explicitly creating elements for them.  We, therefore, went through the list of minimally required elements and revised it yet again, adding new elements for some required concepts:
      • <surrogate><identifier type="UVaPID"> will become new element <pid>.  <identifier> will apply to coded identifiers other than the UVaPID.  <pid> will be available at both <descmeta> and <surrogate> but the DTD can't enforce requirement at <descmeta> only (for born digital materials -- but since we have no born digital materials at the moment, we will live with this now).
      • <title type="primary"> will no longer be typed and will become solely element <title>.  All other titles will be element <alttitle> and will be typed accordingly.
      • <agent> ok as is: At least one agent of any type; May be populated with "Unknown" if unavailable
      • <mediatype> (see above)
      • <rights type="access"> will become new element <accessrights>.  All other rights will be element <rights> and will be typed accordingly.
      • <surrogate><time><date type="creation"> will become <surrogate><creationdate>.  If unknown, populate with date of ingest.
      • <time><date> will change as a minimal requirement simply to <time> for now.  Is it better to require a specific date/date range or would <time><timeinterval> be ok to fulfill minimal requirement?  This needs further discussion, right now we will require only <time>
    • Perry will update the DTD
    • Erin will update the TEI mapping table and the website.
    • Thorny discussed other initiatives, besides DTD, which would allow us to enforce, not only coding, but practice.  Schema introduce a secondary layer of parsing which is objectionable to some.  The Fedora Project is using Schematron.  This needs further consideration.
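The mediatype inheritance rule agreed above (required at <descmeta>, optional at <surrogate>) might be resolved by a consuming program along these lines. This is a sketch under assumptions: the function name is hypothetical, the value is assumed to be element text, and the building/image values are illustrative.

```python
import xml.etree.ElementTree as ET


def surrogate_mediatype(descmeta, surrogate):
    """Return the surrogate's own mediatype, else the one inherited from descmeta."""
    own = surrogate.findtext("mediatype")
    if own is not None and own.strip():
        return own.strip()
    inherited = descmeta.findtext("mediatype")
    if inherited is None or not inherited.strip():
        raise ValueError("<mediatype> is required at <descmeta>")
    return inherited.strip()


# TEI case: nothing stated on the surrogate, so "text" is inherited.
tei = ET.fromstring(
    "<descmeta><mediatype>text</mediatype><surrogate/></descmeta>"
)
# Differing case: the original is a physical object, the surrogate an image.
building = ET.fromstring(
    "<descmeta><mediatype>physicalobject</mediatype>"
    "<surrogate><mediatype>image</mediatype></surrogate></descmeta>"
)

print(surrogate_mediatype(tei, tei.find("surrogate")))
print(surrogate_mediatype(building, building.find("surrogate")))
```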

2) MSG/Metadata website

  • Erin will add email addresses for MSG members and will follow up with Leslie regarding questions on the digital initiatives template design.

3) MSG ownership

  • A few questions came up from DLPS this week regarding mappings to GDMS from other standards.  The MSG will take ownership of GDMS and will respond to those needs as they come up.  Prioritization of MSG work is going to become an increasingly difficult issue.

 

December 11, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside

1) Yet again, DescMeta-TEI follow-up

  • Sherry asked for clarification of the decision to have <surrogate> inherit <mediatype> from <descmeta>, rather than have it be explicit at both levels.  The MSG explained Perry's rationale of the week before and stated that it won't affect discoverability.  It is required at <descmeta> but optional at <surrogate>.  When Erin was writing last week's minutes, she found a conflict. We had said that we would require <mediatype> at <descmeta> but not at <surrogate> and then we went on to say that the DTD can't enforce requirement of <pid> only at the descmeta level. Are these two situations not analogous?  We still need some clarification of what we are able to require in the DTD, as opposed to best practices.  Perry and Adam are currently working on the mapping and, at the moment, we can enforce best practices, so we are tabling this issue right now.  Needs to be revisited.

2) TEI headers and MARC serial records

  • Janis brought an example of an AACR2/MARC serial that DLPS is digitizing and needs individual TEI headers.  The Virgo title is "A catalogue of the officers and matriculates of the University of Virginia" but the publication has changed titles many times.  The Virgo record has a slew of alternate titles for the variations.  There are 80-some print volumes, one per academic year.  In MARC, there is one bibliographic record and 80-some holdings records, designated by date.  i.e.

    A catalogue of the officers and matriculates of the University of Virginia
    SPEC-COLL--  
      Location:  SC-ARCHV -- LD5667 COPY 1  
      Library has:  1829/1830,  
      Library has:  1831/1832 (1880 reprint)  
      Library has:  1832/1833-1839/1840  
      Library has:  1840/1841 (1880 reprint)  
      Library has:  1841/1842-1860/1861  
      Library has:  1865/1866-1906/1907

    Each of those years represents a separate holdings record attached to that one bib. record.

DLPS has digitized 5 "random" volumes.  Each is a separate file, will be a separate Fedora Object, and needs a separate TEI header.  We can create alternate titles in the TEI to ease searchability, but we don't want to populate the discovery index with each volume's metadata because Public Services will object.  They often ask Cataloging to combine separate monograph records into one serial record:

They won't want this for a hit list:

  1. A catalogue of the officers and matriculates of the University of Virginia, 1829
  2. Catalogue of the officers and students of the University of Virginia, 1840/1841
  3. Catalogue of the University of Virginia, 1831/1832
  4. University of Virginia 1906/1907 session. Annual announcements, with a catalogue of the officers and students.
  5. University of Virginia catalogue of session 1865/1866 announcements for session

when they are all members of the same serial publication.  Janis brought this to the MSG to get an opinion from Thorny about the system architecture before proceeding.  In Thorny's absence, she will hold off on creating TEI files until we know how they will render to the user.  The seven volumes of Lewis & Clark is another example of this problem.  We need to have sufficient metadata to bring together all volumes of a set, but the rendering of a hit list, with each volume having its own metadata, is likely to be problematic.  Also, how do we indicate to the user that, although only 5 volumes have been digitized, there are another 75 sitting on the shelf in Special Collections?

3) Continued discussion on Administrative metadata & Admin-TEI mapping

  • Highlights:
    • <digiprov>.  We had said we could ignore this.  It will be taken care of by the Fedora audit trail.  The DTD currently requires <digiprov>, does this mean something needs to be POPULATED FROM the Fedora audit trail or can we change the DTD to not require the element?  Question tabled for Thorny.
    • <adminrights><policy><access>.  For now, populate with "unres" for all TEI objects.  All phase 2 texts are publicly accessible.  Can we enforce values in the DTD? (res, unres, viva, uva)?  Or is that best practice only?  Question tabled for Thorny.
    • <adminrights><policy><use>.  Ignore for the current mapping, because there are no use restrictions on the phase 2 texts.  The DTD currently requires <use>; is this necessary?  Action:  change the DTD.
    • <technical><text><encoding> and <technical><text><markup>.  This is the current formulation in the DTD.  We need to get to character encoding (unicode, ASCII), mimetype (xml), and markup schema (TEI, EAD).  How to best code this?

      <technical>     
              <text>        
                      <encoding>             
                                  <character>             
                                  <mimetype>             
                                  <markup>

      or

      <technical>     
              <encoding>        
                      <text>             
                                  <character>             
                                  <markup>

    • The DLPS context also points to the actual DTD:  <!DOCTYPE TEI.2 SYSTEM "http://text.lib.virginia.edu/bin/dtd/tei/uvalib_kb/tei2.dtd" [  Do we point to the DTD or just name it?  How do we encode this?  Questions tabled for Thorny.
    • We'll need a complete element set and a minimal list for AdminMeta as well.

4)  Website

  • The website had said "... there are two distinct categories of information ... for descriptive and administrative (including technical purposes ... these locally-defined element sets are collectively referred to as UVa DescMeta ..."  This is confusing -- we were assuming the "Desc" in "DescMeta" was meant to imply descriptive only.  The MSG voted to change the text and all associated references to "collectively referred to as UVa Metadata."
    • The UVa DescMeta Descriptive Elements become UVa Metadata Descriptive Elements (UVa DescMeta)
    • The UVa DescMeta Administrative Elements become UVa Metadata Administrative Elements (UVa AdminMeta) etc., etc.
  • Per previous conversation, Erin had added a note on each of the element sets that said:  "UVa DescMeta is currently under development by the Metadata Steering Group at the University of Virginia Library. The element set, the minimal requirements, and the DTDs are still considered in-progress. The DTD will be released as UVa DescMeta version 1.0 when development and prototyping are complete."  Beth pointed out that current DTD calls itself version 1.01 (09/17/2001) and it was decided we could release the revised DTD as version 2.0.  Also, authorship on the current DTDs goes to Dan McShane, Perry, and Thorny and the MSG should now be reflected there as well.  How much history is it necessary to keep?
  • Erin will update the website
  • Thorny will update the DTDs.

 

December 18, 2003

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake

1) AdminMeta-TEI follow-up.

  • Erin and Thorny met earlier in the week to go through the MSG questions from the previous meeting and Erin reported back to the group (Thorny was not able to attend this meeting). Highlights:
    • <digiprov>. We had said we could ignore this. It will be taken care of by the Fedora audit trail. The DTD currently requires <digiprov>, does this mean something needs to be POPULATED FROM the Fedora audit trail or can we change the DTD to not require the element? Thorny: the Fedora audit trail should account for all info. assigned to <digiprov>. We should change the AdminMeta DTD to not require <digiprov>
    • <adminrights><policy><access>. For now, populate with "unres" for all TEI objects. All phase 2 texts are publicly accessible. Can we enforce values in the DTD? (res, unres, viva, uva)? Or is that best practice only? Thorny: we cannot enforce values in the DTD. Best practice only.
    • <technical><text><encoding> and <technical><text><markup>. This is the current formulation in the DTD. We need to get to character encoding (unicode, ASCII), mimetype (xml), and markup schema (TEI, EAD). How to best code this? Erin and Thorny had discussed the pros and cons of various formulations and Thorny will go back to Perry to formulate a proposal for the MSG. The DLPS context also points to the actual DTD:

      <!DOCTYPE TEI.2 SYSTEM "http://text.lib.virginia.edu/bin/dtd/tei/uvalib_kb/tei2.dtd" [

      Do we point to the DTD or just name it? Thorny: yes, we should point to the DTD. Thorny and Perry will talk.
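Since the decision is to point to the DTD, the mapping program would need to read both the DTD name and its location out of each TEI file. A minimal sketch, assuming a SYSTEM-only declaration like the DLPS one quoted above; the regular expression and the function name are illustrative, and how the values are finally coded under <technical> is still to be proposed by Thorny and Perry.

```python
import re

# Matches declarations of the form <!DOCTYPE root SYSTEM "uri" ...
# PUBLIC identifiers are not handled in this sketch.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[^\s>]+)\s+SYSTEM\s+"(?P<system>[^"]*)"'
)


def doctype_info(xml_text):
    """Return the document type name and DTD location, or None if absent."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    # Stray whitespace inside the quotes is trimmed.
    return {"root": match.group("root"), "system": match.group("system").strip()}


header = (
    '<!DOCTYPE TEI.2 SYSTEM '
    '"http://text.lib.virginia.edu/bin/dtd/tei/uvalib_kb/tei2.dtd" ['
)
print(doctype_info(header))
```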

2) TEI headers/MARC Serial records follow-up.

  • Erin & Thorny also talked earlier in the week about this and Erin reported back to the MSG. Thorny suggests that we create GDMS records to link together the multiple TEIs. The top divDesc of the GDMS would map to DescMeta and populate the discovery index rather than the individual TEI headers. The discussion of which level of the GDMS populates the discovery index is a discussion that needs to be had more seriously and with the image experts. Another option is to try and link the files together within the TEI headers themselves. Thorny suggested that Erin contact Chris Ruotolo about this. Email to Chris is outstanding.

3) Best practices for Collection naming/Identification

  • How do we identify something as being part of a particular collection? We need an answer ASAP for Jack and the IRIS-GDMS mapping. MSG proposal to Ann:
    • Change GDMS element to <alttitle type="collection"> as in

      <series><title>UVA-ARCH</title></series>

      to

      <alttitle type="collection">UVA-ARCH</alttitle>

    • Alttitle is not currently an element in GDMS, but this would bring it back in line with DescMeta. We propose this change to the GDMS. We thought about recommending a <collection> element, but then we would need <collection> in DescMeta ... using <alttitle> seemed better. Question for Thorny: Do we need to reference the pid of the collection object?
  • There are 2 naming convention issues:
    • The proper name of the collection to be displayed on all public pages associated with the collection (i.e. the Collection Object)
    • The abbreviated name for that collection to be used in the various headers (GDMS, TEI, EAD, etc.) and pointing to the Collection Object and mapped to DescMeta.
  • Principles:
    • The full "proper" name of the collection must be unique. The abbreviation used must also be unique and we need a formula for guaranteeing unique abbreviations (Erin will look into standards for abbreviating titles, like the ISSN Center uses for abbreviating journal titles -- something short enough to not be just wasting character space and long enough to be humanly intelligible). [Update: ISSN center: http://www.issn.org:8080/English/pub/products/lstwa/ based on ISO 4 standard: http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=3569]
    • There should be a central authority (perhaps Cataloging) for assigning/approving collection names in order to guarantee uniqueness. The central authority would be responsible for pre-searching the database for uniqueness, adding qualifiers if necessary ... (i.e. Barcelona Collection (1994) and Barcelona Collection (2004)), collection title changes, creating the abbreviations, etc.
    • No semantic information about the nature of the collection needs to be inherent in the abbreviation. The fact that it is a UVa created collection v. a purchased collection would live in the collection object. Relationships between collections would be represented in the collection objects. I.e. the GDMS header does not need to represent that Barcelona is a subset of Art and Architecture.
    • Supercollections (Art & Architecture, Modern English, Finding Aids) are those that will have their own search index. Supercollection proper names should be prefaced by University of Virginia Library (i.e. University of Virginia Library Art and Architecture Collection)
    • There must be a way to limit a search within any particular collection.
  • There was much discussion about what makes a collection; different kinds of collections (those inherent to the resources themselves (i.e. parts of a series), UVa created collections (i.e. Barcelona), UVa compiled collections (anything having to do with the Rotunda), purchased collections, etc.); how many Supercollections we will ultimately have; etc. None of which was at all resolved.
  • More discussion on this the first meeting after the holidays.
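The element change proposed to Ann earlier in this item, from <series><title> to <alttitle type="collection">, could be applied to existing GDMS files with something like the sketch below. It is illustrative only: the function name is hypothetical, and the <pubstmt> wrapper used in the example is an assumption about where <series> sits.

```python
import xml.etree.ElementTree as ET


def series_to_alttitle(parent):
    """Replace each <series><title>X</title></series> child of parent
    with <alttitle type="collection">X</alttitle>, keeping its position."""
    for index, child in enumerate(list(parent)):
        if child.tag == "series" and child.find("title") is not None:
            alt = ET.Element("alttitle", {"type": "collection"})
            alt.text = child.findtext("title")
            alt.tail = child.tail
            parent.remove(child)
            parent.insert(index, alt)
    return parent


# Hypothetical wrapper element around the example from the proposal.
before = ET.fromstring("<pubstmt><series><title>UVA-ARCH</title></series></pubstmt>")
print(ET.tostring(series_to_alttitle(before), encoding="unicode"))
```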

 

January 6, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Thorny Staples

1) GDMS

  • Thorny gave the group an introduction to GDMS. He went through the structure, the fields, and the MSG viewed various examples out there on the web. Highlights:
    • The top Div describes the "thing" as a whole that is being represented by the GDMS file. It can contain other divs recursively and the structure describes the relationships between the various objects (i.e. The Rotunda: the exterior view: the interior view: a particular interior room: a particular painting in that particular interior room).
    • Divs contain type and label attributes. The label is a shorthand name for the described resource and the type represents what type of div the resource is describing (an architectural site, a structure, a space, a feature, an object). Different div types can use different ontologies for description.
    • Divs contain divDesc's and res'. The divDesc provides the descriptive metadata about the item being represented in the Div. The res points to the image file, although it can contain its own descriptive data as well. The res will include the PID of the image object and can also contain a rescon to allow for accompanying narrative content and html (i.e. a critical essay on the resource at hand).
    • Divs also can contain divincs and resincs to allow you to reuse images and their accompanying descriptive metadata in other parts of the tree. For example, if Jefferson had moved a chair in Monticello from the dining room to the front room, you can show the image in both places as you trace through history. The divinc contains the referenced divDesc's identifier and the resinc contains the referenced image's PID.
    • The GDMS header contains information about the GDMS file itself (not about the resource described): who created the file? when?, etc.
    • GDMS was developed to be a widely used standard to describe materials that have complex hierarchies and beg for a relational descriptive structure. Because it was intended to be used more broadly than UVa, we have a responsibility to keep it semantically neutral.
  • After learning about GDMS more in-depth, the MSG decided we still needed to look closely at the GDMS header in order to answer Jack's IRIS-GDMS mapping questions. Thorny stated that the header elements had never really been evaluated and needed some work by the group. Next meeting.
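To make the recursive div structure described above concrete, here is a small sketch that walks nested divs and prints the chain of labels leading to each one. The sample file is invented for illustration: only the <div> element and its type and label attributes come from Thorny's description, and the type values shown are guesses at how the listed div types might be coded.

```python
import xml.etree.ElementTree as ET

# Invented GDMS-like fragment; divDesc and res children are omitted.
SAMPLE = """<div type="structure" label="The Rotunda">
  <div type="space" label="Exterior view"/>
  <div type="space" label="Interior view">
    <div type="space" label="Interior room">
      <div type="object" label="Painting"/>
    </div>
  </div>
</div>"""


def label_paths(div, trail=()):
    """Yield the label path of this div, then of every nested div, depth first."""
    trail = trail + (div.get("label", ""),)
    yield " : ".join(trail)
    for child in div.findall("div"):
        yield from label_paths(child, trail)


for path in label_paths(ET.fromstring(SAMPLE)):
    print(path)
```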

 

January 15, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Thorny Staples

1) GDMS Header

  • The MSG resolved to avoid semantically loaded terms, and agreed to use "set" rather than "collection" to describe the various types of aggregated materials.
    • Sets can consist of the following:
      • Materials purchased from a vendor as concrete units
      • Materials created by the Library (or faculty projects) as concrete units
      • Materials brought together by the Library as being usefully related but not inherently related to each other
    • Materials inherently related to each other by their bibliographic nature are considered series (i.e. electronic texts bearing a series statement)
    • <setstmt> and <set> elements will be added to the GDMS. Thorny will update the DTD to accommodate the following formulation:

      <setstmt>
      <set code="UVA-LIB-ArtArchit"/>
      </setstmt>

    • <setstmt> is optional: <setstmt>?
    • If there is a <setstmt> there can be one or more <set> elements: <set>+
    • The set code only serves to link the GDMS image object back to the GDMS collection object. See more below on set codes.
    • <pubstmt> -- the elements in <pubstmt> should be pulled out. Right now <title>, <agent>, and <series> are nested within <pubstmt>. Thorny will update the DTD. The GDMS Head only really needs <agent type="creator" form="corpname"> University of Virginia Library </agent> (as the creator of the GDMS file) and <time><date> (for the date the GDMS file was created). The GDMS file doesn't need a title of its own.

2) Set code conventions

  • Set codes will begin with UVA-LIB for any sets created or collected by the Library. This includes faculty projects that have been selected and collected by the Library. UVA-LIB will then also be used in the XML namespace, which nicely represents a hierarchy for the university namespace.
  • For vendor collections, set codes will begin with standard abbreviations in all caps (i.e. SI-SAAM for the Smithsonian Institution, Smithsonian American Art Museum)
  • The MSG considered the ISO standard "Rules for the abbreviation of title words and titles of publications" for the remainder of the set code. The MSG agreed to pull abbreviations from the standardized list and follow their abbreviation conventions without following their punctuation or capitalization rules. As per previous discussion, there should be a central authority for determining the "official" name of a set (regardless of which of the set categories it falls into). Once the official name has been determined, the code should be prefixed as above, followed by a hyphen, followed by the standardized abbreviation with all words strung together as a compound word. The first letter of each abbreviated word should be capitalized. All set codes must be unique. Given the above formulation:
    • The Art and Architecture collection is: UVA-LIB-ArtArchit;
    • The Barcelona collection is UVA-LIB-Barcelona;
    • The Architecture of Jefferson Country is: UVA-LIB-ArchitJeffCtry;
    • The Catlin collection is SI-SAAM-CatlinIndianPaint (from The Smithsonian American Art Museum Catlin Indian Paintings Collection).
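The set code formula above can be sketched as follows. The function names and the in-memory registry are assumptions made for illustration; the abbreviated words are passed in as given, since choosing them from the ISO list remains a job for the central authority.

```python
def build_set_code(prefix, abbreviated_words):
    """Join prefix, a hyphen, and the abbreviated words as one compound word,
    capitalizing the first letter of each word."""
    compound = "".join(word[:1].upper() + word[1:] for word in abbreviated_words)
    return f"{prefix}-{compound}"


def register_set_code(code, registry):
    """Record a new set code, refusing duplicates so codes stay unique."""
    if code in registry:
        raise ValueError(f"set code already assigned: {code}")
    registry.add(code)
    return code


assigned = set()
print(register_set_code(build_set_code("UVA-LIB", ["art", "archit"]), assigned))
print(register_set_code(build_set_code("UVA-LIB", ["archit", "jeff", "ctry"]), assigned))
print(register_set_code(build_set_code("SI-SAAM", ["catlin", "indian", "paint"]), assigned))
```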

3) Dates

  • Jack and Ann are having issues with era conventions for the IRIS-GDMS map. GDMS and DescMeta DTD's currently use ad,bc,cc,cd which is based on FGDC (http://www.fgdc.gov/metadata/csdgm/organization.html).
    • ad: A.D. Era to December 31, 9999 A.D.
    • bc: B.C. Era to 9999 B.C.
    • cc: B.C. Era before 9999 B.C.
    • cd: A.D. Era after 9999 A.D.
  • Most government information uses these conventions. The archeological world and IRIS use CE and BCE. AACR uses only ad and bc. It seems that either is a perfectly good standard. Which do we use? Do we need to account for the extremes of the date ranges? Do we want to be politically correct? Jack needs an answer for his IRIS-GDMS work. Sherry will look more into FGDC's use of this and Ann and Erin will continue research.
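Whichever convention is chosen, the two are mechanically convertible. The sketch below maps a year with a CE/BCE (or AD/BC) label onto the four era codes currently in the DTDs, using the ranges listed above. The function name is an assumption, and the sketch takes no position on which convention the MSG should adopt.

```python
def fgdc_era_code(year, era):
    """Map a positive year plus an era label to one of ad, bc, cc, cd."""
    label = era.strip().upper().replace(".", "")
    if year < 1:
        raise ValueError("year must be a positive number within its era")
    if label in ("AD", "CE"):
        # ad runs to December 31, 9999 A.D.; cd covers anything later.
        return "ad" if year <= 9999 else "cd"
    if label in ("BC", "BCE"):
        # bc runs back to 9999 B.C.; cc covers anything earlier.
        return "bc" if year <= 9999 else "cc"
    raise ValueError(f"unrecognized era label: {era}")


print(fgdc_era_code(2004, "CE"))
print(fgdc_era_code(500, "BCE"))
print(fgdc_era_code(12000, "B.C."))
```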

 

January 22, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Thorny Staples, Chris Ruotolo

1) TEI and Seriality (with Chris Ruotolo)

  • The MSG invited Chris to the meeting to discuss how to handle serials in TEI (and then by extension, DescMeta).  This has come up now for "A Catalogue of the officers and matriculates of the University of Virginia" -- 83 volumes in Special Collections.  DLPS has digitized 4.  Chris and Erin had met earlier to hash out some ideas. Out of that meeting:
    • There needs to be an element/attribute in the TEI saying this is a set or serial, etc.  Etext had used <title level=" "> but the only valid options for the level attribute are: analytic title, monographic title, journal title, series title, title of unpublished material.  This may not do it for us.
    • There needs to be a unique identifier to pull all children of the set together.  UVA Title Control Number would do the trick.
    • Chris and Erin had discussed the idea of using <sourcedesc> to describe the serial in its "purest" entirety (matching the MARC record) and <filedesc> to describe the particular volume in hand.  The stylesheet would then identify the existence of the set and collate all the children but present only one sourcedesc to the user at the time of searching.
    Thorny referred to this proposal as an implicit rule, a secret the system needs to know -- which creates a lot of system overhead.  Thorny explained that explicit rules make for better systems modelling and that it would be better if there were a way to represent the serial as its own object.  GDMS seems appealing for this, but we wouldn't be able to dump all of the records into the same index.  Erin was also concerned about staffing implications if we first must create TEI headers and then on top of that create GDMS objects.  The group talked about the option of having a separate TEI file entirely that would describe the serial, but have no other file content besides the header.  Issues there include not wanting to have to update that file each time a new volume is added and concern over having certain TEI files in the database that are unlike the "normal" TEI files. The cataloging representatives pointed out that serials are always different, always require a workflow different from the "normal" one, and always cause more overhead.  The question is where do we invest the overhead?
  • Discussion of Lewis & Clark and monographic sets.  Monographic sets and serials contain many like qualities, but are also quite different.
  • Discussion of using the series statements in sourcedesc. Action:  Janis will mock up some examples; Erin will do some more research about serials at other TEI institutions [Update:  Erin had no luck; Beth emailed Jackie Shieh at Michigan.  They have delivery solutions but not TEI solutions].  Discussion tabled.

2) TEI/AdminMeta issues

  • Discussion continued on how to code elements such as xml/DTD/TEI.2/etc. in the AdminMeta.  The MSG ran out of time and this discussion was also tabled for next week.

 

January 29, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Thorny Staples

1) LofT review

  • Beth is part of the Planning Team reviewing the 2001 LofT goals. The MSG reviewed the three metadata goals (LofT 19, 20, & 21) to document the status and make recommendations for follow-up.

2) Issues on Collection naming (for GDMS/IRIS mapping)

  • Jack had a question about our set naming decisions. Clarification was made that there is no hierarchical relationship implied in the <set> elements. All hierarchy will be assumed by the collection objects. <set code="UVA-LIB-ArtArchit"/> and <set code="UVA-LIB-Barcelona"/> can co-exist in a GDMS header and there is no need for a "plain" <set code="UVA-LIB"/>.
  • Erin noted that we had not considered sets when we reviewed DescMeta and we need a place there to note sets as well. Thorny will update the DescMeta DTD to achieve:
    <relation><set code=" "/></relation> [Update: in the end, Thorny and Erin opted for a new <relationships> tag which will contain <relation> and <set>].
  • In TEI, we will endorse <filedesc><series> to describe sets, as Etext has been doing. Erin will check with Melinda about populating the DescMeta for the current scripting with <set code="ModEngl"/> [Update: ok'd by Melinda].

3) Issues on dates (for GDMS/IRIS mapping)

  • IRIS uses ce/bce (Common Era v. Before Common Era)
    The GDMS DTD uses ad/bc/cc/cd.
    Should we change the GDMS DTD or map IRIS' ce/bce to ad/bc?
    ad/bc/cc/cd comes from FGDC (http://www.fgdc.gov/metadata/csdgm/organization.html) and is defined as:

    ad -- Era to December 31, 9999 A.D.
    bc -- Era to 9999 B.C.
    cc -- B.C. Era before 9999 B.C.
    cd -- Era after 9999 A.D.

    ce/bce are politically correct; cc and cd are important for archaeological description. Sherry emailed FGDC to see how they made their choice but had no response. Both are perfectly reasonable standards; the question is which to choose. Janis noted that because AACR2 uses ad/bc, all bibliographic data following AACR2 will have those conventions. The MSG decided that consistency between data was crucial for discovery and usability and so voted for mapping IRIS' ce/bce to ad/bc.
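
A minimal sketch of the mapping the MSG voted for, assuming era codes arrive as short strings. Only the ce-to-ad and bce-to-bc pairs come from the decision above; the function name, the case folding, and the error handling are hypothetical choices made for illustration.

```python
# Hypothetical sketch: normalize IRIS era codes to the ad/bc/cc/cd codes
# used by the GDMS DTD. Only ce -> ad and bce -> bc come from the minutes.
IRIS_TO_GDMS_ERA = {"ce": "ad", "bce": "bc"}
GDMS_ERAS = {"ad", "bc", "cc", "cd"}


def map_era(iris_era: str) -> str:
    """Return the GDMS era code for an IRIS (or already-GDMS) era code."""
    code = iris_era.strip().lower()
    # Codes that are already GDMS codes (including cc and cd) pass through.
    code = IRIS_TO_GDMS_ERA.get(code, code)
    if code not in GDMS_ERAS:
        raise ValueError(f"unrecognized era code: {iris_era!r}")
    return code


print(map_era("CE"), map_era("bce"), map_era("cc"))
```

Passing cc and cd through unchanged keeps the extreme date ranges available for archaeological description while still collapsing ce/bce into the AACR2-style codes.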

4) TEI/AdminMeta redux

  • Again, short discussion of coding markup in the AdminMeta. Do we note TEI version? Erin will check with Greg about what it means to be TEI.2 -- is "2" a version? what about "P3" v. "P4"? [Update from Greg: "The .2 is an unfortunate carry-over from the second major version of TEI, P2. In TEI P3 and P4, they did not continue this tradition of changing the top-level element name to reflect the version. We are using the current version, P4. In TEI P5, coming sometime in the future, they will finally do away with the .2 and the top-level element will become simply <TEI>" and the "P"'s stand for Public Proposal.]. Discussion tabled.

5) Where do we go next?

  • TEI & Seriality tabled for next week.
  • Then bringing GDMS descriptive elements in line with DescMeta descriptive elements. This should be short & easy!

 

February 5, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Thorny Staples

1)  Workflow

  • Erin brought a number of questions back from the Access-Content Planning Team discussions on workflow and from Ann and the Image Group on whether all images need to go into IRIS before they go into the Repo.  Can we go from vendor data to GDMS directly or do we go through IRIS?
    • IRIS and other systems provide data management functionality and enrichment functionality that the Repo will not offer.
    • What about tracking information?  How do we know what has been done, needs to be done?  What are the triggers for knowing that something has moved from one step to another?  How do we know something is ready for cataloging?  Or done cataloging and sent off to DLPS?  How do we know when something has stopped and is stuck at a particular step?
    • Erin believes that tracking is a kind of administrative metadata.  Whether or not the MSG "owns" that is another question.  Thorny argued that tracking metadata goes hand-in-hand with industrial production.
    • Beth is concerned that we have a standard and finite number of metadata tools.  Are we at a point to say that all texts are TEI and all will be extracted/enriched/enhanced using the current DLPS/Cataloging tools [VIRGO reports, notetab, proofreader?]; all images will be created/enriched in IRIS and extracted to GDMS?  What about Jack's image collection tool and the GDMS tool?
    • What does it mean to be "integrated"? -- integrated delivery v. integrated production?

2)  TEI/AdminMeta

TEI/Seriality and GDMS/DescMeta tabled for next week.

 

February 12, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Thorny Staples, Chris Ruotolo, Greg Murray, Melinda Baumann, Allison Sleeman

1) TEI and Seriality (with Chris Ruotolo, Greg Murray, Melinda Baumann, and Allison Sleeman)

  • The MSG invited Chris, Greg, Melinda, and Allison to the meeting to discuss how to handle serials in TEI (and then by extension, DescMeta). This had come up previously for Lewis and Clark and "A Catalogue of the officers and matriculates of the University of Virginia."  Now, DLPS has the Cavalier Daily. Highlights & decisions:
    • Data modelling.  Thorny explained that, from a data modelling perspective, it would be good and useful to represent a hierarchical tree of relationships between a title and the issues of that title.  Each title would be an object and each of the individual issues child objects.  If the title of a publication changes, the existing objects are still children of the old title object but now the old title can be related to a new title object.  If you discover one object, you should be led to discover the other related objects.
    • There was discussion about creating new objects to represent each title change or whether to simply change the existing object.  In traditional cataloging, this is called successive v. latest entry cataloging.  Do you have a separate record for each title when College Topics becomes the Cavalier Daily?  Key advantages of creating separate objects include:  the ability to discover *only* College Topics apart from the whole run and the ability to describe the nature of each publication as opposed to characteristics of a particular issue or the entire run.  Each issue will still have its own TEI header, but having separate objects for title changes will allow us to describe each title as a whole, such as frequency changes or editorial changes.
    • To represent the title object, Chris says there is a concept of an "Independent Header" in TEI:  http://www.tei-c.org/P4X/SH.html .  An independent header can exist as its own document independent from the TEI text.  The normal individual headers, therefore, can describe the volume in-hand and the independent header can describe the serial as a whole.  The system we build will link the two together by having the document identify itself as part of a set and by detecting the unique identifier for the parent and children resources (the UVa Title Control Number).
    • To accomplish this, we will locally extend the dtd to put a level attribute on <idno> to mirror the level attribute currently on <title>.  We don't want to use level on <title> because the values are specified: analytic title, monograph title, journal title, series title, and title of unpublished materials.  These values are not effective for us to represent the differences, for example, between a serial and a monographic set.  The MSG will develop a list of possible values.  Greg will look into creating/extracting Independent Headers.  Level="m" is to be the default, explicit in the DTD.  If the level attribute is not present, "m" is the default. Thus, all of the monographs done so far would not have to be redone to add the "level" attribute.  When dealing with serials, Cataloging will change the value of level= (based on the to-be-devised list of values).
    • When we have a process up and running, we will pull Lewis and Clark and any other sets already in the Repo and re-do them.
    • When the objects get ingested into Fedora, the parent object will know it is the parent of a set.  Workflow will need to be developed so that DescMeta records will populate the discovery index for only the parent and not each of the children.
    • The Fedora Imps group will need to account for a new content model for these types of materials.  Erin/Thorny/Melinda will take that back for discussion there.

2) GDMS

  • Thorny reported that he is reviewing the GDMS DTD and he will bring the descriptive fields into line with DescMeta.  He's spoken with the other folks using the GDMS DTD and none of our changes presents problems for them.  Thorny will update the DTD and also pull <source> out of the Admin DTD.

Next week:  Image metadata.

 

February 19, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples

1) Develop a list of values for <idno level=""> as discussed the previous week

  • Following the decisions of the previous meeting on TEI/Seriality, the MSG attempted to develop a list of values for the level= attribute. We decided it was important to distinguish between:

    Monographs
    Monographic Sets
    Serials
    Periodicals
    Newspapers

  • A brief explanation of the distinction between Periodicals and Serials:
    • All periodicals are serials but not all serials are periodicals!
    • Serials, by definition, are published pretty regularly, have discrete parts (i.e. can be shelved one next to each other), usually have numbering, and are intended to be published indefinitely.
    • Therefore, we can have print serials and serial videos and serial CD-ROMs if the same title is published every year.
    • Periodicals (aka journals) are a particular type of serial -- it is probably self-evident what we mean by these. Newspapers are a particular type of serial. But other types of serials that we might digitize include something like *The Annual Review of Earth and Planetary Sciences* or *The World almanac and book of facts* which LOOK like books (are bound like books not like periodicals), but are still serial in nature.
    • Things really get confusing when the first volume comes in looking and acting like a monograph. There is no evidence that it will ever be published again and it doesn't have a screaming-out-loud date on the cover. But, over time other volumes do appear and then retrospectively someone notices that we have 7 bibliographic records for the same title. Can we treat it as a serial?
    • The distinction between serial and periodical, from a user standpoint, is important, because generally users looking for periodicals are looking for a very specific kind of thing. They want to be able to pull out the periodicals from all our serials, just like they would for newspapers.
    • Monographic sets (i.e. encyclopedias, Lewis & Clark) display many similar qualities to serials, but they are usually not intended to be published indefinitely. And lastly, there are monographic series where often each volume has its own individual title separate from that of the series.
  • After considerable discussion on the matter, it became quite evident that by using <idno> in the way we discussed the previous week, we were misusing the tag. We are not trying to say "this particular identification number is a serial identification number" but rather that this "thing" we have in front of us is a serial. Using <idno> for this would really be telling the system to interpret this UVa Title Control Number in a way different than other UVa Title Control Numbers. We would, again, be burying secrets in the system, when explicit rules are much better for system modelling. We also realized that we need a way for the individual headers to identify themselves as "serial issues" and the independent header to identify itself as "the serial". If both headers contain <idno type="UVa Title Control Number" level="serial">, how do we prevent the independent header from forever looking to a higher level for its parent as well? The MSG realized we had a problem with last week's decisions and discussion was again tabled.

2) Descriptive Image Metadata

  • The MSG resolved that putting full "initial condition" metadata into the children image objects (representing the object as it was first ingested) was, in the long run, not meaningful. The MSG agreed that image objects should only contain their parent pointer: <idno type="parent">. This descriptive metadata for image objects will not populate the discovery index and full descriptive metadata will be inherited from the parent on demand.

    Note: this decision was amended the following week because Fedora requires a label. Rather than use a meaningless system-generated label, the following was resolved:

    • For page images (based on the existence of a pb tag), the label will be: book title, page [value of n= (page number)].
    • For figures (based on the existence of a figure tag), the label will be: book title, [figure caption]
      Some figures have extensive captions. For phase 2, we will grab the entire caption. If this turns out to be unwieldy, we'll consider limiting captions to a certain number of characters only for phase 3.

The technical image metadata needs to be worked out ASAP to move phase 2 forward. The MSG will invite Jama Coartney, Michael Tuite, and Leslie Johnston for that discussion.

 

February 26, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Thorny Staples, Greg Murray

1) Storing PIDs in the TEI

  • Assumption: Every object should have a PID (generated by FEDORA) and its metadata. As it is now, every electronic item has an <idno> tag within the Publication Statement. The <idno> element is repeatable. This is the best place to put a UVa PID number, typed in the idno element: <idno type="UVApid">. A workflow will need to be set up to generate the PIDs and include them in the object's metadata. This workflow could be added to the current workflow of Greg's generation of a Master metadata file. At the end of the current workflow, the batch-generated UVa PIDs could be automatically added to create a "final" master which would go into the repository.
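
A minimal sketch of the last step of such a workflow, assuming the header is well-formed XML that uses the TEI element name publicationStmt. The function name and the sample PID value are hypothetical; how Fedora actually hands out PIDs is outside this sketch.

```python
import xml.etree.ElementTree as ET


def add_pid(header_xml: str, pid: str) -> str:
    """Add (or update) <idno type="UVApid"> inside <publicationStmt>."""
    root = ET.fromstring(header_xml)
    pub = root.find(".//publicationStmt")
    if pub is None:
        raise ValueError("no <publicationStmt> found")
    # <idno> is repeatable, so other idno elements are left untouched.
    for idno in pub.findall("idno"):
        if idno.get("type") == "UVApid":
            idno.text = pid  # re-running the batch updates rather than duplicates
            break
    else:
        idno = ET.SubElement(pub, "idno", {"type": "UVApid"})
        idno.text = pid
    return ET.tostring(root, encoding="unicode")


# Hypothetical header fragment and PID value, for illustration only.
sample = (
    "<teiHeader><fileDesc><publicationStmt>"
    '<idno type="DLPS id">example001</idno>'
    "</publicationStmt></fileDesc></teiHeader>"
)
print(add_pid(sample, "example-pid:1"))
```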

2) TEI Seriality

  • It was decided to use TEI's Independent Header for series, serials, and multivolume sets (and not use TEI's corpus for multivolume set). That said, we discussed how to ID the "Thing." Per our discussion in our last meeting, we wanted a tag to describe the "thing" as a volume or a serial, etc., rather than telling the system how to interpret a particular type of idno. The independent header is the parent (newspaper, periodical, monographic set). The children are newspaper issue, periodical issue, monographic volume, article. We decided that we needed something in the <sourceDesc> to describe the thing and what type it is. The idno control number would be what was used to connect the children to the parent header. To do the "thing" typing, we decided to use <keywords scheme="uva-form">. We came up with the following for valid uva-forms: newspaper, newspaper issue, periodical, periodical issue, monographic set, monographic volume, monograph, and article. (It was decided after the meeting to add periodical volume, manuscript, serial and serial volume to this list.) Greg will make sure our new classification scheme gets declared. Greg will identify "serial" sets, re-run his program with the new uva-form scheme and create independent headers for them.
  • In practice a display program will gather all the objects with the same UVA control number and then display them based on the UVA-form, gathering the children and separating the parent.
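
The gathering step described above can be sketched as follows, assuming each object has already been reduced to its UVa control number and uva-form value. The record layout and function name are hypothetical, and treating "serial" as a parent form is an assumption, since the minutes only list it among the later additions.

```python
from collections import defaultdict

# Parent forms named in the minutes, plus "serial" (an assumption).
PARENT_FORMS = {"newspaper", "periodical", "monographic set", "serial"}


def group_by_control_number(records):
    """Group records by control number, separating the parent from the children.

    Each record is a dict with 'control_number', 'uva_form', and 'label' keys.
    """
    groups = defaultdict(lambda: {"parent": None, "children": []})
    for rec in records:
        group = groups[rec["control_number"]]
        if rec["uva_form"] in PARENT_FORMS:
            group["parent"] = rec
        else:
            group["children"].append(rec)
    return dict(groups)


# Hypothetical records for illustration only.
records = [
    {"control_number": "a1", "uva_form": "newspaper issue", "label": "Issue 1"},
    {"control_number": "a1", "uva_form": "newspaper", "label": "Cavalier Daily"},
    {"control_number": "a1", "uva_form": "newspaper issue", "label": "Issue 2"},
    {"control_number": "b2", "uva_form": "monograph", "label": "A single book"},
]
for number, group in group_by_control_number(records).items():
    print(number, group["parent"], len(group["children"]))
```

A stand-alone monograph simply ends up in a group with no parent, so the display program can show it directly.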

3) Fedora Imps Metadata Task List

  • Erin handed out a "Phase 2 Task List" for Metadata. The committee reviewed completed items and made suggestions. Dates were set for "Completing the metadata development for EAD files" and for "Completing the metadata development of GDMS files."

4) Descriptive Image Metadata

  • It was decided in the February 19th meeting that image objects should only contain their parent pointer: <idno type="parent">. The descriptive metadata for image objects will not populate the discovery index, and the full descriptive metadata will be inherited from the parent on demand. After this decision, Ross reminded Thorny that all FEDORA objects require a title (or label). So rather than use a meaningless system-generated label, the following was resolved:
    • For page images (based on the existence of a pb tag), the label will be: book title, page [value of n= (page number)].
    • For figures (based on the existence of a figure tag), the label will be: book title, [fig caption]
      Some figures have extensive captions. For phase 2, we will grab the entire caption. If this turns out to be unwieldy, we'll consider limiting captions to a certain number of characters only for phase 3.
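
The two label rules above can be sketched as a pair of small functions, assuming the book title, the n= value of the pb tag, and the figure caption have already been pulled out of the TEI. The function names, the whitespace cleanup, and the optional max_chars cutoff (the possible phase 3 limit) are hypothetical.

```python
from typing import Optional


def page_label(book_title: str, n: str) -> str:
    """Label for a page image: book title, page [value of n=]."""
    return f"{book_title}, page {n}"


def figure_label(book_title: str, caption: str,
                 max_chars: Optional[int] = None) -> str:
    """Label for a figure: book title, [figure caption].

    With max_chars=None the entire caption is kept, as decided for phase 2.
    """
    caption = " ".join(caption.split())  # collapse line breaks and extra spaces
    if max_chars is not None and len(caption) > max_chars:
        caption = caption[:max_chars].rstrip()
    return f"{book_title}, {caption}"


print(page_label("Example Title", "12"))
print(figure_label("Example Title", "A map of\n   the river"))
```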

Technical image metadata needs will be the main topic of the next MSG meeting on March 4. The MSG will invite Jama Coartney, Michael Tuite, and Leslie Johnston for that discussion.

 

March 4, 2004

Present: Beth Picknally Camden, Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Jama Coartney, Michael Tuite and Leslie Johnston

  • Leslie gave a "heads up" on OAI harvesting. For the American Studies Grant items, metadata will need to be mapped to Descmeta and then imported to CenRepo.
  • This meeting was a special meeting to hash out image Technical metadata. Jama, Michael, and Leslie were invited for their expertise. Our task was to discuss what information is stored in the creation of images and what information did we need (as a minimum) for UVa's Admin Metadata. Discussion:
    • Michael described what was being done at RMC. The image collections are dumped in "iview." The text information from "iview" is collected and put into SQL or FileMaker. The descriptive information is added to the SQL or FileMaker program. DLPS is using ImageMagick to create TIFFs. TIFF header information can be extracted in text format.
    • Jama distributed a TIFF header example. The information was a text dump from ImageMagick, the program used to create the TIFF.
    • Discussion on how to get the UVa Fedora PID "connected" to the image: 1) TIFF header modification to include the UVa Fedora PID versus 2) TIFF log extraction with "log" modification to include the UVa Fedora PID. The advantage of updating the TIFF header in the image is that the complete header would be in every image. The disadvantage is the time it would take to get a programmer and for them to write a program to change the TIFF header. It was decided that modifying the TIFF header was not realistic for the current phase. For future work, we need further investigation into creating workflows to extract TIFF headers, modify them, and then burn the "final" TIFF on CD. Jama suggested that the software "iview" may be used to add PIDs to TIFF headers. Jama would check to see if "iview" or ImageMagick could do this.
    • Since RMC and DLPS use different processes for their production, we tried to see what they had in common and what we could use for Admin Metadata. iview, which RMC uses, and ImageMagick, which DLPS uses, dump the same information in a text format, but the labels for the image data elements are different. TIFF headers can be generated from both DLPS and RMC.
    • The next discussion was on which images would go into CenRepo and have metadata attached. Until we get Guy's $500,000 storage system, CenRepo will ingest smaller JPEG versions. The TIFFs will be stored on DVDs. The question here was in regard to TIFFs offline versus JPEGs online. The TIFF metadata would be online for discovery. If we have Admin metadata for the master TIFFs, would we need Admin metadata for the JPEGs too? Ross is to take these questions back to Thorny for discussion.
    • Mapping Admin Metadata from TIFF header:
      Using the TIFF header example, the group decided the minimal elements for image metadata (using TIFF label descriptions): Format, Image width, Image length, Resolution, Photometric Interpretation, Filesize, Compression, Depth, and Bits/Sample
    • The resultant Admin (technical) elements are:

      <technical>
          <image>
              <filesize type="units">
              <imageidentifier type="UVAfilename"> (filename with extension)
              <format>
                      <mimetype> (include format at end, i.e., image/tiff)
                      <compression>
                      <colorspace>
              <spatialmetrics>
                      <imagewidth type="units">
                      <imagelength type="units">
                      <sourceX type="units">
                      <sourceY type="units">
              <bitspersample>
              <samplesperpixel>

    • Jama will further research the definitions for TIFF header elements, Depth and Bits/Sample to make sure both are needed.
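
A minimal sketch of how the element list above could be filled in once values have been pulled from a TIFF header dump. The element names and nesting follow the list in these minutes; the dictionary keys, the unit strings, and all sample values are hypothetical, and the step that parses the dump itself is not shown.

```python
import xml.etree.ElementTree as ET


def _add(parent, tag, text=None, type_=None):
    """Append a child element, optionally with a type attribute and text."""
    attrs = {"type": type_} if type_ is not None else {}
    el = ET.SubElement(parent, tag, attrs)
    if text is not None:
        el.text = str(text)
    return el


def build_technical(info):
    """Build the <technical> block from a dict of already-extracted values."""
    technical = ET.Element("technical")
    image = _add(technical, "image")
    _add(image, "filesize", info["filesize"], info["filesize_units"])
    _add(image, "imageidentifier", info["filename"], "UVAfilename")

    fmt = _add(image, "format")
    _add(fmt, "mimetype", info["mimetype"])
    _add(fmt, "compression", info["compression"])
    _add(fmt, "colorspace", info["colorspace"])

    metrics = _add(image, "spatialmetrics")
    _add(metrics, "imagewidth", info["width"], info["dimension_units"])
    _add(metrics, "imagelength", info["length"], info["dimension_units"])
    _add(metrics, "sourceX", info["source_x"], info["source_units"])
    _add(metrics, "sourceY", info["source_y"], info["source_units"])

    _add(image, "bitspersample", info["bits_per_sample"])
    _add(image, "samplesperpixel", info["samples_per_pixel"])
    return technical


# Hypothetical values for illustration only.
sample = {
    "filesize": 1024, "filesize_units": "bytes",
    "filename": "example_0001.tif",
    "mimetype": "image/tiff", "compression": "None", "colorspace": "RGB",
    "width": 100, "length": 200, "dimension_units": "pixels",
    "source_x": 600, "source_y": 600, "source_units": "dpi",
    "bits_per_sample": 8, "samples_per_pixel": 3,
}
print(ET.tostring(build_technical(sample), encoding="unicode"))
```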

     

March 18, 2004

Present: Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside

TEI to DescMeta Mapping Scripts

  • Last week, during a non-MSG meeting week, Janis, Erin and Sherry met to look over the preliminary results of Adam's mapping scripts. One of the problems was in how the MARC record was being mapped to TEI: subtitles were not being mapped under <filedesc>. Erin sent the changes to Adam and Greg. Prior to our meeting, Greg had made his changes. Adam and Erin had been discussing the TEI to DescMeta changes via e-mail.

Technical Metadata for Images

  • Erin handed out the elements for Image Technical Metadata that had been decided on in the last meeting (see March 4, 2004 minutes). During the week prior to the MSG meeting, Erin was tasked to create Technical metadata for an image taken by Thorny. The metadata that came with the image was an Exif file from the digital camera. The information in this file did not map to our Technical Metadata. Jama was going to run Thorny's picture through ImageMagick to get the required Technical Metadata information. In looking at this case of an image born from a digital camera, the MSG realized that we would need mappings for Digital Camera information. We will come back to this issue.

Unqualified Mapping of Dublin Core from DescMeta

Descriptive Image Metadata

  • Redux, on decisions made at February 26th meeting. The decisions made at that meeting (requiring only <idno> and <title> for image objects) were amended to also include access rights or use. The MSG will look further into GDMS mappings to see how access rights could/would be harvested.

Discussions on Mapping from GDMS to DescMeta

  • The MSG looked over two GDMS examples. Our first problem was deciding how far to map down into the divdesc and divs. The MSG decided that unless we could "see" the resultant search interface, it would be hard to decide what to map. We had problems thinking what a user would need (would be looking for) in regard to what is displayed on a search result page.
  • Ann told us that what's in the GDMS file (to use for mapping) is decided when the image is cataloged. The decisions made in this initial phase are what determine what is entered into IRIS and thus what is mapped (or could be mapped) to GDMS. At our next meeting we will make an initial mapping, using the top Divdesc, and see how it goes.

 

March 25, 2004

Present: Erin Stalberg, Janis Kessler, Sherry Lake, Ann Whiteside, Beth Picknally Camden, Thorny Staples

Review of Image Header to UVa Metadata Mapping

  • Jama and Sherry met earlier in the week to discuss the fields from an ImageMagick dump for TIFFs and JPEGs. Jama decided that the ImageMagick field Type (true color, grayscale, bilevel, etc.) was important to capture and to add to the Image Technical Metadata. To accomplish this, we decided to use the attribute Type in the <colorspace> element. Of the image technical metadata elements, JPEG images do not contain the information for the elements <bitspersample>, <samplesperpixel>, or <colorspace>. But since JPEGs have the colorspace Type field, the element <colorspace> would be mandatory. We discussed whether we could (or should) leave the <colorspace> element blank, use unknown, or use a fixed value. Erin was tasked to ask Jama if JPEGs have one fixed colorspace. The discussion with Jama revealed that a Type of True Color on a color JPEG does not give enough information about its color space, so color space information for JPEGs cannot be added automatically. Rather than leave the information uncollected, we decided to populate the <colorspace> content with unknown.
  • Using the Image mappings decided upon in the March 4th and March 25th MSG meetings, Erin created UVa Metadata admin mappings for TIFFs (color, bitonal, and grayscale) and JPEGs (color and grayscale). The resultant ImageMagick dump fields were then used as mappings to UVa Admin Metadata.
  • Workflows will be needed to process the TIFFS, create ImageMagick dumps and then map the fields to the Admin Metadata elements.
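
A minimal sketch of the <colorspace> rule described above, assuming the ImageMagick Type value has already been read from the dump. The function name is hypothetical, and writing the attribute in lowercase as type (to match the other technical elements) is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def colorspace_element(image_type: str,
                       colorspace: Optional[str] = None) -> ET.Element:
    """Build <colorspace>, falling back to 'unknown' when no value is available."""
    el = ET.Element("colorspace", {"type": image_type})
    # For JPEGs the color space cannot be derived from the Type field,
    # so the element content is recorded as "unknown".
    el.text = colorspace if colorspace else "unknown"
    return el


# Hypothetical Type and colorspace values for illustration only.
print(ET.tostring(colorspace_element("TrueColor"), encoding="unicode"))
print(ET.tostring(colorspace_element("Grayscale", "Gray"), encoding="unicode"))
```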

TEI Loose Ends

  • Adam is getting along great with the TEI mappings. He has sent many e-mails to Erin with questions.
  • Greg had a question as to where to put restrictions in the TEI files he was generating. Currently the access and copyright information is in a <p> tag (<p n="copyright">) inside the <availability> element. All text going into Phase 1 and 2 is publicly accessible (unrestricted), but Erin has asked Greg to think ahead on getting the variations of access coded in the TEI files. MSG decided on 4 types of access restrictions: public, restricted, viva, and uva. These cannot be coded in TEI using the type attribute on the <p> tag. The <availability> tag has the attribute status with defined values of free, unknown, and restricted. It was decided to use this attribute and to extend the DTD to include public, restricted, viva and uva as valid status values.
  • Greg is also updating his DLPS TEI generation program to add the element <idno> to <seriesstmt> so that <idno> (under <seriesstmt>) can be mapped to DescMeta <set code=" "> for collections. The problem here was that <seriesstmt> in TEI must have a <title> element. MSG decided that the Modern English collection will use University of Virginia Library, Modern English Collection. Other titles will be created as needed when other collections are added.
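
A minimal sketch of checking the status attribute against the extended value list, assuming the header is well-formed XML. The function name is hypothetical, and treating a missing attribute as unknown is an assumption of this sketch rather than something the MSG decided.

```python
import xml.etree.ElementTree as ET

# TEI's defined values plus the local extensions decided above.
ALLOWED_STATUS = {"free", "unknown", "restricted", "public", "viva", "uva"}


def availability_status(header_xml: str) -> str:
    """Return the status of the first <availability> element, validating it."""
    root = ET.fromstring(header_xml)
    avail = root if root.tag == "availability" else root.find(".//availability")
    if avail is None:
        raise ValueError("no <availability> element found")
    status = avail.get("status", "unknown")
    if status not in ALLOWED_STATUS:
        raise ValueError(f"invalid availability status: {status!r}")
    return status


# Hypothetical fragment for illustration only.
fragment = (
    '<publicationStmt><availability status="viva">'
    '<p n="copyright">Example statement</p>'
    "</availability></publicationStmt>"
)
print(availability_status(fragment))
```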

Review GDMS-UVa Metadata Mappings

  • In the March 18th meeting the MSG decided to map the contents from the top <divdesc>. Thorny amended this to include the titles and subjects from the lower <div> and <res> in the DescMeta. The idea is to programmatically go down through all <div>s and <res>s, count them and put the results (and total count of each media type) in the <description> field with type="contents".
  • We discussed workflow issues concerning putting the UVa PIDs back into GDMS. The GDMS DTD needs to be in line with DescMeta regarding PIDS as an element. Another point of discussion was did the GDMS have the minimal metadata requirements. It was deemed that the GDMS should have at least the minimal metadata requirements as well as a copyright statement (which should be i