This document describes the local markup practices for TEI-encoded electronic texts
followed by Digital Library Production Services (DLPS), University of Virginia
Library. It is intended to be helpful in two main ways:
| Including all content |
General Requirements |
|
| Description |
With very few exceptions, all content from the print
source must be included in the electronic text: all textual data must
be included in the transcription, and all non-textual (graphical) data must be
represented in the markup as <figure> elements.
|
| Enforcement |
|
| Exceptions |
The only exceptions to this rule are:
- Running page headers — Exclude the running headers that often appear at the top of each
page in a printed book. (These headers are typically very repetitive
and only contain content already available elsewhere in the electronic
transcription, such as the title of the book or the title of the
current chapter.)
NOTE: In rare cases, running page headers will contain
unique content (such as a summary of the content of the current
page). In such cases DLPS will require that the running headers be
included in the electronic text. See Running page headers.
- Handwriting — When transcribing printed materials, handwriting
(such as readers’ notes or markings) should be excluded from the
transcription.
- Gaps — Gaps in the transcription are necessary in some cases, typically
either because a passage is missing from the print source (due to a
missing or torn page, for example), or because the print source
contains non-Western characters. Any and all omissions in the
electronic transcription must be indicated by the <gap/> element. See
Use of <gap/>.
|
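| Example |
A minimal sketch, not taken from a DLPS text, of how the two kinds of content might be recorded: an omission as <gap/> and a graphic as <figure>. The bracketed placeholders stand in for real content. Any attributes the DTD may require on these elements are left out here as an assumption; see Use of <gap/> for the actual requirements.

```xml
<p>. . . [transcribed text] <gap/> . . . [transcription resumes]</p>
<figure>
<head>. . . [caption printed with the illustration, if any]</head>
</figure>
```
|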
| Composite texts |
Major Structure: Essential Structure |
|
| Description |
In rare cases, DLPS will request that a particular text be
marked as a composite text, in which the usual <body> element
is replaced with the <group> element, which then contains
multiple <text> elements, each with its own <front>,
<body>, and <back>.
|
| Remarks |
DLPS will only request a composite text in the case of
anthologies or collected works, where each work has its own front
and/or back matter.
|
| Example |
<TEI.2>
<teiHeader>
. . . [metadata section supplied by DLPS to the keyboarding vendor]
</teiHeader>
<text>
<front> . . . [front matter for the collection] </front>
<group>
<text>
<front> . . . [front matter of first text] </front>
<body> . . . [main body of first text] </body>
<back> . . . [back matter of first text] </back>
</text>
<text>
<front> . . . [front matter of second text] </front>
<body> . . . [main body of second text] </body>
<back> . . . [back matter of second text] </back>
</text>
</group>
<back> . . . [back matter for the collection] </back>
</text>
</TEI.2>
|
| Enforcement |
| Machine-enforceable: |
semi |
| Method: |
program |
| Name: |
qa_lib_structure |
| Message type: |
warning |
| Comments: |
qa_lib_structure issues a
warning if the <group> element is used, so that the QA tech can check
the markup manually. No DTD or program can ensure proper use of <group>.
|
|
| All content within a div |
Major Structure: Structural Divisions |
|
| Description |
The <front>, <body>, and <back> elements
must contain only <div1> elements. No content is allowed
directly within <front>, <body>, or
<back>.
|
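| Example |
A hypothetical sketch of the rule above, using placeholder content. The division type and number are illustrative assumptions; the point is only that headings and paragraphs sit inside a <div1>, never directly within <body>.

```xml
<body>
<!-- not allowed: a <head> or <p> placed here, directly within <body> -->
<div1 type="chapter" n="I">
<head>. . . [chapter heading]</head>
<p>. . . [chapter content]</p>
</div1>
</body>
```
|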
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
DTD |
|
| Typed divs |
Major Structure: Structural Divisions |
|
| Description |
The type attribute is required on <divN>
elements, and it has an enumerated vocabulary of
allowed values. If a division has no obvious type, the generic value
"section" may be used; if "section" has already been
used for a higher-level division, use "subsection".
|
| Remarks |
Most of the available type values are self-explanatory,
and no definition is provided for them here. Those that are not
necessarily self-explanatory are:
- bio — for brief biographic sketches of authors or other
contributors
- castlist — for a list of characters preceding a dramatic work
- chronology — for biographical or historical timelines
- colophon — a section at or near the end of a book, containing printing
information such as name of printer (as distinct from publisher),
typefaces used, etc.
- contents — for tables of contents and for lists of
illustrations, etc.
- editorial — for opinion pieces in newspapers
- entry — for journal entries or encyclopedia/dictionary entries
- errata — for lists of printing errors; also called corrigenda
- fly-title — like a half-title page, but occurs between the front matter and
the body; treat as last page of front matter (not first page of
body); see Half-titles, fly-titles, and divisional titles
- frontispiece — technically, an illustration facing the title page; may also be
used for any full-page illustration in the front matter, or for an
illustration facing the first page of a major division within the
body
- half-title — a page preceding the title page bearing the title of the work,
perhaps with a series title or volume number; see Half-titles, fly-titles, and divisional titles
- masthead — a block of matter in a newspaper or other periodical indicating
title of publication, address, list of editors or other contributors,
etc.
- plates — one or more full-page illustrations, often unnumbered or
numbered independently of main pagination
- speech — for a transcript of an oration, not for a piece of dialog in a
dramatic work (for which use <sp>)
|
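| Example |
A minimal sketch, with placeholder content, of the generic values described above: "section" on a division with no obvious type, and "subsection" on a division nested inside it. The nesting levels shown are an assumption for illustration.

```xml
<div1 type="section">
<head>. . . [heading of a division with no obvious type]</head>
<div2 type="subsection">
<head>. . . [heading of a lower-level division with no obvious type]</head>
<p>. . . [content]</p>
</div2>
</div1>
```
|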
| Enforcement |
| Machine-enforceable: |
semi |
| Method: |
DTD |
| Comments: |
Although the DTD requires type on
divs and enumerates the allowable values, obviously the DTD cannot
enforce appropriate use of the available type
values.
|
|
| n on divs |
Major Structure: Structural Divisions |
|
| Description |
If a division is numbered or otherwise labeled in the print
source (this should be obvious from the division’s heading), record the
number or label in the n attribute (in addition to
transcribing it as part of the <head>). If the division does
not have a number or other label associated with it, do not include
the n attribute.
|
| Remarks |
The value of n does not have to be strictly
numeric; often it will be a roman numeral or letter.
|
| Example |
<div1 type="introduction">
<pb/>
<head>INTRODUCTION</head>
<div2 type="chapter" n="I">
<head>I</head>
<div3 type="section" n="1">
<head>1—IMPORTANCE OF THE PO HU T'UNG.</head>
<p>The <hi rend="italic">Po hu t'ung</hi> pretends to be the official report of the discussions <lb/>
on the Classics which were held under Imperial auspices in 79 A.D., <lb/>
<!-- continues -->
|
| Enforcement |
|
| Reason |
Could be useful for delivery. As is our usual
practice, n is simply a label
for display; n is not being used as a transcriptional space
here, since the number/label of the div is transcribed in the
division’s <head>.
|
| Half-titles, fly-titles, and divisional titles |
Major Structure: Structural Divisions |
|
| Description |
Encode half-titles as <div1 type="half-title"> within
<front>. Encode fly-titles as <div1 type="fly-title">
within <front>. Encode each divisional title as <head type="divisional">
within the <divN> that it precedes.
|
| Remarks |
A common feature in many books is a heading (on a separate page, or at the
top of the first page of the first chapter) containing the
title of the work (or the title of a section of the work). There are
three main types of such features:
- A page preceding the title page and bearing the title of the
work, perhaps with a series title or volume number, is a half-title page
and should be marked as <div1 type="half-title"> within
<front>.
- A page similar to a half-title page but occurring between the
front matter and the body is a fly-title and should be marked as
<div1 type="fly-title"> as the last division within
<front> (not as the first division of the <body>).
- A page, or just an initial heading preceding other headings,
similar to a half-title but occurring within the body of
the work, to announce the beginning of a major section, is a
divisional title. In contrast to half-title and fly-title pages
within the front matter, a divisional title should not be
marked with its own <divN>. Instead, the divisional
title should be incorporated into the <divN> that it precedes, as a <head>
element.
|
| Example |
<front>
<!-- ... -->
<div1 type="fly-title">
<pb/>
<head type="main">The Original Journals of Captains Meriwether <lb/>
Lewis and William Clark</head>
<ornament type="line"/>
<head type="sub"><hi rend="italic">THE JOURNALS PROPER</hi></head>
<ab type="empty" rend="none"/>
<pb/>
</div1>
</front>
<body>
<div1 type="chapter" n="I">
<pb n="3"/>
<head type="divisional"><hi rend="italic">The</hi> ORIGINAL JOURNALS OF <lb/>
LEWIS AND CLARK</head>
<head type="main"><hi rend="small-caps">Chapter</hi> I</head>
<ornament type="line"/>
<head type="sub"><hi rend="italic">FROM RIVER DUBOIS TO THE PLATTE</hi></head>
<head type="desc">Clark's Journal and Orders, January 30–July 22, 1804 <lb/>
Entries and Orders by Lewis, February 20, March 3, May 15, 20, 26, and July 8, 12</head>
<ornament type="line"/>
<div2 type="section">
<head>[PRELIMINARY MEMORANDA]</head>
<div3 type="subsection">
<head rend="left">[Clark]</head>
<p>CAPT<hi rend="super">S</hi>. LEWIS &amp; CLARK wintered at the enterance <lb/>
of a Small river opposit the Mouth of Missouri <lb/>
<!-- page continues -->
|
| Enforcement |
|
| Multiple headings |
Major Structure: Division Headings |
|
| Description |
If a division (or other feature) has more than one heading, use
multiple <head> elements (rather than a single <head>
with line breaks), and include the type attribute with one
of these values: "main", "sub", "desc"
(descriptive), "alt" (alternative), or "divisional"
(for divisional titles; see Half-titles, fly-titles, and divisional titles).
|
| Example |
In this example, the main heading identifies the division as a
chapter and gives its number, the sub-heading indicates the content of
the chapter, and the descriptive heading indicates the manuscript
materials represented in the chapter.
<div1 type="chapter" n="I">
<pb n="3"/>
<head type="divisional"><hi rend="italic">The</hi> ORIGINAL JOURNALS OF <lb/>
LEWIS AND CLARK</head>
<head type="main"><hi rend="small-caps">Chapter</hi> I</head>
<ornament type="line"/>
<head type="sub"><hi rend="italic">FROM RIVER DUBOIS TO THE PLATTE</hi></head>
<head type="desc">Clark's Journal and Orders, January 30–July 22, 1804 <lb/>
Entries and Orders by Lewis, February 20, March 3, May 15, 20, 26, and July 8, 12</head>
<ornament type="line"/>
<div2 type="section">
<head>[PRELIMINARY MEMORANDA]</head>
<div3 type="subsection">
<head rend="left">[Clark]</head>
<p>CAPT<hi rend="super">S</hi>. LEWIS &amp; CLARK wintered at the enterance <lb/>
of a Small river opposit the Mouth of Missouri <lb/>
<!-- page continues -->
|
| Enforcement |
| Machine-enforceable: |
semi |
| Method: |
program |
| Name: |
qa_lib_structure |
| Message type: |
error |
| Comments: |
The DTD
enumerates the possible values for type on <head>,
but it doesn’t require type because it’s unnecessary if
there’s only one <head>. The QA program can only verify that,
if multiple heads are present, they each have the type
attribute. It cannot determine whether multiple heads should have been
used (rather than a single <head> with line breaks, since line
breaks are often needed/legitimate within a heading), or whether the appropriate type
values have been used.
|
|
| See also |
Half-titles, fly-titles, and divisional titles |
| Exactly one <titlePage> |
Major Structure: Title Pages |
|
| Description |
Normally a text should have exactly one <titlePage>
element.
|
| Remarks |
Although a text with more or fewer than one
<titlePage> is theoretically possible and technically allowed
by the DTD, such an occurrence is extremely rare and should be
regarded as an encoding error unless proven otherwise.
|
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
program |
| Name: |
qa_lib_structure |
| Message type: |
warning |
|
| Title types |
Major Structure: Title Pages |
|
| Description |
When using <titlePart> to mark the parts of the title,
include the type attribute, assigning one of these values:
"main", "sub", "desc" (descriptive),
"alt" (alternative), or "volume" (for volume
information).
|
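| Example |
A minimal sketch, with placeholder content, of a title divided into typed parts. Which parts a given title page actually has is an assumption here; only the type values listed above are used.

```xml
<docTitle>
<titlePart type="main">. . . [main title]</titlePart>
<titlePart type="sub">. . . [subtitle]</titlePart>
</docTitle>
```
|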
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
DTD |
|
| Volume information on title page |
Major Structure: Title Pages |
|
| Description |
Volume information on the title page should be encoded as
<titlePart type="volume">.
|
| Remarks |
This rule holds true even if the volume information is
separated from the title by the byline or other elements
(<titlePart> is allowed outside <docTitle>).
. . . </docTitle>
<byline>By <docAuthor>BOOKER T. WASHINGTON</docAuthor></byline>
<titlePart type="volume">VOLUME I</titlePart>
|
| Enforcement |
|
| Verso of title page |
Major Structure: Title Pages |
|
| Description |
The content on the verso (reverse side) of the title page should
be included within the <titlePage> element, typically inside
<docImprint>.
|
| Example |
<titlePage>
<pb/>
<docTitle>
<titlePart type="main">THE <lb/>
UNDERGROUND RAILROAD <lb/>
FROM <lb/>
SLAVERY TO FREEDOM</titlePart>
</docTitle>
<byline>BY <lb/>
<docAuthor>WILBUR H. SIEBERT</docAuthor>
ASSOCIATE PROFESSOR OF EUROPEAN HISTORY <lb/>
IN OHIO STATE UNIVERSITY</byline>
<titlePart type="desc"><hi rend="italic">WITH AN INTRODUCTION</hi>
BY <lb/>
ALBERT BUSHNELL HART
PROFESSOR OF HISTORY IN HARVARD UNIVERSITY</titlePart>
<docImprint>
<pubPlace><hi rend="gothic">New York</hi></pubPlace>
<publisher>THE MACMILLAN COMPANY <lb/>
LONDON: MACMILLAN &amp; CO., <hi rend="small-caps">Ltd.</hi></publisher>
<docDate>1899</docDate>
<hi rend="italic">All rights reserved</hi>
<pb/>
<hi rend="small-caps">Copyright</hi>, 1898, <lb/>
<hi rend="small-caps">By THE MACMILLAN COMPANY</hi>. <lb/>
<ornament type="line"/>
Set up and electrotyped December, 1898.   Reprinted September, <lb/>
1899. <lb/>
<lb/>
<lb/>
<lb/>
<hi rend="gothic">Norwood Press</hi> <lb/>
J. S. Cushing &amp; Co. — Berwick &amp; Smith <lb/>
Norwood Mass. U.S.A.
</docImprint>
</titlePage>
|
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
program |
| Name: |
qa_lib_structure |
| Message type: |
warning |
| Comments: |
This practice is machine-enforceable in the sense that the QA program checks for
exactly two <pb> elements within <titlePage>.
|
|
| Openers and closers |
Genres: Letters |
|
| Description |
When encoding letters, prefaces, and other such personal
writings, use <opener> and/or <closer> as needed to
encode the opening and closing sections of the division.
|
| Remarks |
<opener> and <closer> typically contain one or more of these
elements:
- <dateline> — groups together the place, date, etc., the letter was written;
normally contains at least <name type="place">
and <date value="...">
- <date> — contains a date in any format; use the value
attribute to provide the date in standardized format; see
Standardized date formats
- <name> — contains a proper name; use type to indicate
"person" or "place".
- <salute> — salutation at the beginning (e.g. “Dear Sir”) or end
(e.g. “Yours sincerely”) of a letter
- <signed> — signature at the end of a letter, preface, etc.
|
| Example |
<div2 type="letter">
<head>TO MRS. H. LINCOLN.<ref target="n1"><hi rend="super">1</hi></ref>
<!-- <note id="n1" place="foot"> here -->
</head>
<opener>
<dateline>
<name type="place">Weymouth</name>, <date value="1761-10-05">5 October, 1761.</date>
</dateline>
<salute>MY DEAR FRIEND,</salute>
</opener>
<p><hi rend="small-caps">Does</hi> not my friend think me a stupid girl, when <lb/>
she has kindly offered to correspond with me, that <lb/>
I should be so senseless as not to accept the offer? <lb/>
<!-- continues -->
<p>I can say, in the length of this epistle, I've made <lb/>
the golden rule mine. Pray, my friend, do not let it <lb/>
be long before you write to your ever affectionate</p>
<closer>
<signed>A. S.</signed>
<seg type="postscript" rend="block">P.S. My regards to your good man. I've no <lb/>
acquaintance with him, but if you love him, I do, <lb/>
and should be glad to see him.</seg>
</closer>
</div2>
|
| Enforcement |
|
| See also |
Block quotations with opener/closer
Default alignment
|
| Postscripts |
Genres: Letters |
|
| Description |
Postscripts in letters should be encoded using <seg
type="postscript" rend="block"> within <closer>.
|
| Example |
See preceding example |
| Enforcement |
|
| Line breaks in verse |
Genres: Verse |
|
| Description |
When encoding verse it is important to distinguish between
logical lines of verse and the physical presentation of those lines on
the printed page. In cases where a line of verse is too long to fit on
the printed page, and for that reason is continued on a second line,
use <l> to mark the logical line of verse and <lb/> to
mark the physical line break.
|
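| Example |
A sketch of the distinction, using the opening lines of Longfellow's Evangeline. The positions of the physical line breaks are assumed for illustration and do not reproduce any particular printing: each <l> holds one logical line of verse, and <lb/> marks where a narrow page would force the line to wrap.

```xml
<l>This is the forest primeval. The murmuring pines and <lb/>
the hemlocks,</l>
<l>Bearded with moss, and in garments green, indistinct in <lb/>
the twilight,</l>
```
|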
| Enforcement |
|
| See also |
Line breaks |
| Alternate spellings and usage examples |
Genres: Dictionaries |
|
| Description |
Alternate spellings should be marked with
<orth type="alt">. Usage examples should be marked with
<eg>.
|
| Remarks |
The <eg> element does not allow character data; instead,
<eg> must contain <q> (for examples with no attributed
source) or <cit> (for examples that include an attribution of
the author or source text).
NOTE: Because DLPS normally uses <q> for block
quotations, when using <q> in a dictionary entry please
indicate <q rend="inline">, as shown in the following
example.
|
| Example |
Again, conj. Agen; agin:
By the time that, untill: “I’ll have
it ready agin
you come.”
<entry rend="hang">
<form><orth><hi rend="bold">Again,</hi></orth></form>
<gramGrp><pos><hi rend="italic">conj.</hi></pos></gramGrp>
<form><orth type="alt"><hi rend="italic">Agen; agin:</hi></orth></form>
<def>By the time that, untill:</def>
<eg><q rend="inline">"I'll have <lb/>
it ready <hi rend="italic">agin</hi> you come."</q></eg>
</entry>
|
| Enforcement |
|
| Multiple homographs and multiple meanings |
Genres: Dictionaries |
|
| Description |
More complex dictionary entries may include more than one form
of the same word — that is, multiple homographs (words identical
in spelling but different in meaning or pronunciation), each marked
with <hom>. Entries may also include more than one meaning for
the same word, in which case the information (definitions, examples,
etc.) for each meaning should be grouped as a <sense>. If the
senses are labeled with numbers or letters in the print source,
include the label in the n attribute.
|
| Example |
Against, prep. In resistance to; or defense from “They
marched against the Spaniards.” (2.) Opposite. “Over
against a point called
Sandy Point.” Against, conj. “Keep
’em against I come.”
<entry rend="hang">
<hom>
<form><orth><hi rend="bold">Against,</hi></orth></form>
<gramGrp><pos><hi rend="italic">prep.</hi></pos></gramGrp>
<sense>
<def>In resistance to; or defense from</def>
<eg><q rend="inline">"They <lb/>
marched <hi rend="italic">against</hi> the Spaniards."</q></eg>
</sense>
<sense n="2">
(2.) <def>Opposite.</def>
<eg><q rend="inline">"Over <lb/>
<hi rend="italic">against</hi> a point called Sandy Point."</q></eg>
</sense>
</hom>
<hom>
<form><orth>Against,</orth></form>
<gramGrp><pos><hi rend="italic">conj.</hi></pos></gramGrp>
<eg><q rend="inline">"Keep <lb/>
'em <hi rend="italic">against</hi> I come."</q></eg>
</hom>
</entry>
|
| Enforcement |
|
| Super entries |
Genres: Dictionaries |
|
| Description |
In cases where words with identical spellings (homographs)
receive separate entries in the dictionary (rather than being included
within a single entry), each entry should be marked as an
<entry> as usual, but then the group of entries should be
wrapped in a <superEntry> element.
|
| Example |
<superEntry>
<entry rend="hang">
<form><orth><hi rend="bold">Pitch,</hi></orth></form>
<gramGrp><pos><hi rend="italic">n.</hi></pos></gramGrp>
<def>The height of anything.</def>
<eg><q rend="inline">"The roof was ten feet <lb/>
<hi rend="italic">pitch</hi>."</q></eg>
<eg><q rend="inline">"Tester bedstead 7½ feet <hi rend="italic">pitch</hi>."</q></eg>
</entry>
<entry rend="hang">
<form><orth><hi rend="bold">Pitch,</hi></orth></form>
<gramGrp><pos><hi rend="italic">v.</hi></pos></gramGrp>
<sense>
<def>To pitch in, to begin; set to work with promptness <lb/>
or energy.</def>
</sense>
<sense n="2">
(2.) <def><hi rend="italic">To pitch into</hi>, to attack; assault.</def>
</sense>
</entry>
<entry rend="hang">
<form><orth><hi rend="bold">Pitch,</hi></orth></form>
<gramGrp><pos><hi rend="italic">v.</hi></pos></gramGrp>
<def>To sit down; to light.</def>
<eg><q rend="inline">"I saw wild geese <hi rend="italic">pitch</hi> <lb/>
in the wheatfields."</q></eg>
</entry>
<entry rend="hang">
<form><orth><hi rend="bold">Pitch,</hi></orth></form>
<gramGrp><pos><hi rend="italic">v.</hi></pos></gramGrp>
<def>To plant.</def>
<eg><q rend="inline">"I have already <hi rend="italic">pitched</hi> my crop."</q></eg>
</entry>
</superEntry>
<superEntry>
<entry rend="hang">
<form><orth><hi rend="bold">Pitcher,</hi></orth></form>
<gramGrp><pos><hi rend="italic">n.</hi></pos></gramGrp>
<def>A vessel of various sizes with one handle and a <lb/>
lip-spout for holding water or other liquids. A basin and <lb/>
<hi rend="italic">pitcher</hi>. Never called <hi rend="italic">jug</hi>.</def>
</entry>
<entry rend="hang">
<form><orth><hi rend="bold">Pitcher,</hi></orth></form>
<gramGrp><pos><hi rend="italic">n.</hi></pos></gramGrp>
<def>The man who pitches the sheaves of wheat up on <lb/>
the cart or stack, by means of a pitch-fork.</def>
</entry>
</superEntry>
|
| Enforcement |
|
| Block quotations |
Block-level Features: Block Quotations |
|
| Description |
Block quotations should be encoded using the <q>
element.
|
| Remarks |
By block quotation we simply mean a quotation set off
from the surrounding text by one or more of these typographic
changes:
- set off by line breaks
- indented
- in a smaller typeface
<q> is always used for block quotations, irrespective of
whether or not the narrator/author attributes the quotation to an
external source; that is, the <quote> element should never be
used.
|
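| Example |
A minimal sketch, with placeholder content, of a block quotation placed between two paragraphs. Wrapping the quoted passage in <p> inside <q> is an assumption made for illustration; the DTD determines what <q> may contain.

```xml
<p>. . . [paragraph introducing the quotation]</p>
<q>
<p>. . . [quoted passage, set off typographically in the print source]</p>
</q>
<p>. . . [surrounding text resumes]</p>
```
|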
| Enforcement |
|
| See also |
Quoted material |
| Block quotations with opener/closer |
Block-level Features: Block Quotations |
|
| Description |
For block quotations requiring <opener>
and/or <closer>, use
<q><text><body><div1>.
If the quoted text is a letter (the most common case), use
<q><text><body><div1 type="letter">.
If the quoted text is not a letter or other type for which an appropriate
type value exists, use
<q><text><body><div1 type="quotation">.
|
| Example |
passage of the Kansas-Nebraska act. The results secured by <lb/>
the two circulars will be seen in the following letter from <lb/>
Francis Jackson, of Boston, to his fellow-townsmen and co- <lb/>
worker, the Rev. Theodore Parker.</p>
<q><text><body><div1 type="letter">
<opener>
<dateline>
<name type="place"><hi rend="small-caps">Boston</hi></name>,
<date value="1854-08-27">Aug. 27,1854</date>.
</dateline>
<salute><hi rend="small-caps">Theodore Parker</hi>:</salute>
</opener>
<p><hi rend="italic">Dear Friend</hi>,— The contributions of the churches in behalf of <lb/>
the fugitive slaves I think have about all come in. I herewith <lb/>
<!-- letter continues -->
have been. Those societies who have contributed, I judge were <lb/>
least able to do so.</p>
<closer>
<signed><hi rend="small-caps">Francis Jackson</hi>.<ref target="n4.1"><hi rend="super">1</hi></ref>
<note id="n4.1" place="foot"><seg type="note-symbol"><hi rend="super">1</hi></seg><p>Theodore Parker's
<hi rend="italic">Scrap-book</hi>, Boston Public Library.</p></note>
</signed>
</closer>
</div1></body></text></q>
<p>The political affiliations of underground helpers before <lb/>
1840 were, necessarily, with one or the other of the old <lb/>
|
| Enforcement |
|
| See also |
Openers and closers |
| A figure at the start of a division: <frontispiece> |
Block-level Features: Figures and Ornaments |
|
| Description |
A <frontispiece> is a figure that occurs at or near the
beginning of a structural division.
|
| Remarks |
The <frontispiece> element is identical to the <figure>
element except that, unlike a <figure>, a <frontispiece> can
occur at the top of a <divN> element.
There are cases where an illustration is the first component in a chapter
or other structural division. This occurs most often when the figure is a
full-page illustration on the page immediately preceding the first page of
content for that division, and the illustration is clearly related to the
content that follows it, not the content that precedes it. In such cases, the
illustration should be marked as a <frontispiece> at the beginning
of the new <divN>, not as a <figure>
at the end of the preceding <divN>.
|
| Example |
<div1 type="section">
<pb entity="b000234935_0130"/>
<frontispiece entity="b000234935_0130_0" rend="page">
<head><hi rend="small-caps">Patrick Henry.</hi></head>
</frontispiece>
<pb entity="b000234935_0131"><fw type="sig" place="bottom-left">8</fw></pb>
<head type="main">THE FAMOUS <lb/>
Revolution Speech of Patrick Henry,</head>
<head type="sub">DELIVERED BEFORE THE VIRGINIA CONVENTION IN ST. JOHN'S <lb/>
CHURCH, 1775.</head>
<ornament type="line"/>
<p>"Mr. President," said he, "it is natural to man to indulge <lb/>
in the illusions of hope. We are apt to shut our eyes against <lb/>
a painful truth and listen to the song of that siren, till she <lb/>
|
| Enforcement |
|
| Horizontal lines |
Block-level Features: Figures and Ornaments |
|
| Description |
Horizontal lines should be encoded using the <ornament>
element. For true horizontal lines, set type to
"line". For a string of asterisks, periods, etc. that functions
as a horizontal line, set type to "characters" and
include the characters as the content of the <ornament>
element.
|
| Example |
<ornament type="line"/>
<ornament type="characters">* * * * * * * *</ornament>
|
| Enforcement |
| Machine-enforceable: |
semi |
| Method: |
program |
| Name: |
qa_lib_misc |
| Message type: |
error |
| Comments: |
The qa_lib_misc QA program verifies that if type="characters",
the <ornament> element contains text (is not empty), and that if
type="line" (or type="ornament"), the <ornament>
element is empty. However, there is no machine-enforceable way to ensure that horizontal lines
are marked properly, or marked at all.
|
|
| Note symbols |
Block-level Features: Notes |
|
| Description |
When the note body includes the referencing symbol
(a number, *, †, etc.), record this symbol using
<seg type="note-symbol"> as the first element within
<note>.
|
| Example |
page image and example markup
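A minimal sketch of the practice, modeled on the footnote in the Block quotations with opener/closer example. The id value and the placeholder text are hypothetical; the printed symbol is recorded once in the reference and again as the first element of the note body.

```xml
. . . [text containing the note reference]<ref target="n1"><hi rend="super">1</hi></ref>
<note id="n1" place="foot"><seg type="note-symbol"><hi rend="super">1</hi></seg>
<p>. . . [text of the note]</p></note>
```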
|
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
program |
| Name: |
qa_lib_notes |
| Message type: |
warning |
| Comments: |
Because a note symbol is almost always printed for anchored notes
(footnotes and endnotes), the QA program issues a warning if the first child
element of <note> is not <seg type="note-symbol">, unless
anchored="no".
|
|
| Reason |
Isolation of the note symbol is expected to
facilitate delivery, especially if the delivery system chooses to suppress the
printed note symbol and instead use the n value for display.
|
| See also |
Unanchored notes |
| n attribute for notes |
Block-level Features: Notes |
|
| Description |
The n attribute is required on both the note
reference (<ref> or <ptr/>) and the note body
(<note>). Its value should be a label for display, which may or
may not be equivalent to the note symbol transcribed from the print
source (for note reference, content of <ref>; for note body,
content of <note><seg type="note-symbol">).
|
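| Example |
A hypothetical sketch of a note reference and note body that both carry n. The id and the n values are assumptions for illustration; here the display label happens to match the printed note symbol, which the practice above does not require.

```xml
. . . [text containing the note reference]<ref target="n1" n="1"><hi rend="super">1</hi></ref>
<note id="n1" place="foot" n="1"><seg type="note-symbol"><hi rend="super">1</hi></seg>
<p>. . . [text of the note]</p></note>
```
|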
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
program |
| Name: |
qa_lib_notes |
| Message type: |
error |
| Comments: |
The QA program requires n on <note> (except in
<teiHeader>), and also on <ref> and <ptr/>.
Compliance with this practice is automated: The processing script
notes_n programmatically adds the n
attribute on <note>, <ref>, and <ptr/> elements.
|
|
| Reason |
Facilitates delivery. If n is always present and
always carries a label for display, delivery of notes is greatly
facilitated, without interfering at all with the transcriptional
content.
|
| Other features |
Block-level Features: Other Block-level Features |
|
| Description |
Arguments, bibliographic citations, epigraphs, and trailers
should be marked using the appropriate TEI elements.
|
| Example |
An epigraph containing a quotation, along with attribution of its source:
<epigraph>
<cit>
<q>"I have sworn upon the altar of God <lb/>
eternal hostility against every form of tyranny <lb/>
over the mind of man."</q>
<bibl><author> — <hi rend="italic">Thomas Jefferson.</hi></author></bibl>
</cit>
</epigraph>
A trailer:
<trailer>FINIS.</trailer>
|
| Enforcement |
|
| Changes in typeface |
Phrase-level Features: Changes in Typeface |
|
| Description |
With the exception of foreign
phrases, the vendor has been instructed to mark changes in
typeface as physical changes, not with a
logical element such as <emph>, <title>,
<term>, <mentioned>, etc. At the post-keyboarding stage,
this practice continues. For the sake of consistency, and as a matter
of practical necessity, DLPS does not normally undertake the enhanced,
logical markup of changes in typeface.
|
| Remarks |
The most common values for the rend attribute on
<hi> are:
- italic
- bold
- underline
- super — superscript
- sub — subscript
- small-caps
- gothic
Less common but valid values are:
- line-through
- open
- overline
- red-letter
- roman — assumed and not normally necessary
- script
- slash-through
- other — indicate rendering using the
other attribute, as in: <hi rend="other"
other="..."> (Of course, <hi> should be used in this way
only as a last resort.)
The value "gothic" should be used for the gothic or
black-letter style of typeface. In modern printed books, gothic type
is typically used to highlight a name or brief passage.
<docImprint>
<pubPlace><hi rend="gothic">New York</hi></pubPlace>
<publisher>THE MACMILLAN COMPANY <lb/>
LONDON: MACMILLAN &amp; CO., <hi rend="small-caps">Ltd.</hi></publisher>
<docDate>1899</docDate>
<hi rend="italic">All rights reserved</hi>
|
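| Example |
A short sketch contrasting physical and logical markup, reusing the italicized title from the n on divs example. The logical alternative appears only in a comment, as an illustration of what DLPS does not normally do.

```xml
<!-- DLPS practice: record the physical change in typeface -->
<p>The <hi rend="italic">Po hu t'ung</hi> pretends to be . . .</p>
<!-- not normally used by DLPS: logical markup such as <title>Po hu t'ung</title> -->
```
|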
| Enforcement |
| Machine-enforceable: |
no |
| Comments: |
Enforced by the DTD in that
rend is required on <hi>, and rend has an
enumerated vocabulary. But as always, the appropriate use of
<hi> and its available rend values is not
machine-enforceable.
|
|
| Reason |
While ideally changes in typeface would always be encoded
with the appropriate logical element, in practice this is not feasible
for vendor-produced markup. The encoders may not be native speakers of
English and should not be expected to make the appropriate semantic
distinctions or determine the author’s rhetorical intention. Instead,
such distinctions should be reserved for second-pass markup by a
native speaker of English, perhaps even a subject-matter specialist.
Note, however, that such second-pass markup is not a part of the DLPS
workflow. Normally a text will receive such enhanced, logical markup
only if someone outside DLPS (faculty member, Info Comm, Etext Center,
etc.) happens to take an interest in the text and brings the resources
(funding, staff, etc.) to undertake the additional markup.
|
| Small caps |
Phrase-level Features: Changes in Typeface |
|
| Description |
Text that is printed in small caps should be transcribed using
both upper-case and lower-case letters, not all
upper-case letters.
|
| Example |
<div2 type="chapter">
<pb n="184"/>
<head>CONNECTICUT.</head>
<p>There was no press in this colony until 1709; and, I <lb/>
believe, not more than four printing houses in it before <lb/>
1775.</p>
<div3 type="section">
<head><hi rend="small-caps">New London</hi>.</head>
<p>The first printing done in Connecticut was in that town; <lb/>
<!-- continues -->
|
| Enforcement |
|
| Reason |
This is the simplest way to distinguish the fully capitalized
letters from the small-caps letters. The alternative method of marking
small caps would be to mark only the small-caps letters, leaving the
fully capitalized letters unmarked, for example:
H<hi rend="small-caps">ERE</hi>, A<hi rend="small-caps">BANDON</hi>
A<hi rend="small-caps">LL</hi> H<hi rend="small-caps">OPE</hi>
rather than simply
<hi rend="small-caps">Here, Abandon All Hope</hi>
The former is ugly, tag-heavy, and unnecessary.
|
| Representing alignment and indentation |
Phrase-level Features: Alignment and Indentation |
|
| Description |
When indicating alignment or indentation, use the rend
attribute, either on structural elements
(<p>, <l>, <cell>, <item>, etc.) or on
<hi>, as appropriate to the situation.
|
| Remarks |
For indicating alignment, the available rend values are:
- center
- left — assumed and not normally necessary
- right
For indicating indentation, the available rend values are:
- indent
- indent2, indent3, indent4,
indent5 — for cases where more than one level of indentation needs to be
recorded (Use these values sparingly, and only when "indent"
has already been used. Normally these values are only needed when
encoding lines of verse.)
- hang — for hanging indentation —
that is, when the first line of content is further left
than subsequent lines; common in lists, such as indexes
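As a hedged illustration of the values listed above, the following fragment places rend on structural elements. The content is invented for illustration and is not taken from any DLPS text; only the rend values come from the lists above.

```xml
<!-- Hypothetical content; only the rend values come from the lists above -->
<lg>
  <l>First line, flush left (no rend needed)</l>
  <l rend="indent">Second line, indented one level</l>
  <l rend="indent2">Third line, indented two levels</l>
</lg>
<p rend="center">A centered paragraph</p>
<p rend="right">A right-aligned closing line</p>
<list>
  <item rend="hang">Index entry whose first line starts further left
    than its continuation lines</item>
</list>
```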
|
| Enforcement |
| Machine-enforceable: |
no |
| Comments: |
Enforced by the DTD in the sense that rend has an
enumerated vocabulary, but as always, the appropriate use of the available
rend values is not machine-enforceable.
|
|
| Reason |
Originally the DTD Practices Group recommendations limited
global rend to "block", "inline", or
"none", with the intention of encouraging the nesting of
typographic markup within structural markup (for example,
<p><hi rend="bold">...</hi></p> rather than
<p rend="bold">...</p>), partly for inherent
logical/semantic reasons and partly to facilitate writing stylesheets for
delivery. In practice this came to seem counterintuitive for marking alignment and
indentation, especially for marking lines of verse, where indentation
markup is common. We decided to consider alignment and indentation as
properties of the elements themselves, not as a form of highlighting
(for example, <l rend="indent">...</l> is preferable
to <l><hi rend="indent">...</hi></l>, since the line
itself is indented; the text within the line is not being highlighted). That
is, we decided to distinguish between display ("block",
"inline", or "none"), alignment ("center",
"left", or "right"), and indentation
("indent", "indent2" etc., or "hang") on
the one hand, and typographic changes on the other.
|
| Use of <foreign> |
Phrase-level Features: Foreign Phrases |
|
| Description |
Words or phrases that are both (a) typographically distinct
(usually in italics), and (b) not in the main language of the text,
should be marked with the <foreign> element.
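A minimal sketch of the two conditions, using invented sentences. The lang value "fre" assumes that French has been declared in the header with that ID, as in the example under Declaring languages.

```xml
<!-- Hypothetical sentences, invented for illustration -->
<!-- Italicized in print and not in the main language: mark with <foreign> -->
<p>She had a certain <foreign lang="fre"><hi rend="italic">je ne sais
quoi</hi></foreign>.</p>
<!-- Printed in ordinary roman type: not typographically distinct,
     so <foreign> is not used -->
<p>They met at the usual rendezvous.</p>
```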
|
| Enforcement |
|
| Reason |
Foreign phrases must be typographically distinct to warrant
<foreign> for two reasons: (a) the keyboarding vendor can then
know unambiguously when to use <foreign>, and (b) a phrase that
is not italicized or otherwise distinct has presumably become so common
as to have lost its foreign-ness, at least in the author’s (or copy
editor’s, or typesetter’s) estimation.
|
| Declaring languages |
Phrase-level Features: Foreign Phrases |
|
| Description |
Each language identified by a lang attribute (on <foreign>, or
on any other element) must be declared in a <language> element within
the <teiHeader>.
|
| Example |
In the <teiHeader>:
<profileDesc>
<langUsage>
<language id="eng" usage="main">English</language>
<language id="fre">French</language>
</langUsage>
</profileDesc>
In the body of the text:
This the reader is willing to <lb/>
accept as a possible occurrence; but when she <lb/>
goes on to say that having completed his dem- <lb/>
onstration, this person triumphantly replaced <lb/>
his pencil-case, and with his feet upon the chimney- <lb/>
piece whistled Yankee Doodle, <foreign lang="fre"><hi rend="italic">c'est un peu <lb/>
trop fort</hi></foreign>, and we are probably justified in set- <lb/>
ting it down as a bit of literary colour.</p>
|
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
program |
| Name: |
qa_lib_foreign |
| Message type: |
error |
| Comments: |
Although the lang attribute is declared as type
IDREF, a validating parser can only verify that
the value of lang corresponds to an ID
somewhere in the XML document. The QA program checks whether
lang actually corresponds to a <language id="...">
within teiHeader/profileDesc/langUsage.
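The difference between the two checks can be sketched as follows. The fragments are hypothetical: they assume a header that declares two languages and that also contains a <name> with the ID "gpm2a". The second <foreign> would satisfy a validating parser, since its lang value matches some ID in the document, but it is the kind of case the QA program is described as catching.

```xml
<!-- Hypothetical fragments, invented for illustration -->
<!-- In the <teiHeader> -->
<langUsage>
  <language id="eng" usage="main">English</language>
  <language id="fre">French</language>
</langUsage>
<!-- Elsewhere in the <teiHeader> -->
<name id="gpm2a">Greg Murray</name>

<!-- In the body: lang matches a <language id="..."> -->
<foreign lang="fre"><hi rend="italic">tant pis</hi></foreign>
<!-- In the body: lang matches an ID, so the IDREF is valid,
     but that ID does not belong to a <language> -->
<foreign lang="gpm2a"><hi rend="italic">tant pis</hi></foreign>
```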
|
|
| Retaining typographic distinction |
Phrase-level Features: Foreign Phrases |
|
| Description |
Using the <foreign> element does not eliminate
the need to encode the change in typeface using <hi>.
|
| Remarks |
Since foreign phrases are usually italicized, typical
markup for a foreign phrase will be:
<foreign lang="..."><hi rend="italic">...</hi></foreign> |
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
program |
| Name: |
qa_lib_foreign |
| Message type: |
warning |
|
| Reason |
Explicitness. This approach is preferable to assuming that
foreign content is italicized, on the same principle as avoiding the use of default values
for attributes: if there’s a default value, encoders tend to ignore
the attribute altogether and thus forget to use it when it’s actually
applicable, resulting in erroneous (or missing) markup.
|
| Not roman but not Asian |
Phrase-level Features: Foreign Phrases |
|
| Description |
Languages such as Greek, Hebrew, and Russian fall into a special
category. They require non-roman characters, but they are alphabetic,
not ideographic. If the language is within the vendor’s capabilities,
the foreign content should be included in the electronic
transcription. If the language is not within the vendor’s
capabilities, omit the characters from the transcription and use the
<gap/> element to mark the location of the omitted
characters.
|
| Remarks |
When transcribing these kinds of languages, use the
appropriate character entities, when available (namely Greek), or XML
character references with Unicode hexadecimal values:
- Greek — Use the iso-grk1.ent character entities, supplemented as needed
by the accented characters in iso-grk2.ent
- Hebrew — Use the Hebrew block of Unicode (0590 - 05FF)
- Russian — Use the Cyrillic block of Unicode (0400 - 04FF)
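A sketch of what these conventions might look like in a transcription. The words are arbitrary, and the comments name the intended characters. The Greek entity names follow the ISO Greek sets as we understand them; verify them against iso-grk1.ent and iso-grk2.ent before relying on them.

```xml
<!-- Greek via entities: lambda, omicron with acute (from iso-grk2),
     gamma, omicron, final sigma -->
<p>&lgr;&oacgr;&ggr;&ogr;&sfgr;</p>
<!-- Hebrew via hexadecimal character references:
     shin, lamed, vav, final mem -->
<p>&#x05E9;&#x05DC;&#x05D5;&#x05DD;</p>
<!-- Russian via hexadecimal character references: em, i, er -->
<p>&#x043C;&#x0438;&#x0440;</p>
<!-- Language outside the vendor's capabilities: characters omitted -->
<p>The inscription reads <gap/>, according to the author.</p>
```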
|
| Enforcement |
|
| See also |
Use of <gap/>
Special Characters
|
| Page number corrections |
Reference Systems: Page Breaks |
|
| Description |
In cases where it is necessary to correct a page number (or add
any other markup to a page number), insert an <fw> (form work)
element within the <pb> element, and use <corr> as
usual to make the correction. Put the correct number in the n
attribute.
|
| Example |
<pb n="242"/>
<!-- ... -->
<pb n="243"><fw type="pageno"><corr resp="gpm2a" sic="242">243</corr></fw></pb>
<!-- ... -->
<pb n="244"/>
|
| Enforcement |
|
| Reason |
We do not consider the n attribute on
<pb> to be a transcriptional space; n is actually
just a label for display. The n attribute is the most
convenient and traditional place to record the page number, but in
cases where tagging is needed around a page number, it is not possible
in XML to add that tagging within an attribute value. For these reasons,
we have opted to allow <fw> within <pb>. This approach
is comparable to the distinction between printed note reference symbols
and the n attribute on <note>. See
Note symbols.
|
| See also |
Corrections
Running page headers
|
| Always within a div |
Reference Systems: Page Breaks |
|
| Description |
Page breaks must be placed within a <divN>
element, never between divisions. Therefore, when a division starts on
a new page, the <pb> is the first element in the division,
immediately following the opening <divN> tag
(preceding even the division <head>, if there is one).
|
| Example |
</div2>
<div2 type="chapter" n="II">
<pb/>
<head>II—APPELLATIONS.</head>
|
| Enforcement |
| Machine-enforceable: |
no |
| Comments: |
The DTD disallows <pb>
outside a div, but there is no way for the DTD to control exact
placement of those <pb> elements within the div.
|
|
| Exceptions |
The exception to this rule is newspapers, where the use of
<pb/> is reversed: <pb/> is required to occur
outside of any <divN> (<pb/>
is allowed only within <body>). This practice simply
fits better with the nature of printed newspapers, where each page
contains large amounts of text divided into numerous different divs,
and where a div almost never continues to the next page uninterrupted.
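For contrast with the example above, a newspaper might be sketched as follows. This reads "only within <body>" as "directly inside <body>", and the type value "article" is a placeholder that may not match the vocabulary the DTD actually allows.

```xml
<!-- Hypothetical newspaper structure: each <pb/> sits between divisions -->
<body>
  <pb n="1"/>
  <div1 type="article"><!-- ... --></div1>
  <div1 type="article"><!-- ... --></div1>
  <pb n="2"/>
  <div1 type="article"><!-- ... --></div1>
</body>
```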
|
| Use of <cb/> |
Reference Systems: Column Breaks |
|
| Description |
If the print source has a single-column layout, it is not
necessary to mark the column at all. For materials with multiple
columns, use <cb/> to mark the beginning of each
column on each page.
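A minimal sketch for a two-column page, with invented content. Whether DLPS expects any attributes on <cb/> is not stated here, so the element is shown bare.

```xml
<!-- Hypothetical two-column layout -->
<pb n="12"/>
<cb/>
<p>Text of the first column ...</p>
<cb/>
<p>Text of the second column ...</p>
<pb n="13"/>
<cb/>
<p>Text of the first column on the next page ...</p>
```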
|
| Enforcement |
|
| Line breaks |
Reference Systems: Line Breaks |
|
| Description |
Line breaks in running prose should be preserved in the
electronic transcription by marking the end of each printed line with
<lb/>.
|
| Enforcement |
|
| See also |
Line breaks in verse |
| Use of <unclear> |
Special Considerations: Gaps and Uncertainties |
|
| Description |
Use <unclear> to mark passages that cannot be transcribed
with certainty, as happens when a letter/word/phrase is physically
present on the page but is unreadable (due to a printing
error, physical damage to the page such as readers’ marks, or a bad
scan).
|
| Remarks |
When working with words or phrases marked by the vendor as
<unclear>, follow these guidelines:
- If a word marked as <unclear> by the vendor is actually
legible (as can happen when a bad page image is rescanned and replaced
only after the page images have shipped to the vendor), simply supply
the characters necessary to complete the word and remove the
<unclear> start-tag and </unclear> end-tag.
- If the word is unclear but a reasonable supposition can be made
as to the intended word, supply the characters necessary to complete
the word, but leave the <unclear> and </unclear>
markup in place.
- If the word is so illegible that no reasonable supposition can
be made, remove the entire illegible word/phrase and replace it with
an empty <unclear/> element.
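The three outcomes can be sketched with an invented sentence in which the vendor marked the word "harvest" as unclear:

```xml
<!-- 1. Legible after all: text completed, <unclear> tags removed -->
<p>The harvest was plentiful that year.</p>
<!-- 2. Reasonable supposition: text supplied, <unclear> kept -->
<p>The <unclear>harvest</unclear> was plentiful that year.</p>
<!-- 3. No reasonable supposition: an empty element replaces the word -->
<p>The <unclear/> was plentiful that year.</p>
```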
|
| Enforcement |
|
| Arbitrary sections |
Special Considerations: Arbitrary Sections |
|
| Description |
When none of the standard TEI elements is appropriate for a
particular textual feature, use <ab> if the feature is a block
element or <seg> if the feature is within a containing block
element.
|
| Remarks |
In texts with complex structure or layout, the encoder is likely to
encounter block-level sections or phrase-level passages that are
difficult to fit into any of the standard TEI elements. In such cases,
it may be best to take advantage of TEI’s elements for arbitrary
sections:
- <ab> — (anonymous block) occurs at the block level (at same level as
<p>, <table>, <list>, etc.)
- <seg> — (segment) occurs at the phrase level (within <p>,
<cell>, <item>, etc.)
Both of these elements accept the type attribute with
any value (no predefined vocabulary).
Although these elements should be used sparingly, they are very
useful when genuinely needed.
IMPORTANT: It is better to use <ab> or
<seg>, when appropriate, than to inject inappropriate markup
(such as <divN> elements that do not truly
reflect the major structural divisions of the work, or <p>
elements that are not really paragraphs) for the sake of
“making it parse.”
If a work contains a particularly problematic feature for which the
preferred encoding is not clear, ask DLPS for further guidance.
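As a rough illustration only, the two elements might be used as follows. The textual features and the type values are invented; since type has no predefined vocabulary, these are not recommended values.

```xml
<!-- Block level: free-standing text that is not a paragraph, heading,
     list, or table (hypothetical feature and type value) -->
<ab type="price-line">Price One Shilling</ab>
<!-- Phrase level: a span inside a paragraph that no standard phrase-level
     element describes (hypothetical feature and type value) -->
<p>The first letters, <seg type="acrostic">S. P. Q. R.</seg>, were set
apart by the printer.</p>
```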
|
| Enforcement |
|
| Use of <corr> |
Corrections |
|
| Description |
If the print source contains a blatant error, the error can and
should be corrected in the electronic text. When making corrections,
always use the <corr> element to mark the content that has been
changed.
|
| Enforcement |
|
| resp attribute on <corr> |
Corrections |
|
| Description |
When making corrections, include the resp (responsibility)
attribute on <corr>. Its value should correspond to the
id attribute of a
<name> element within a <respStmt>
in the <teiHeader>.
|
| Remarks |
The typical scenario is to add an entry to the revision history for
the file in teiHeader/revisionDesc (see Adding to the file’s revision history),
then use resp to record responsibility for corrections, as in this example:
We have no <corr sic="trustworty" resp="gpm2a">trustworthy</corr> statistics <lb/>
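The header entry that the resp value above points to might look like the following sketch. The date, the <resp> text, and the <item> text are invented for illustration; the structure follows the standard TEI <change> element, and the ID "gpm2a" matches the example above.

```xml
<!-- Hypothetical revision history entry in the <teiHeader> -->
<revisionDesc>
  <change>
    <date>2005-08-01</date>
    <respStmt>
      <resp>Corrector</resp>
      <name id="gpm2a">Greg Murray</name>
    </respStmt>
    <item>Corrected typographical errors found in the print source.</item>
  </change>
</revisionDesc>
```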
|
| Enforcement |
| Machine-enforceable: |
yes |
| Method: |
program |
| Name: |
qa_lib_corr |
| Message type: |
warning |
| Comments: |
The DTD does not require resp on <corr>, because when existing markup
is converted or migrated and the corrector is unknown, it is not always possible
to supply a useful value for resp. Instead, the QA program issues a warning
if resp is missing. The program also verifies that the resp value
corresponds to the id attribute of an element within <teiHeader>. This
test goes beyond the DTD, which declares resp as an IDREF
and so requires only that it point to an ID somewhere within the XML document.
The test is performed for every element that has a resp attribute, not
just <corr>.
|
|
| char2ent: |
| Filename: |
char2ent |
| Type: |
processing |
| Language: |
Perl |
| Description: |
Converts non-ASCII characters to standard mnemonic character entities, when available |
| Disk path: |
/dlps_work/bin/char2ent |
|
| notes_n: |
| Filename: |
notes_n |
| Type: |
processing |
| Language: |
Perl |
| Description: |
Adds or updates the n attribute on <note>, <ref>, and <ptr/> elements.
|
| Disk path: |
/dlps_work/bin/notes_n |
|
| replace_xml_decl: |
| Filename: |
replace_xml_decl |
| Type: |
processing |
| Language: |
Perl |
| Description: |
Replaces XML declaration |
| Disk path: |
/dlps_work/bin/replace_xml_decl |
|
| qa_dates: |
| Filename: |
qa_dates |
| Type: |
QA |
| Language: |
Perl |
| Description: |
QA program for standardized date values |
| Disk path: |
/dlps_work/bin/qa_dates |
|
| qa_figures: |
| Filename: |
qa_figures |
| Type: |
QA |
| Language: |
Perl |
| Description: |
QA program for <figure> elements
|
| Disk path: |
/dlps_work/bin/qa_figures |
|
| qa_lib_corr: |
| Filename: |
qa_lib_corr.xsl |
| Type: |
QA |
| Language: |
XSLT |
| Description: |
QA program for corrections, additions, and deletions |
| Disk path: |
/cenrepo/bin/cgi-dl/dlps/xsl/qa_lib_corr.xsl |
| URL: |
http://text.lib.virginia.edu/bin/cgi-dl/dlps/markupQA/ |
|
| qa_lib_foreign: |
| Filename: |
qa_lib_foreign.xsl |
| Type: |
QA |
| Language: |
XSLT |
| Description: |
QA program for foreign phrases and the global lang attribute
|
| Disk path: |
/cenrepo/bin/cgi-dl/dlps/xsl/qa_lib_foreign.xsl |
| URL: |
http://text.lib.virginia.edu/bin/cgi-dl/dlps/markupQA/ |
|
| qa_lib_misc: |
| Filename: |
qa_lib_misc.xsl |
| Type: |
QA |
| Language: |
XSLT |
| Description: |
QA program for miscellaneous requirements not handled by the other qa_lib_* stylesheets |
| Disk path: |
/cenrepo/bin/cgi-dl/dlps/xsl/qa_lib_misc.xsl |
| URL: |
http://text.lib.virginia.edu/bin/cgi-dl/dlps/markupQA/ |
|
| qa_lib_notes: |
| Filename: |
qa_lib_notes.xsl |
| Type: |
QA |
| Language: |
XSLT |
| Description: |
QA program for TEI notes and note references |
| Disk path: |
/cenrepo/bin/cgi-dl/dlps/xsl/qa_lib_notes.xsl |
| URL: |
http://text.lib.virginia.edu/bin/cgi-dl/dlps/markupQA/ |
|
| qa_lib_structure: |
| Filename: |
qa_lib_structure.xsl |
| Type: |
QA |
| Language: |
XSLT |
| Description: |
QA program for TEI document structure |
| Disk path: |
/cenrepo/bin/cgi-dl/dlps/xsl/qa_lib_structure.xsl |
| URL: |
http://text.lib.virginia.edu/bin/cgi-dl/dlps/markupQA/ |
|
| qa_xml: |
| Filename: |
qa_xml |
| Type: |
QA |
| Language: |
Perl |
| Description: |
QA program for XML features |
| Disk path: |
/dlps_work/bin/qa_xml |
|
| Maintained by: |
Greg Murray (gpm2a at virginia dot edu), DLPS
|
| Overview: |
The XML source for this documentation describes the local markup
practices for TEI-encoded electronic texts followed by Digital Library
Production Services (DLPS), University of Virginia Library. It
contains three types of markup practices:
- those applicable only to keyboarding/encoding vendors
(“vendor” practices)
- those applicable only after a text has been received from a
vendor (“postkb” practices)
- those applicable to both (“global”
practices)
The documentation is intended to be helpful in three main ways:
- as a set of encoding guidelines for keyboarding/encoding vendors
who produce TEI texts for DLPS
- as a reference for DLPS staff when working on QA/correction,
markup enhancement, migration, etc. of TEI texts
- as a reference for authors of delivery systems for the digital
library, when creating delivery mechanisms for TEI texts produced by
DLPS
|
| Last modified: |
Monday, 03-Aug-2009 15:34:13 EDT |
| Revision history: |
| Date: |
August 2005 |
| Role: |
author |
| Name: |
Greg Murray, DLPS |
| Change: |
Produced first published version. Documented about 100 markup
practices in 8 main categories and about 30 subcategories.
|
| Date: |
February 24, 2006 |
| Role: |
corrector |
| Name: |
Greg Murray, DLPS |
| Change: |
Minor corrections and enhancements to existing practices. |
| Date: |
July 1, 2008 |
| Role: |
corrector |
| Name: |
Greg Murray |
| Change: |
Minor changes and additions. |
|