TEI/MARC "Best Practices"

June 16, 2001 Draft. Changes made since 11/2 were marked in color in the original document.

INTRODUCTION

At the TEI and XML in Digital Libraries Workshop, held at the Library of Congress in July 1998, several working groups were formed to consider various aspects of the Text Encoding Initiative. Group 1 was charged with recommending best practices for TEI header content and with reviewing the relationship between the TEI header and MARC. To this end, representatives of the University of Virginia Library and the University of Michigan Library gathered in Ann Arbor in early October 1998 to develop a recommended practice guide. Our work was assisted by similar efforts that had taken place in the United Kingdom the previous year under the auspices of the Oxford Text Archive. The following document is a draft of those recommended practices. It has been submitted to various constituencies for comment. Please send comments to teimarc@umich.edu.

WORKING ASSUMPTIONS

A TEI header can serve many publics. Headers can be created in a text center and reflect the center's standards, or they can serve as the basis for other types of metadata system records produced by other agencies. Headers can function in detached form as records in a catalog, as a title page inherent to the document, or as a source for index displays.

In addition, a header may describe a collection of documents, a single item, or a portion of an item. TEI header content will vary according to which of these is being described.

A TEI header may not have a one-to-one correspondence with a MARC record. One TEI header may correspond to multiple MARC analytic records, or one MARC record may describe a collection of TEI documents that each have an individual header.

A TEI header serves several purposes. It may record the history of how the file has been treated. It can extend the information of a classic catalog record. The Text Center and/or cataloging agency can act as the gatekeeper for creators by providing standards for content.

Does the TEI header act as the electronic title page or as a catalog record? Is it integral to the document it describes or independent of it? The answers depend on the community being served, and the content of the TEI elements will reflect the interests of that community. Nonetheless, it is possible to describe a set of "best practices" that will produce compatible content while accommodating this variety of purposes. Compatible content makes results easier to understand when information about assorted items is displayed as a set of search results, a contents list, or an index. It also allows more reliable conversion of content from TEI tags to elements of other metadata sets when such conversion seems advisable.

It is a traditional practice of librarianship to agree upon where in a document and in what order of preference one should look to identify the title, author, etc., of that document. This permits a certain consistency in terminology and allows for a certain amount of authentication of content. We recommend the following preferences to those who create headers and to those who attempt to use headers to create traditional catalog records that are compliant with AACR2 and ISBD(ER) rules.

As a member of the academic community, the header creator/editor has a responsibility to verify, whenever humanly possible, the intellectual source for an electronic document that presents itself without any information regarding its source or authorship.

Chief Sources of Information for Several Types of Electronic Resources Are:

1. Electronic document with a digitized title page (without a header)
a) Chief source of information = information coded as title page
b) Use additional information from an originating paper document if absolutely certain it is the source

2. Electronic document with header (without a title page)
a) Chief source of information = supplied and verified header*
b) Use information from paper document if absolutely certain it is the source

3. Electronic document with header and title page
a) Chief source of information = supplied header (if verified)*
b) If header is not verified, use title page as chief source.
c) Use information from paper document if absolutely certain it is the source

4. If neither header nor title page is present and there is no evidence of a source document, the header creator
a) May assign a title and author if appropriate
b) Should enclose the assigned information in square brackets, using the standard English language convention for editorial interjections

5. If neither header nor title page is present but the header creator has satisfactory evidence of an originating source, that document should be used as the chief source of information for the title and author of the header. If the source cannot be fully verified as to edition, authorship, etc., this fact should be clearly indicated in a note in the <fileDesc>.

*Verified means that the cataloger/editor has established for him/herself that the information represented as title information is an accurate representation of content.
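
As an illustration of case 4 above, the following fragment is a minimal sketch of how a title and author assigned by the header creator might be bracketed and flagged. The title, author, and note wording are invented for illustration. The <titleStmt> and <note> wrappers are taken from the TEI Guidelines rather than from this guide, so treat the exact nesting as an assumption to check against the DTD in use.

```xml
<!-- Hypothetical example: the title, author, and note text are invented. -->
<fileDesc>
  <titleStmt>
    <!-- Title assigned by the header creator, so it is bracketed. -->
    <title type="main">[Diary of a voyage to Halifax]</title>
    <!-- Author assigned by the header creator, also bracketed. -->
    <author>[Unidentified ship's surgeon]</author>
  </titleStmt>
  <notesStmt>
    <!-- A note flags the attribution as supplied rather than transcribed. -->
    <note>Title and author supplied by the header creator; no title page, header, or source document was available.</note>
  </notesStmt>
</fileDesc>
```
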
  TAG    RECOMMENDATION 
<teiHeader type="____"> Standards which apply to the header, e.g., <teiHeader type="ISBD(ER)">, <teiHeader type="AACR2">.
<fileDesc>
<title type="____"> Only the uniform title and the main title should be entered here, e.g., <title type="uniform"> or <title type="main">. See <sourceDesc> for other title forms, for documents that a user might seek under titles other than those assigned. Where a title is provided by the header creator rather than the document creator, the title should be enclosed in square brackets, using the standard English language convention for editorial insertion.
<author> Author of original source (electronic or print) should be entered into the <author> tag, before the <respStmt>. Use discrete tags within the <author> tag for "last name", "first name", "middle name", "date", and "position title" to allow future flexibility in display, in indexing, and in transferring to MARC. Whenever possible, use the nationally established form of a name, or establish one. The name should be inverted and entered in the established form.
<editor> Editor of original source (electronic or print) should be entered into the <editor> tag, before the <respStmt>.
<respStmt> The editor (also compiler, illustrator) of an electronic version should be entered into the appropriate tag in the <respStmt>. The name should be inverted and entered in the established form.
<editionStmt> Caution: The edition statement here refers to the electronic piece, not the original item. This field should be used sparingly, as there are currently no standards as to when versions become editions. Users should refer to the instructions in the TEI manual.
<extent> Use the standard text "ca.**** kilobytes".
<publicationStmt> Caution: This statement describes the electronic file.
<publisher> The publisher is whoever has collected the electronic text and has made decisions concerning it.
<distributor> The distributor is whoever makes the electronic text available.
<idno> Any unique identification number determined by the publisher.
<availability> Use specialized elements when anticipating sharing of the header, or free text if only local usage is expected. Caution: Know your audience.
<date> Refers to the date of publication of the electronic document. For most purposes, the year date (yyyy) will be adequate. If greater detail is required, enter dates as yyyymmdd.
<seriesStmt> Whenever possible, establish the authorized form for a locally created electronic series in the national authority file.
<notesStmt> Optional, depending on display decisions. Should be used for indicating questionable attributions for title, author, etc.
<sourceDesc> To represent the source(s) effectively when many documents are represented by the TEI header, we see the need for structured elements that minimally allow us to identify parent-child and component relationships. In the absence of these structures, we suggest that multiple source descriptions be employed, with relationships described in free text. Relationships also could be useful in other portions of the TEI header. The cataloger may need to do research to establish the original source.
<bibl> or <biblStruct> or <biblFull> Prefer <biblFull> to allow searching on parts of the description.
<title> It is possible to have multiple <title> fields in <biblFull>. Alternative titles (cover, running, spine titles) should be entered in separate <title> fields in the <biblFull> field in the <sourceDesc>, where they are searchable.
<author> If the name of the author(s) in the originating source differs from the established form, include here the form from the source, tagged <author type="alternate">.
<editionStmt> Enter edition statement as found on the original source.
<extent> Enter physical description for the original source.
<publicationStmt>
<publisher> Don't repeat field. Enter multiple publishers divided by semicolons.
<pubPlace> Don't repeat field. Enter multiple places of publication divided by semicolons.
<date> Imprint date for the original source. For most purposes, the year date (yyyy) will be adequate. If greater detail is required, enter dates as yyyymmdd.
<idno> In this location, <idno> refers to identification numbers for the source document. They can be used to indicate the source's location in an individual institution's collection. If a formal standard location system is being used, indicate the nature of the system, e.g., <idno type="LC call number">.
<seriesStmt> Establish the series statement of the original document via the national authority file.
<notesStmt> Caution: Notes made here should refer to the original source.
<encodingDesc>
<projectDesc> Enter a description of the purpose for which the electronic file was encoded.
<editorialDecl> Enter general and specific statements about how the electronic file has been treated. Record here editorial decisions made during encoding.
<refsDecl> <refsDecl> is a possible location for administrative metadata, e.g., pagination and page sequencing.
<classDecl>
<taxonomy id="____"> If used, identify the appropriate taxonomy definitions or descriptive sources in the <taxonomy> tag, followed by id, e.g., <taxonomy id="LCSH">, <taxonomy id="AAT">.
<profileDesc>
<creation>
     <date> Use the date as it comes from the creator. For most purposes, the year date (yyyy) will be adequate. If greater detail is required, enter dates as yyyymmdd.
<langUsage> Language usage is specified by document creators. Use standard language names.
<language id="____"> Use the ISO 639-2 standard (which is the same as the MARC language codes).
<textClass> True classification numbers, as opposed to call numbers, can be entered here.
   <keywords>
      <term> Use for uncontrolled terms.
   <keywords scheme="____">
      <term> Use for controlled vocabulary as specified in the <encodingDesc> taxonomy id. Example: scheme="LCSH"
<revisionDesc>  
<change>
<resp>
<item>
Use the specific tags to note revisions rather than a free-text description. Include the entire date (e.g., 19991101).
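
The recommendations in the table above can be pictured with a sample header. The following is a sketch under stated assumptions, not a validated or authoritative header: every title, name, date, and number in it is invented, the type values "local" and "cover" are illustrative only, and wrapper elements such as <titleStmt>, <resp>, <name>, and <p> come from the TEI Guidelines rather than from this guide. The discrete name-part tags recommended for <author> are not shown because this guide does not name them.

```xml
<!-- Hypothetical example: every title, name, date, and number is invented. -->
<teiHeader type="AACR2">
  <fileDesc>
    <titleStmt>
      <title type="main">Poems of the prairie: an electronic edition</title>
      <!-- Author of the original source, inverted, before the respStmt. -->
      <author>Doe, Jane, 1850-1920</author>
      <!-- Editor of the electronic version goes in the respStmt. -->
      <respStmt>
        <resp>Electronic version edited by</resp>
        <name>Roe, Richard</name>
      </respStmt>
    </titleStmt>
    <extent>ca. 150 kilobytes</extent>
    <!-- Describes the electronic file, not the printed source. -->
    <publicationStmt>
      <publisher>Example University Library</publisher>
      <distributor>Example University Text Center</distributor>
      <idno type="local">ex0001</idno>
      <availability><p>Freely available for non-commercial use.</p></availability>
      <date>1999</date>
    </publicationStmt>
    <sourceDesc>
      <biblFull>
        <titleStmt>
          <title>Poems of the prairie</title>
          <!-- Alternative title kept in its own searchable field. -->
          <title type="cover">Prairie poems</title>
          <!-- Form of name as found on the source. -->
          <author type="alternate">J. Doe</author>
        </titleStmt>
        <extent>120 p. ; 19 cm.</extent>
        <publicationStmt>
          <!-- Fields are not repeated; multiple values use semicolons. -->
          <publisher>Smith; Jones</publisher>
          <pubPlace>Chicago; New York</pubPlace>
          <date>1890</date>
          <idno type="LC call number">PS0000 .D6 1890</idno>
        </publicationStmt>
      </biblFull>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <projectDesc><p>Encoded for a library collection of American verse.</p></projectDesc>
    <editorialDecl><p>End-of-line hyphenation has been removed.</p></editorialDecl>
    <classDecl>
      <taxonomy id="LCSH">
        <bibl>Library of Congress Subject Headings</bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <creation><date>1890</date></creation>
    <langUsage>
      <language id="eng">English</language>
    </langUsage>
    <textClass>
      <!-- scheme points at the taxonomy id declared in encodingDesc. -->
      <keywords scheme="LCSH">
        <term>American poetry</term>
      </keywords>
    </textClass>
  </profileDesc>
  <revisionDesc>
    <change>
      <!-- Full date in yyyymmdd form. -->
      <date>19991101</date>
      <respStmt><name>Roe, Richard</name><resp>ed.</resp></respStmt>
      <item>Corrected transcription errors.</item>
    </change>
  </revisionDesc>
</teiHeader>
```

The sample keeps the description of the electronic file (the outer <publicationStmt>) separate from the description of the printed source (the <biblFull> inside <sourceDesc>), which is the distinction the cautions in the table are meant to protect.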

MINIMAL LEVEL HEADER RECOMMENDATIONS

<minimal Header>
   <fileDesc>
         <title>
         <publicationStmt>
         <sourceDesc>
               <biblFull>

Repeat the <biblFull> field, as appropriate, if there is more than one source for the electronic item.
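
A minimal header with a repeated <biblFull> might look like the sketch below. The titles, publishers, and dates are invented, and the <titleStmt> wrappers follow the TEI Guidelines rather than the outline above, so the exact nesting is an assumption to verify against the DTD in use.

```xml
<!-- Hypothetical example: titles, publishers, and dates are invented. -->
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title type="main">Collected sermons: an electronic edition</title>
    </titleStmt>
    <publicationStmt>
      <publisher>Example University Library</publisher>
      <date>2000</date>
    </publicationStmt>
    <sourceDesc>
      <!-- One <biblFull> per source: this item draws on two printed volumes. -->
      <biblFull>
        <titleStmt><title>Sermons, volume 1</title></titleStmt>
        <publicationStmt>
          <publisher>Smith</publisher>
          <date>1801</date>
        </publicationStmt>
      </biblFull>
      <biblFull>
        <titleStmt><title>Sermons, volume 2</title></titleStmt>
        <publicationStmt>
          <publisher>Smith</publisher>
          <date>1803</date>
        </publicationStmt>
      </biblFull>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```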

Consult "TEI Header", v. 1, pp. 124-135, Section 5.6, for more information on creating minimal TEI headers.

RECOMMENDED ADDITIONS TO TEI AND TEILITE DTD

ACKNOWLEDGMENTS AND BIBLIOGRAPHY

This guide was prepared by Judy Ahronheim, Thomas Champagne, Lynn Marko, Kelly Webster, and Chris Wilcox of the University of Michigan Library and Jackie Shieh of the University of Virginia Library in October 1998. The source documents were the cataloging guides prepared by those two institutions (Virginia--http://www.lib.virginia.edu/cataloging/manual/chapters/chapxiib.html and Michigan--http://dns.hti.umich.edu/htistaff/cataloging/). In addition, documentation from the Oxford Text Archive, Arts and Humanities Data Service of the United Kingdom also was made available to assist in this effort.

DESCRIPTION OF TEXT ENCODING INITIATIVE (TEI) HEADER ELEMENTS AND CORRESPONDING USMARC FIELDS (APPENDIX A).