=================================================================
KEY
*** = will be done, essential to functionality
**  = should probably be done, helpful to functionality
*   = should be done if easy, adds marginally to functionality
?   = of dubious utility
=================================================================
|***||Part-of-speech abbreviations tagged on headwords.|
|**||Part-of-speech abbreviations tagged in in-text cross-references.|
|?||Part-of-speech abbreviations tagged elsewhere.|
|***||Language terms (usually abbreviated) tagged in etymologies.|
|*||Language terms (usually abbreviated) tagged in definitions.|
|*||Language terms (usually abbreviated) tagged everywhere they appear.|
|**||"Cp. notes" et sim. tagged as <NOTE>. (Most are now tagged as part of the form section).|
|**||Internal cross-references tagged and linkable to headwords (?and individual senses)|
|**||Zoological/botanical taxonomic names tagged as such.|
|The rendition should default to parenthesised, but a REND attribute with the value "nonpar" would allow the tag to be used elsewhere:|
|***||Bolded etyma tagged logically as etyma. Alternatively:|
|**||Labels in form section tagged as <LBL>|
|*||Specific kinds of labels separately tagged within <LBL>, e.g. <DIAL> or <ERA> (Prefer this to TYPE attribute on <LBL>)|
|Addition of a REND attribute would allow the use of the LBL element for non-parenthesized labels, e.g. grammatical labels in form sections.|
|**||Bolded phrases and compounds tagged (?as <PHRASE>) within <DEF>. It is probably not useful or practicable at this point to try to bind cited phrases to their associated definitions, in the manner of the TEI <RE> (related-entry) element; but the addition of a grouping element containing the phrase and the definition would allow for that.|
|*||Head-form given separate tag (<HDORTH>?).|
|?||Other abbreviations tagged invisibly as <ABBR>; an excessive example:|
|**||Citations of ME words (generally corresponding to an MED headword, sometimes to an inflected or variant form) currently tagged merely as bold. This excludes <ORTH> elements (already separately tagged) and <ETYMON> elements (as described above), but includes explicit and implicit internal cross-references; perhaps also the components of phrases and compounds.|
|*||Citations of non-ME words currently tagged merely as bold.|
|?||A simple mechanism, short of full logical tagging, for dealing with the bracketed material inserted into MED quotations, so as to exclude portions of it from indexing and retrieval as ME. Ideally, it should be forwards-compatible with any more elaborate encoding that may be added later, including identification of MS abbreviations and of translation-source languages and texts. Perhaps <ADD> could be used as the equivalent of the MED's brackets, with <ME> used within <ADD> as a way of restoring portions of the added material to searchability. In this example, "read, Hrl, L intende, God" would not be retrievable as ME.|
|?||References to the entry's head-word tagged as such: especially references in phrases (tagged mostly as the EMPTY element <oREF>) and in quotations (tagged as the #PCDATA element <oVAR>).|
N.B. <POS>, <LANG>, and <NOTE> are already used in current MED production.
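To make the proposed <ADD>/<ME> convention concrete, the following Python sketch shows how an indexer might pull the searchable ME text out of a quotation. It assumes the quotation is a well-formed XML fragment using the upper-case tag names written here; the function name `searchable_text` and the `<Q>` wrapper are hypothetical, and the sample text is invented rather than taken from the MED. Text inside <ADD> is skipped unless it sits inside a nested <ME>.

```python
import xml.etree.ElementTree as ET


def searchable_text(fragment: str) -> str:
    """Return the parts of a quotation that should be indexed as ME.

    Text inside <ADD> is excluded, except where a nested <ME> restores it.
    """
    root = ET.fromstring(fragment)
    parts: list[str] = []

    def walk(elem: ET.Element, indexable: bool) -> None:
        # <ADD> switches indexing off; <ME> switches it back on.
        if elem.tag == "ADD":
            indexable = False
        elif elem.tag == "ME":
            indexable = True
        if elem.text and indexable:
            parts.append(elem.text)
        for child in elem:
            walk(child, indexable)
            # A child's tail is the parent's text, so it follows the parent's status.
            if child.tail and indexable:
                parts.append(child.tail)

    walk(root, True)
    # Collapse the whitespace left behind where spans were skipped.
    return " ".join("".join(parts).split())
```

For instance, `searchable_text("<Q>ME text <ADD>editorial note <ME>restored</ME> more note</ADD> end</Q>")` keeps only "ME text restored end".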
*** ID attribute on <ENTRYFREE> to allow other texts to link to it.
* RID or TARGET attribute on <ETYMON> to allow it to link to other dictionaries.
** LANG attribute on <ETYMON>.
* RID or TARGET attribute on <WORD> to allow it to link to the referenced headword/entry.
** LANG attribute on <WORD> for non-ME words.
? TYPE (=head, variant) attribute on <ETYMON> to distinguish between other dictionaries' headwords and variant forms. (Or use the TEI <DISTINCT> element.)
EXP attribute on most elements likely to contain abbreviated text, e.g.:
  ***  <LANG>    <LANG EXP="Old Slavonic">OSl.</LANG>
  **   <ABBR>    <ABBR EXP="superlative">sup.</ABBR>
  **   <DIAL>    <LBL>early <DIAL EXP="Kentish">K</DIAL></LBL>
  **   <USG>     <USG EXP="numismatics">num.</USG>
  ***  <ORTH>    <ORTH EXP="arthetik,arthretik">arth(r)etik</ORTH>
  *    <PHRASE>  <PHRASE EXP="don askinge,yeven askinge">don (yeven) ~</PHRASE>
** REND attribute on <TAX> element, as above.
** REND attribute on <LBL> element, as above.
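The comma-separated EXP value in the <ORTH> example could be generated mechanically if parentheses in a form always mark optional letters. The Python sketch below assumes exactly that convention (no nested parentheses, no other bracket conventions, no diacritics to strip); `expand_optional` and `exp_attribute` are hypothetical names, not part of MED production.

```python
import itertools
import re


def expand_optional(form: str) -> list[str]:
    """List every spelling obtained by omitting or including each (optional) segment."""
    # A capturing group makes re.split keep the optional segments at odd indices.
    pieces = re.split(r"\(([^()]*)\)", form)
    choices = [("", piece) if i % 2 else (piece,) for i, piece in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def exp_attribute(form: str) -> str:
    """Build a comma-separated EXP value for an <ORTH> element."""
    return ",".join(expand_optional(form))
```

Under these assumptions, `exp_attribute("arth(r)etik")` yields "arthetik,arthretik", the value shown above; a form with no parentheses expands to itself.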
3. Enlarged use of current tags
*** Non-italicised usage labels tagged as
(this is already done, somewhat irregularly, in current MED production)
? Subsenses or phrasal definitions tagged by nested tags.
(this is partly done in current MED production with respect to lettered subsenses)
|***||FORMS||Given the uncertainty of expanding forms in the text itself, consider the virtues of expansion (without diacritics) as attribute values rather than literal text. Although this adds to the bulk of the text and risks encountering attribute limitations, it has three notable advantages, viz.
|Probably prefer to expand these as literal text. This can probably be done without serious risk of inaccuracy, and makes the MED locally (e.g. on a single screen) more intelligible without making it significantly less succinct.|
Alternatively, convert these to the TEI's <oREF> element with the expanded form preserved as an EXP attribute value. This has the minimal advantage of preserving the "squiggle" but with little else to be said for it.
|**||PHRASES||Consider expanding at least some phrases and compounds, perh. as attribute value, for the same reasons as with forms, above.|
|**||ABBREVIATIONS||Since the expansion is usually unambiguous, most abbreviations (including material tagged as , |
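Phrase expansion could be sketched the same way. The Python function below assumes two conventions inferred from the <PHRASE> example given earlier: "~" stands for the headword, and a parenthesised word is an alternative to the word immediately before it. `expand_phrase` is a hypothetical name, and multi-word alternatives or inflected squiggles (e.g. "~s") are not handled.

```python
import itertools


def expand_phrase(phrase: str, headword: str) -> list[str]:
    """Expand '~' and parenthesised alternatives into full phrases."""
    slots: list[list[str]] = []
    for token in phrase.split():
        if token.startswith("(") and token.endswith(")") and slots:
            # "(yeven)" offers an alternative to the word just before it.
            slots[-1].append(token[1:-1])
        elif token == "~":
            slots.append([headword])
        else:
            slots.append([token])
    return [" ".join(words) for words in itertools.product(*slots)]
```

With the headword "askinge", `",".join(expand_phrase("don (yeven) ~", "askinge"))` gives "don askinge,yeven askinge", matching the EXP value in that example.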
<POS> is roughly equivalent to the TEI element of the same name, differing chiefly in that we have enlarged the content model of <FORM> to correspond to the MED's "form section." POS can therefore appear within the MED's FORM but not within the TEI's. POS can similarly appear within the MED's NOTE, ETYM, and DEF, but not within the TEI's.
MED: <FORM><ORTH>blog</ORTH> <POS EXP="noun">n.</POS>(1).</FORM>
TEI: <FORM><ORTH>blog</ORTH></FORM><GRAMGRP><POS>n.</POS><LBL>(1)</LBL></GRAMGRP>
 or: <FORM><ORTH>blog</ORTH><GRAM TYPE="pos">n.</GRAM><LBL>(1)</LBL></FORM> (non-preferred)
Note that the <LBL>(1)</LBL> is better handled by the TEI's "key" attribute (on entries).
The TEI "expand" attribute on <POS> corresponds to the MED's "exp".
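The preferred TEI rendering of the blog example amounts to moving <POS> and <LBL> out of <FORM> into a following <GRAMGRP>, carrying the MED's EXP over as the TEI's expand attribute. The Python sketch below assumes the homograph number is already tagged as <LBL> in the MED input, and writes the attribute as EXPAND to match the upper-case style used here; `med_form_to_tei` and `GRAMMATICAL` are hypothetical names. It refuses to move an element followed by non-whitespace text, rather than silently dropping that text.

```python
import xml.etree.ElementTree as ET

# Hypothetical: MED children of <FORM> that belong in TEI's <GRAMGRP> instead.
GRAMMATICAL = {"POS", "LBL"}


def med_form_to_tei(form_xml: str) -> str:
    """Split an MED-style <FORM> into a TEI <FORM> plus a following <GRAMGRP>."""
    form = ET.fromstring(form_xml)
    gramgrp = ET.Element("GRAMGRP")
    for child in list(form):
        if child.tag not in GRAMMATICAL:
            continue
        if child.tail and child.tail.strip():
            raise ValueError(f"text after <{child.tag}> would be lost: {child.tail!r}")
        form.remove(child)
        child.tail = None
        if child.tag == "POS" and "EXP" in child.attrib:
            # The TEI's expand attribute plays the role of the MED's EXP.
            child.set("EXPAND", child.attrib.pop("EXP"))
        gramgrp.append(child)
    # Drop whitespace stranded at the end of the slimmed-down <FORM>.
    remaining = list(form)
    if remaining and remaining[-1].tail and not remaining[-1].tail.strip():
        remaining[-1].tail = None
    result = ET.tostring(form, encoding="unicode")
    if len(gramgrp):
        result += ET.tostring(gramgrp, encoding="unicode")
    return result
```

Given `<FORM><ORTH>blog</ORTH> <POS EXP="noun">n.</POS><LBL>(1)</LBL></FORM>`, this returns `<FORM><ORTH>blog</ORTH></FORM><GRAMGRP><POS EXPAND="noun">n.</POS><LBL>(1)</LBL></GRAMGRP>`.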
<LANG>, <oVAR>, <oREF>, <REF> seem to correspond exactly with the TEI elements of the same names.
<NOTE> (aside from the POS problem) falls within the general TEI tag of the same name.
<TAX> corresponds with any number of possible TEI equivalents, such as <FOREIGN LANG="taxon" REND="par">.
<ETYMON> corresponds roughly with a number of possible TEI equivalents, such as <MENTIONED>, though that lacks the TYPE attribute.
<LBL> does not exceed the bounds of the very general TEI element of the same name.
<PHRASE> corresponds partly with the <FORM>, <RE>, and even <PHR> elements.
<ABBR> corresponds with the general TEI element of the same name.
<HDORTH> corresponds with TEI <FORM TYPE="lemma"><ORTH>
<WORD> falls generally within the broader TEI element <MENTIONED>
<ADD> distorts the intended use of the TEI <ADD> element, but is formally equivalent.
<ME>, a sort of 'exception' to <ADD>, has no TEI equivalent (?).
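Several of the correspondences listed above are simple renamings that could be applied mechanically during conversion. The Python sketch below covers only <TAX>, <WORD>, and <HDORTH>; the table `RENAMES` and the function `to_tei` are hypothetical. Same-named elements are left as they are, and <ETYMON> and <ME> are deliberately left untouched because their correspondences are only rough or absent. Default attributes are added with `setdefault`, so an explicit REND="nonpar" on a <TAX> survives.

```python
import xml.etree.ElementTree as ET

# Hypothetical renaming table: MED tag -> (TEI tag, default attributes).
# Same-named elements (LANG, oVAR, oREF, REF, NOTE, LBL, ABBR) need no entry.
RENAMES: dict[str, tuple[str, dict[str, str]]] = {
    "TAX": ("FOREIGN", {"LANG": "taxon", "REND": "par"}),
    "WORD": ("MENTIONED", {}),
}


def to_tei(elem: ET.Element) -> ET.Element:
    """Rewrite MED-specific tags in place, bottom-up, and return the element."""
    for child in list(elem):
        to_tei(child)
    if elem.tag in RENAMES:
        tag, defaults = RENAMES[elem.tag]
        elem.tag = tag
        for name, value in defaults.items():
            # Keep any explicit value, e.g. REND="nonpar" on a <TAX>.
            elem.attrib.setdefault(name, value)
    elif elem.tag == "HDORTH":
        # <HDORTH>x</HDORTH> becomes <FORM TYPE="lemma"><ORTH>x</ORTH></FORM>.
        orth = ET.Element("ORTH")
        orth.text, elem.text = elem.text, None
        for child in list(elem):
            elem.remove(child)
            orth.append(child)
        elem.tag = "FORM"
        elem.set("TYPE", "lemma")
        elem.append(orth)
    return elem
```

The tail text of each rewritten element is kept, so surrounding prose in a definition is unaffected.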