e-MED coding: the next plateau and a little beyond

1st draft

  =================================================================
  KEY
  
  ***  =  will be done, essential to functionality
  **   =  should probably be done, helpful to functionality
  *    =  should be done if easy, adds marginally to functionality
  ?    =  of dubious utility
  =================================================================

1. Additional tags

***	Part-of-speech abbreviations tagged on headwords.
`<FORM><ORTH>riban</ORTH> <POS>n.</POS> Also <ORTH>riban(n)e</ORTH>`
**	Part-of-speech abbreviations tagged in in-text cross-references.
`<NOTE>Cp. <HI REND="b">regard(e</HI> <POS>n.</POS></NOTE>`
?	Part-of-speech abbreviations tagged elsewhere.

***	Language terms (usually abbreviated) tagged in etymologies.
`<ETYM><LANG>OF</LANG> <HI REND="b">ruban</HI>, <LANG>AF</LANG> <HI REND="b">rubain</HI> & <LANG>ML</LANG> <HI REND="b">ribanus</HI>.</ETYM>`
*	Language terms (usually abbreviated) tagged in definitions.
`<DEF>In plant names: (a) <HI REND="b">~ crop</HI> [<LANG>OE</LANG> <HI REND="b">stān-cropp</HI>], a plant of the genus Sedum.`
*	Language terms (usually abbreviated) tagged everywhere they appear.
`<Q>With fury face [<LANG>L</LANG> ignea facie].`
**	"Cp. notes" et sim. tagged as <NOTE>. (Most are now tagged as part of the form section.)
`<FORM><ORTH>reward</ORTH> n. Also <ORTH>rewuard(e</ORTH>.</FORM> <NOTE>Cp. <HI REND="b">regard(e</HI> <POS>n.</POS></NOTE>`
**	Internal cross-references tagged and linkable to headwords (?and individual senses)
`<NOTE>Cp. <REF TARGET="012345"><HI REND="b">regard(e</HI> <POS>n.</POS></REF></NOTE>`
**	Zoological/botanical taxonomic names tagged as such.
`<DEF>The castor-oil plant <TAX>Ricinus communis</TAX>.</DEF>`
	The rendition should default to parenthesised; but a rend attribute with "nonpar" would allow the tag to be used elsewhere:
	`<DEF>An aromatic plant of the <TAX REND="nonpar">Valerianaceae</TAX> family, esp. <TAX REND="nonpar">Nardostachys jatamansi</TAX>, used as a perfume.` `<DEF>A plant of the genus <TAX REND="nonpar">Sedum</TAX>, either stonecrop <TAX>S. acre</TAX> or rock stonecrop <TAX>S. reflexum</TAX>.`
***	Bolded etyma tagged logically as etyma. Alternatively:
`<ETYM><LANG>OF</LANG> <ETYMON>ruban</ETYMON>` `<ETYM><LANG>OF</LANG> <MENTIONED LANG="OF">ruban</MENTIONED>`
**	Labels in form section tagged as <LBL>
*	Specific kinds of labels separately tagged within <LBL>, e.g. <DIAL> or <ERA> (Prefer this to TYPE attribute on <LBL>)
`Also <ORTH>rid(de</ORTH>, <LBL><DIAL>SEM</DIAL></LBL> <ORTH>redden</ORTH> & <LBL><ERA>early</ERA></LBL> <ORTH>rudde</ORTH>; ppl. <LBL>in names</LBL> <ORTH>red(de-</ORTH>, <LBL>?error</LBL> <ORTH>rodde-</ORTH>.</FORM>`
	Addition of a REND attribute would allow the use of the LBL element for non-parenthesized labels, e.g. grammatical labels in form sections.
	`<ORTH>rudde</ORTH>; <LBL REND="nonpar">ppl.</LBL> <LBL>in names</LBL> <ORTH>red(de-</ORTH>,`
**	Bolded phrases and compounds tagged (?as <PHRASE>) within <DEF>. It is probably not useful or practicable at this point to try to bind cited phrases to their associated definitions, in the manner of the TEI <RE> (related-entry) element; but the addition of a grouping element containing the phrase and the definition would allow for that.
`<PHRASE>~ abouten</PHRASE>, clear a space of enemies; also, clear (an area) of enemies around (oneself).`
*	Head-form given separate tag (<HDORTH>?).
`<FORM><HDORTH>ridden</HDORTH> <POS>v.</POS> Also <ORTH>rid(de</ORTH>`
?	Other abbreviations tagged invisibly as <ABBR>; an excessive example:
`<ETYM>OF, <ABBR EXP="ultimately">ult.</ABBR> <LANG EXP="aramaic">Aram.</LANG>; <ABBR EXP="perhaps">perh.</ABBR> <ABBR EXP="originally">orig.</ABBR> identical with <HI REND="b">fit</HI> n.; <ABBR EXP="compare">cp.</ABBR> <HI REND="b">finesse</HI> <POS EXP="noun">n.</POS></ETYM>`
**	Citations of ME words (generally corresponding to a MED headword, sometimes to an inflected or variant form) currently tagged merely as bold. This excludes <ORTH> elements (already separately tagged) and <ETYMON> elements (as described above), but includes explicit and implicit internal cross-references; perhaps also the components of phrases and compounds.
`<NOTE>Cp. <WORD>irnen</WORD> <POS>v.</POS>.` `<DEF>Also with <WORD>to</WORD> phrases.`
*	Citations of non-ME words currently tagged merely as bold.
`<DEF> ... (c) glossing <LANG>L</LANG> <WORD LANG="L">arenosus</WORD>`
?	A simple mechanism, short of full logical tagging, for dealing with the bracketed material inserted into MED quotations so as to exclude portions of it from indexing and retrieval as ME. Ideally, it should be forwards-compatible with such more elaborate encoding as may be added later, including identification of MS abbreviations and translation-source languages and texts. Perhaps <ADD> could serve as the equivalent of the MED's brackets, with <ME> used within <ADD> as a way of restoring portions of the added material to searchability. In this example, "read, Hrl, L intende, God" would not be retrievable as ME.
`<Q>Gyf entert <ADD>read: <ME>entent</ME>; <WB(2)> <ME>herketh wel</ME>; <LANG>L</LANG> intende</ADD> to hym <ADD>God</ADD>.</Q>`
?	References to the entry's head-word tagged as such: especially references in phrases (tagged mostly as the EMPTY element <oREF>) and in quotations (tagged as the #PCDATA element <oVAR>).
`<DEF><PHRASE EXP="yeven gost">yeven <oREF></PHRASE>, to die. <Q>Geofroi yef hys <oVAR>gast</oVAR>.</Q>`
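
The <ADD>/<ME> proposal above can be illustrated with a minimal Python sketch. It assumes a simplified, well-formed version of the sample quotation (the manuscript siglum is left out so that a stock XML parser can read it); the function names `me_text` and `me_tokens` are hypothetical and not part of any MED tooling.

```python
import re
import xml.etree.ElementTree as ET

def me_text(elem, inside_add=False):
    """Yield the text runs that should be indexed as Middle English."""
    if not inside_add and elem.text:
        yield elem.text
    for child in elem:
        if child.tag == "ADD":
            # Editorial additions are excluded by default.
            yield from me_text(child, inside_add=True)
        elif child.tag == "ME":
            # <ME> restores searchability, even inside <ADD>.
            yield from me_text(child, inside_add=False)
        else:
            yield from me_text(child, inside_add)
        if not inside_add and child.tail:
            yield child.tail

def me_tokens(fragment):
    """Return the ME word tokens of one quotation (assumed well-formed)."""
    root = ET.fromstring(fragment)
    # Runs are joined with spaces: a sketch-level simplification that
    # would split a word interrupted by a tag.
    return re.findall(r"[^\W\d_]+", " ".join(me_text(root)))

# Simplified form of the sample quotation (siglum omitted).
QUOTE = ("<Q>Gyf entert <ADD>read: <ME>entent</ME>; <ME>herketh wel</ME>; "
         "<LANG>L</LANG> intende</ADD> to hym <ADD>God</ADD>.</Q>")

print(me_tokens(QUOTE))
# -> ['Gyf', 'entert', 'entent', 'herketh', 'wel', 'to', 'hym']
```

The editorial "read", the language label, the Latin gloss and the supplied "God" all drop out of the ME index, while the variant readings wrapped in <ME> remain retrievable.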

N.B. <POS>, <LANG>, and <NOTE> are already used in current MED production.

2. Additional attributes


*** ID attribute on <ENTRYFREE> to allow other texts to link to it.

*   RID or TARGET attribute on <ETYMON> to allow it to link to other dictionaries.
**  LANG attribute on <ETYMON>.

*   RID or TARGET attribute on <WORD> to allow it to link to referenced headword/entry.
**  LANG attribute on <WORD> for non-ME words.

?   TYPE (=head, variant) attribute on <ETYMON> to distinguish between 
    other dictionaries' headwords and variant forms. (Or use the TEI <DISTINCT> element).

    EXP attribute on most elements likely to contain abbreviated text, e.g.:

*** <LANG>        <LANG EXP="Old Slavonic">OSl.</LANG>
**  <ABBR>        <ABBR EXP="superlative">sup.</ABBR>
**  <DIAL>        <LBL>early <DIAL EXP="Kentish">K</DIAL></LBL>
**  <USG>         <USG EXP="numismatics">num.</USG>
*** <ORTH>        <ORTH EXP="arthetik,arthretik">arth(r)etik</ORTH>
*   <PHRASE>      <PHRASE EXP="don askinge,yeven askinge">don (yeven) ~</PHRASE>

**  REND attribute on <TAX> element, as above.
**  REND attribute on <LBL> element, as above.
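
As a rough illustration of how an ID on <ENTRYFREE> and a TARGET on <REF> might work together, the following Python sketch resolves cross-references against an index of entries. The two toy entries, the second ID ("012346"), the <DICT> wrapper and the function names are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Two toy entries; the ID "012346" and the <DICT> wrapper are hypothetical.
SAMPLE = """<DICT>
<ENTRYFREE ID="012345"><FORM><ORTH>regard(e</ORTH> <POS>n.</POS></FORM></ENTRYFREE>
<ENTRYFREE ID="012346"><FORM><ORTH>reward</ORTH> <POS>n.</POS></FORM>
<NOTE>Cp. <REF TARGET="012345"><HI REND="b">regard(e</HI> <POS>n.</POS></REF></NOTE></ENTRYFREE>
</DICT>"""

def index_entries(root):
    """Map each entry ID to the text of its first <ORTH>."""
    return {e.get("ID"): e.findtext("FORM/ORTH") for e in root.iter("ENTRYFREE")}

def resolve_refs(root):
    """Return (source ID, target ID, target headword or None) per <REF>."""
    index = index_entries(root)
    links = []
    for entry in root.iter("ENTRYFREE"):
        for ref in entry.iter("REF"):
            target = ref.get("TARGET")
            links.append((entry.get("ID"), target, index.get(target)))
    return links

root = ET.fromstring(SAMPLE)
print(resolve_refs(root))
# -> [('012346', '012345', 'regard(e')]
```

A target that matches no entry comes back with `None` in the third position, which gives a cheap check for dangling cross-references.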

3. Enlarged use of current tags

***  Non-italicised usage labels tagged as 
     (this is already done, somewhat irregularly, in current MED production)

?    Subsenses or phrasal definitions tagged by nested  tags.
     (this is partly done in current MED prod. with respect to lettered
     subsenses) 


4. Expansions

*** FORMS Given the uncertainty of expanding forms in the text itself, consider
the virtues of expansion (without diacritics) as attribute values rather than
literal text. Though at the expense of adding to the bulk of the text, and at
the risk of encountering attribute limitations, this has three notable
advantages, viz.

    (1) it preserves the compact display of the print MED;
    (2) it allows for progressive (layered) expansion of the forms, from purely
        automatic to wholly manual;
    (3) it tolerates erroneous expansion of the forms (the original text is
        still there; the expansion is invisible and results only in an
        erroneous "hit" from the search).

<FORM><HDORTH>everiwhere</HDORTH> <POS>adv.</POS> Also <ORTH EXP="everiwhore">-iwhore</ORTH>, <ORTH EXP="everaiquare">-aiquare</ORTH>, <ORTH EXP="everilkquar,everilkquare">everilkquar(e</ORTH>.</FORM>
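
The "purely automatic" end of that layered expansion might look something like the Python sketch below. It handles only parenthesised optional letters, including the MED's unclosed parenthesis at the end of a form; hyphen-truncated forms such as "-iwhore" are left for a later (possibly manual) layer. The function names are hypothetical.

```python
import re
from itertools import product

def expand_optional(form):
    """Expand parenthesised optional letters into all full spellings."""
    # An unclosed "(" runs to the end of the form, as in "rid(de".
    if form.count("(") > form.count(")"):
        form += ")"
    pieces = re.split(r"\(([^()]*)\)", form)
    # Even-indexed pieces are fixed text; odd-indexed pieces are optional.
    choices = [[p] if i % 2 == 0 else ["", p] for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in product(*choices)]

def exp_value(form):
    """Build a comma-separated EXP attribute value for one <ORTH>."""
    return ",".join(expand_optional(form))

print(exp_value("arth(r)etik"))    # -> arthetik,arthretik
print(exp_value("everilkquar(e"))  # -> everilkquar,everilkquare
```

Because the result is stored only as an attribute value, a wrong guess here costs a false search hit and leaves the displayed form untouched.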



** SQUIGGLES (swung dashes) Probably prefer to expand these as literal text.
This can probably be done without serious risk of inaccuracy, and makes the
MED locally (e.g. on a single screen) more intelligible without making it
significantly less succinct. Alternatively, convert these to the TEI's <oREF>
element with the expanded form preserved as an EXP attribute value. This has
the minimal advantage of preserving the "squiggle" but has little else to be
said for it.

 


** PHRASES Consider expanding at least some phrases and compounds, perh. as
attribute values, for the same reasons as with forms, above.

<PHRASE EXP="gret in expense, large in expense">gret (large) in <oREF EXP="expense"></PHRASE>, generous.
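
A minimal Python sketch of such phrase expansion follows. It assumes that a parenthesised word is an alternative to the word immediately before it (as in "gret (large)") and that the swung dash stands for the headword in its plain form. The function name is hypothetical, and the headword "askinge" for the second call is inferred from the EXP value quoted in section 2.

```python
import re
from itertools import product

def expand_phrase(phrase, headword):
    """List the full phrases implied by '~' and '(alternative)' notation."""
    tokens = phrase.replace("~", headword).split()
    slots = []
    for tok in tokens:
        m = re.fullmatch(r"\((.+)\)", tok)
        if m and slots:
            # A parenthesised word is an alternative to the previous word.
            slots[-1].append(m.group(1))
        else:
            slots.append([tok])
    return [" ".join(combo) for combo in product(*slots)]

print(expand_phrase("gret (large) in ~", "expense"))
# -> ['gret in expense', 'large in expense']
print(expand_phrase("don (yeven) ~", "askinge"))
# -> ['don askinge', 'yeven askinge']
```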



** ABBREVIATIONS Since most abbreviations (including material tagged as , , , , , and ) can be expanded unambiguously, expansion is probably best handled in an external file rather than in the text or tag. Since some abbreviations may nevertheless require local expansion, allow all such elements an EXP (expansion) attribute.


[see example under <ABBR> above, section 1.]
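
The division of labour between an external file and a local EXP attribute could be sketched in Python as follows. The in-memory table stands in for the external file, the sample fragment is made up for the purpose, and the local override `EXP="supine"` is a hypothetical case of an abbreviation needing local expansion.

```python
import xml.etree.ElementTree as ET

# Stand-in for the external expansion file, keyed by (tag, abbreviation).
DEFAULTS = {
    ("LANG", "OSl."): "Old Slavonic",
    ("ABBR", "ult."): "ultimately",
    ("ABBR", "sup."): "superlative",
    ("POS", "n."): "noun",
}

def expansion(elem, table=DEFAULTS):
    """Prefer a local EXP attribute; otherwise consult the external table."""
    local = elem.get("EXP")
    if local is not None:
        return local
    return table.get((elem.tag, (elem.text or "").strip()))

# Made-up fragment; the third element carries a hypothetical local override.
frag = ET.fromstring(
    '<ETYM><LANG>OSl.</LANG>; <ABBR>ult.</ABBR> '
    '<ABBR EXP="supine">sup.</ABBR> <POS>n.</POS></ETYM>'
)
print([expansion(e) for e in frag])
# -> ['Old Slavonic', 'ultimately', 'supine', 'noun']
```

An abbreviation missing from both the table and the tag yields `None`, which flags it for manual attention.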





5. TEI equivalence/equivalents
<POS>: is roughly equivalent to the TEI element of the same name, differing chiefly to the extent that we have
enlarged the content model of <FORM> to correspond to the MED's "form section." POS can therefore appear within the 
MED's FORM but not within TEI's. POS can similarly appear within the MED's NOTE, ETYM, and DEF, but not the TEI's.
MED:  <FORM><ORTH>blog</ORTH> <POS EXP="noun">n.</POS>1.</FORM>
TEI:  <FORM><ORTH>blog</ORTH></FORM><GRAMGRP><POS>n.</POS><LBL>(1)</LBL></GRAMGRP>
  or: <FORM><ORTH>blog</ORTH><GRAM TYPE="pos">n.</GRAM><LBL>(1)</LBL></FORM> (non-preferred)

Note that the <LBL>(1)</LBL> is better handled by the TEI's "key" attribute (on entries).
The TEI "expand" attribute on <POS> corresponds to the MED's EXP.
<LANG>, <oVAR>, <oREF>, <REF> seem to correspond exactly with the TEI elements of the same names.
<NOTE> (aside from the POS problem) falls within the general TEI tag of the same name.
<TAX> corresponds with any number of possible TEI equivalents, such as <FOREIGN LANG="taxon" REND="par">.
<ETYMON> corresponds roughly with a number of possible TEI equivalents, such as <MENTIONED>, though that lacks the TYPE attribute.
<LBL> does not exceed the bounds of the very general TEI element of the same name.
<PHRASE> corresponds partly with the <FORM>, <RE>, and even <PHR> elements.
<ABBR> corresponds with the general TEI element of the same name.
<HDORTH> corresponds with TEI <FORM TYPE="lemma"><ORTH>.
<WORD> falls generally within the broader TEI element <MENTIONED>.
<ADD> distorts the intended use of the TEI <ADD> element, but is formally equivalent.
<ME>, a sort of 'exception' to <ADD>, is without TEI equivalent (?).
