MEC Wish List

draft 3 / 4 June 1999

  1. Update indexing and search interface to make use of new tags when and if they are added to e-MED. Those with stars below have been tagged in MED; those without have not yet been, and appear in order of decreasing priority:

    1. *Add LANGUAGE index to boolean entry-search (searches LANG tags in <ETYM>; probably not also within <DEF>).

    2. *Add part-of-speech index to boolean entry-search (perhaps also as limiter to lookups); searches <POS> within <FORM>.

    3. Add FORM-LABEL index to boolean entry-search: searches <LBL> (including <DIAL> and <ERA>) within <FORM>. <LBL> will be used to tag the MED's own labels within form sections, e.g. "(early infl.)", "(chiefly SWM)", etc.

    4. Add ETYMON index to boolean entry-search (searches <MENTIONED>, <DISTINCT> or <ETY> elements in <ETYM>, whichever, if any, end up being used).

    5. Add phrase index to boolean entry-search: searches <PHRASE> within <DEF>.

    6. Exclude non-ME material from <Q> searches. See plateau.dtd comments re: <ADDED> and <ME>. Some quots. have a lot of modern English in them: [A pair of] tongis [and three socks]. And of course many have non-ME formulas [?read: ...] [L: ...] [<TITLE>WB(2):</TITLE> ...] etc. Ideally, these should not be searchable as ME using the quotation search.

  2. Revise MED searches

    1. Quotation searches should be revised:

      1. to drop "bibl.ref. and quot." search;
      2. to add separate combinable (boolean AND) searches of <Q> <DATE> <STNCL> <MS> (or at least of <Q> and <STNCL>).

    2. Make use of data from the HB to generate hidden input to MED searches. This is essential to fulfilling our promise to integrate HB and MED; e.g.:

      1. allow MED search by MS dialect, modern title, or IMEV number, etc.
      2. use HB MSLIB lookup to search MED for MS abbreviations, as with revised HB MS search (see 3.1 below).

    3. Improve MED stencil search(es) by means of separate index of tag-stripped stencils with byte-offset pointers back to the MED.
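
A minimal sketch of such an index, assuming the MED text is available as a byte string and that stencils are delimited by <STNCL>...</STNCL>. The function name build_stencil_index and the sample markup are hypothetical, not taken from the MED data:

```python
import re
from collections import defaultdict

STNCL_RE = re.compile(rb"<STNCL>(.*?)</STNCL>", re.DOTALL)
TAG_RE = re.compile(rb"<[^>]+>")


def build_stencil_index(med_bytes):
    """Map each tag-stripped stencil to the byte offsets of its <STNCL> tags."""
    index = defaultdict(list)
    for match in STNCL_RE.finditer(med_bytes):
        stripped = TAG_RE.sub(b"", match.group(1))
        # Collapse runs of whitespace so trivially different stencils share a key.
        key = b" ".join(stripped.split()).decode("utf-8")
        index[key].append(match.start())
    return index


# Invented sample markup, for illustration only.
sample = (
    b"<EG><STNCL><TITLE>Havelok</TITLE></STNCL><Q>...</Q></EG>"
    b"<EG><STNCL> <TITLE>Havelok</TITLE> </STNCL><Q>...</Q></EG>"
)
index = build_stencil_index(sample)
for offset in index["Havelok"]:
    print(offset, sample[offset:offset + 7])
```

A stencil search would then consult the small index first and follow the stored offsets back into the MED, rather than scanning the tagged text itself.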

    4. Add an alternative form of entries-search that searches for the co-occurrence of various features within the same sense (not within the same entry). Current tagging supports restriction only to the same numbered sense, but we should think about how to restrict searches even to the same lettered subsense. If we cannot support both sense-search and entry-search, perhaps even prefer sense search.

    5. Increase number of Boolean boxes in entry searches, preferably by breaking down the searchable features into two groups: (1) those that occur within <FORM> <ETYM> or <DEF>; and (2) those that occur within <EG>. Three boolean boxes would be allowed to each group, with the option of combining results of each group with an invariant Boolean "AND".

    6. Note: the following crude mockup is designed only to illustrate the preferred logical structure of the search, not to suggest how it should be implemented; even at the former task, it is flawed, since the 'in senses' radio-button option apparently allows one to search for features 'within senses' that do not in fact exist within senses (e.g., etymologies, forms). It is debatable how useful this would be even if correctly expressed (i.e., search for the coexistence within a sense of features A and B, limited to senses that fall within entries that also contain feature 'C').

      Boolean searching of entries:

      (Proximity searching within definitions is also available).

      Search for entries that include (within the same entry / the same sense):

      (
      Within:

      Within: )

      Within:
        AND  
      (
      Within:

      Within: )

      Within:

      For sets larger than 100 results, view:

    7. Strengthen ability of quotation (and CME) searches to cope with orthographic variety. E.g.:

      1. Build in the commonest orthographic variants to create a 'fuzzy search' option. Example:

        searching for "liverous" automatically searches also for leuerous, lywerous, leuereus(e, lyvereus(e, etc.

      2. Make use of variant spellings listed by MED to generate a different kind of fuzzy search option. Example:

        searching for "tortous skin" automatically searches for all phrases whose elements appear as co-variants with tortous and skin in the MED.

                  tortous              skin
                  turtu                skinne
                  tortu                sckin
                  tortus               scinne
                  tortuse              scin
                  tortuce              schin
                  tortouse             shine
                  tortois       x      chin
                  tortes               skene
                  tortuge              skijn
                  turtuse              kyn
                  tortuces             kin
                  cortucis             skins
                                       skinse
                                       chinne
                                       etc.
        

      3. Use OED-MED headword cross-index list together with MED variant lists to provide Modern English searches. Example:

        search "tortoise skin" (selecting tortoise sb. and skin sb.), which = MED "tortouse skin" + variants.

      4. (1) combined with (2); this would perhaps be the best, but (1) would have to be run first, in order to defeat the normalization used in MED form sections. I.e., if you searched for "skyn", you would need the (1) mechanism to generate "skin"; the latter would be found among the MED variant spellings; the former would not, since MED always normalizes vocalic "y" to "i" except when quoting forms that it regards as errors.
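
The combined approach might be sketched as follows: a spelling rule is applied first, the result is looked up among the MED variants, and the phrase is expanded as the cross product of the variant lists. This is an illustration only. The variant lists are abridged from the example above, the single y-to-i rule is a crude stand-in for a real set of orthographic rules, and the names normalize and expand_phrase are hypothetical:

```python
from itertools import product

# Variant lists as MED form sections might supply them (abridged).
MED_VARIANTS = {
    "tortous": ["tortous", "turtu", "tortu", "tortus", "tortuse"],
    "skin": ["skin", "skinne", "sckin", "scinne", "scin"],
}


def normalize(word):
    """Crude spelling rule: treat every "y" as vocalic and map it to "i"."""
    return word.lower().replace("y", "i")


def expand_phrase(phrase, variants=MED_VARIANTS):
    """Return every phrase formed from co-variants of the words searched for."""
    # Reverse lookup: any listed variant -> the headword it belongs to.
    lookup = {form: head for head, forms in variants.items() for form in forms}
    choices = []
    for word in phrase.split():
        head = lookup.get(normalize(word))
        forms = list(variants[head]) if head else []
        if word not in forms:
            forms.append(word)  # keep the spelling the user actually typed
        choices.append(forms)
    return [" ".join(combo) for combo in product(*choices)]


phrases = expand_phrase("tortus skyn")
print(len(phrases), phrases[:3])
```

Here "skyn" is not among the MED variants, but its normalized form "skin" is, so the search still reaches the full variant list while retaining "skyn" itself as a search term.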

    8. Add <DATE> search to entry searches.

    9. Improve date and date-range searches of MED (quotation and perhaps entry searches), perhaps through addition of simplified date attribute(s) on the <DATE> element: either a general search that searches both comp. date and MS date, or separate MS-date and comp.-date searches.

    10. Allow limitation of MED entry searches to particular ranges of quotations, e.g. search for ENTRIES in which Chaucer appears as author in
      • first (first or second) quot. in word
      • first (first or second) quot. in a sense

    11. ?increase range of MED author search by using the <AUTHOR> element of the HB. It may be that this would not add enough authors to be worth worrying about.

  3. Revise HB searches

    1. Retain existing MS/shelfmark search but add access to the HB via a string search of the MSLIB (including the "MS" attribute of the MSFULL element). Insert an interim results page showing the <MSFULL>s that matched. E.g. search for "sim" and get:
              SIM         London, British Library, Additional 22283 (Simeon)
              SIMPSON     Privately owned
      
      Or search for "add" and get:
              ADD         London, British Library, Additional
              BODADD      Oxford, Bodleian Library, Add.
              CMBADD      Cambridge, University Library, Additional
              FIL         London, British Library, Additional 37492 (<I>olim</I> Fillingham)
              PCY         London, British Library, Additional 27879 (Percy)
              SIM         London, British Library, Additional 22283 (Simeon)
              WHT         London, British Library, Additional 39574 (<I>olim</I> Wheatley)
      
      Click on one of these (?or select one or more of these with a check box) to retrieve entries with that MS or a MS from that repository. In fact, perhaps you would need another interim results page listing the full ABBRs that begin with the selected term(s); you could then select from those results to go to entries. E.g., selecting "CmbAdd" above would generate this page:
              CmbAdd 43
              CmbAdd 2830
              CmbAdd 3039
              CmbAdd 3042
              CmbAdd 3042:Lind.
              CmbAdd 3137
              CmbAdd 4407
              CmbAdd 5943
              CmbAdd 6681
              CmbAdd 6864
              CmbAdd 7350
      
      Then clicking on CmbAdd 4407 would yield these entries:
              Havelok the Dane
              In þis werd (incipit)
              The Proverbs of Hendyng
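
The two interim pages described above could be produced roughly as follows. This is a sketch under stated assumptions: the MSLIB is represented as a plain list of (MS attribute, full name) pairs copied from the examples, with the italic markup around "olim" dropped, and the function names search_mslib and abbrs_for are hypothetical:

```python
# (MS attribute, full name) pairs, copied from the examples above.
MSLIB = [
    ("ADD", "London, British Library, Additional"),
    ("BODADD", "Oxford, Bodleian Library, Add."),
    ("CMBADD", "Cambridge, University Library, Additional"),
    ("FIL", "London, British Library, Additional 37492 (olim Fillingham)"),
    ("PCY", "London, British Library, Additional 27879 (Percy)"),
    ("SIM", "London, British Library, Additional 22283 (Simeon)"),
    ("SIMPSON", "Privately owned"),
    ("WHT", "London, British Library, Additional 39574 (olim Wheatley)"),
]

ABBRS = ["CmbAdd 43", "CmbAdd 2830", "CmbAdd 3042", "CmbAdd 3042:Lind.", "CmbAdd 4407"]


def search_mslib(term, mslib=MSLIB):
    """First interim page: case-insensitive match on MS attribute or full name."""
    needle = term.lower()
    return [(ms, full) for ms, full in mslib
            if needle in ms.lower() or needle in full.lower()]


def abbrs_for(ms, abbrs=ABBRS):
    """Second interim page: full ABBRs that begin with the selected term."""
    prefix = ms.lower()
    return [abbr for abbr in abbrs if abbr.lower().startswith(prefix)]


for ms, full in search_mslib("sim"):
    print(f"{ms:<12}{full}")
print(abbrs_for("CMBADD"))
```

Matching on both the attribute and the full name is what lets a search for "add" return FIL, PCY, SIM and WHT as well as the three abbreviations that contain "ADD".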
      
  4. Revise MED/HB display

    1. Sort and display quotation-search results in either date or stencil order, not headword order (the current order seems to be by headword within each group of (100) hits, which implies that the sort is done subsequent to the grouping; ideally, though perhaps impractically, the sort (by date or stencil) should be done prior to the grouping into hundreds.)
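
The point about ordering can be shown in a few lines: sort the whole result set first, then cut it into pages. The hit records below, including the numeric "date" field, are invented for illustration, and paginate_sorted is a hypothetical name:

```python
def paginate_sorted(hits, key, page_size=100):
    """Sort the full result set, then group it into pages of page_size hits."""
    ordered = sorted(hits, key=key)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


# Invented hits; a simplified numeric date is assumed to be available.
hits = [
    {"headword": "skin", "date": 1450},
    {"headword": "chin", "date": 1300},
    {"headword": "tortous", "date": 1400},
    {"headword": "kin", "date": 1225},
    {"headword": "shine", "date": 1390},
]

pages = paginate_sorted(hits, key=lambda hit: hit["date"], page_size=2)
for number, page in enumerate(pages, start=1):
    print(number, [(hit["date"], hit["headword"]) for hit in page])
```

Sorting within each page after grouping, as the current display appears to do, would leave the pages themselves in no useful order; sorting first costs one pass over the full set but makes page 1 hold the earliest hits.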

    2. Add "page turner" to MED and HB interfaces
      ( <-previous entry | next entry-> )

    3. Add to "bare-bones" (quoteless) display of MED entries the date of the first and last non-bracketed quots. in each sense; also perhaps the first and last non-bracketed quots. in the entry.

    4. Add (optional) context-sensitive help in frames (generated by links from particular items on search and results pages). At least, add link to list of <USE> labels.

    5. Add mouseovers:
      1. to selected abbreviations, supplying expansion from attribute ?or from abbreviation list.
      2. to stencils, supplying modern title, either from attribute or (better) directly from HB.


  5. Revise MED text (worst of print artifacts removed)

    1. Replace each Ibid. with the immediately preceding stencil *unless* the preceding stencil was the result of a disambiguation process or the Ibid. is complex (an Ibid. with additional info attached, e.g. different MS, date, or bibl. src.), in which case manual intervention is needed to select the correct replacement.
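
The rule just stated might be sketched like this. The representation is an assumption: each stencil is a dict with a "text" field and an optional "disambiguated" flag, the placeholder stencil texts are invented, and resolve_ibids is a hypothetical name. An Ibid. that follows an unresolved Ibid. is also sent for review, which is a conservative choice rather than something the rule requires:

```python
def resolve_ibids(stencils):
    """Replace simple Ibid.s; return (resolved texts, indexes needing review)."""
    resolved, needs_review = [], []
    previous = None  # last stencil that is safe to copy from
    for i, stencil in enumerate(stencils):
        text = stencil["text"].strip()
        if not text.startswith("Ibid."):
            previous = stencil
            resolved.append(text)
        elif (text == "Ibid." and previous is not None
              and not previous.get("disambiguated", False)):
            resolved.append(previous["text"].strip())
        else:
            # Complex Ibid., or no trustworthy antecedent: leave for an editor.
            resolved.append(text)
            needs_review.append(i)
            previous = None
    return resolved, needs_review


# Placeholder stencils, for illustration only.
stencils = [
    {"text": "stencil A"},
    {"text": "Ibid."},
    {"text": "Ibid. (different MS)"},
    {"text": "Ibid."},
    {"text": "stencil B", "disambiguated": True},
    {"text": "Ibid."},
]
resolved, needs_review = resolve_ibids(stencils)
print(resolved)
print(needs_review)
```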

    2. Re-order quots. according to MED scheme.

    3. Replace "~" in phrases/cpds. with appropriate headword.

    4. Replace short contextual references in etymology:

      1. Replace "From preceding" "From next" with headword referred to.
      2. Replace "From the adj." "From the n." with headword referred to.
      3. ?Replace [OF] or [L] with full reference.

    5. Replace "~" in <FORM> where feasible with referent.

  6. Expand MED contractions (in most cases probably as searchable attributes with suppressed diacritics (if applicable), not as literal text)

    1. Expand head <ORTH>s, perh. as separate element (<HDORTH>)

    2. Expand other <ORTH>s

    3. ? Expand phrases/cpds.

  7. Revise MED encoding

    1. Add IDs on headwords ?and senses.

    2. Recognize <NOTE> (or <XR>) and <ETYM> elements as containing cross-references; convert internal cross-references in MED into links, at least to relevant headword if not relevant sense.

    3. ?Add RIDs to etyma.


pfs:4June99