
How to QC Map Files




1. Check map files:

   a. add doctype and validate.

   b. extract list of IDs. Make sure that none 
      has been duplicated in the file.

   c. check muds superficially:

       search for <mud>.*\.</AUTHOR>{space}
                  <mud>.*[^.]</AUTHOR>[^ ]
                  <mud>.*\.</TITLE>{space}
                  <mud>.*[^.]</TITLE>[^ <]
                  <mud>.*[^.]</TITLE><MS
                  <MS></MS>
                  <mud>.*/DATE>[^ ]
       get rid of some extraneous blank lines
       by replacing </med>\n\n with </med>\n
       
       get rid of others by removing comments <!--.*-->
       and then replacing \n\n\n with \n\n repeatedly
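
The duplicate-ID check in step 1b and the blank-line cleanup in step 1c can also be done from the command line. This is only a sketch: it assumes IDs are written as ID="..." attributes (the pattern would also catch any attribute whose name merely ends in ID), and both function names are made up for illustration.

```shell
# Print every ID value that occurs more than once in a map file.
# Assumption: IDs appear as ID="..." attributes; adjust the pattern if not.
duplicate_ids() {
    grep -o 'ID="[^"]*"' "$1" | sort | uniq -d
}

# Collapse each run of blank lines to a single blank line in one pass,
# instead of replacing \n\n\n with \n\n repeatedly.
squeeze_blank_lines() {
    awk '/^$/ { if (!blank) print; blank = 1; next } { blank = 0; print }' "$1"
}
```

If duplicate_ids prints nothing, no ID has been duplicated. squeeze_blank_lines writes to standard output, so redirect it to a new file rather than onto the input.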


2. Prepare bib files for indexing:

   a. move to a temp directory
   b. validate
   c. normalize
   d. reentrify (using reentry.pl or reenter.postid.sgm,
      depending on whether the file has been merged before;
      use ch-entry.pl on the Chaucer (c2) file: this will also
      work in place of reentry.pl on the other non-IDed files)
   e. check recent files for proper TYPE="DOC" attributes
   f. extract all the notes from the bib files by running
      ex-notes.pl >> notes.date.sgm
      
        clean it up (remove <SG></SG> and perhaps
           also </SG><SG>)
        attach doctype 
           <!DOCTYPE notes PUBLIC "-//MERGE//DTD mebnotes 1.0//EN">
        add <NOTES> at head and </NOTES> at foot
        validate
        display in Panorama and print out
        
   g. attach current mslib and doctype to all files and validate
   h. collect missing MS refs and add them to current mslib or
      otherwise resolve the problem (check implications for map file/MED)
   i. reattach new mslib and doctype
   j. revalidate
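
The cleanup of the notes file in step 2f (remove empty <SG></SG> pairs, attach the doctype, wrap in <NOTES>) might be scripted roughly as follows. The function name is hypothetical, and the sketch assumes each empty <SG></SG> pair sits on a single line; it does not touch </SG><SG>, which step 2f treats as optional.

```shell
# Wrap raw ex-notes.pl output as a notes document on standard output.
# Assumption: each empty <SG></SG> pair is on one line.
wrap_notes() {
    printf '%s\n' '<!DOCTYPE notes PUBLIC "-//MERGE//DTD mebnotes 1.0//EN">'
    printf '%s\n' '<NOTES>'
    sed 's|<SG></SG>||g' "$1"
    printf '%s\n' '</NOTES>'
}
```

The result still needs to be validated and checked by eye, as step 2f says.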
    
3. Compare stencils from map file(s) and bib files.

   a. extract stencils from both sets thus:

      perl ex-stencil.all.pl mapfile.take?.sgm >> sts.mapped.txt
      perl ex-stencil.all.pl ??.sgm >> sts.bibbed.txt
      perl ex-stencil.all.pl ??.id.sgm >> sts.bibbed.txt

      or some such.

   b. open sts.bibbed.txt and sts.mapped.txt in a text editor. Compare
      the numbers of lines. If similar, remove IDs from bibbed stencils

      (replace <STENCIL[^>]+> with <STENCIL> in TextPad)

   c. upload both files to dns /work/pfs/merge or your equivalent
      directory

   d. create lists of stencils unique to map files and bib files:

A. Sort files

dns:merge % sort sts.bibbed.txt > ss.bibbed.sort
dns:merge % sort sts.mapped.txt > ss.mapped.sort

B. Look for duplicates within each file

dns:merge % uniq -d ss.bibbed.sort
dns:merge % uniq -d ss.mapped.sort

C. Resolve any duplicates; re-upload, or re-sort using the unique (-u) switch

dns:merge % sort -u ss.bibbed.sort > ss.bibbed.srt
dns:merge % sort -u ss.mapped.sort > ss.mapped.srt

D. Check for stencils unique to one file or the other

dns:merge % comm -23 ss.bibbed.srt ss.mapped.srt > solely.bibbed
dns:merge % comm -13 ss.bibbed.srt ss.mapped.srt > solely.mapped
dns:merge % cat solely.bibbed
 ...
dns:merge % cat solely.mapped
 ...
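
The ID stripping and the sort and comm steps of step 3 could be combined into one shell function, sketched below. The function name is hypothetical; the output file names follow the transcript above. LC_ALL=C is added so that sort and comm agree on collation order, which comm needs in order to work correctly.

```shell
# Write stencils unique to each side into solely.bibbed and solely.mapped.
# $1 = bib-file stencil list, $2 = map-file stencil list (one stencil per line).
compare_stencils() {
    # Strip attributes (the IDs) so both lists use a bare <STENCIL> tag.
    sed 's/<STENCIL[^>]*>/<STENCIL>/g' "$1" | LC_ALL=C sort -u > ss.bibbed.srt
    sed 's/<STENCIL[^>]*>/<STENCIL>/g' "$2" | LC_ALL=C sort -u > ss.mapped.srt
    LC_ALL=C comm -23 ss.bibbed.srt ss.mapped.srt > solely.bibbed
    LC_ALL=C comm -13 ss.bibbed.srt ss.mapped.srt > solely.mapped
}
```

Because sort -u drops duplicates silently, run the uniq -d check in B first if duplicates within a file need to be investigated rather than just discarded.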

4. Resolve differences, if any, between bib files and map files.

5. Revalidate map files. Revalidate bib files.

6. Hand both over to Nigel.


pfs/April99 rev. Oct.99