A. Validate

1. Run NSGMLS on the received file.
2. May need to add character entities to the entities file (propagate additions to all copies). (See any Readme.doc that may have accompanied the file.)

AA. Sample

Note: this procedure will require some modification if there is a large number of unnumbered pages, or if there are multiple sets of pagination that overlap.

(1) Get list of page numbers:

1. Extract list of page-break tags from the .sgm file. (Using TextPad, search-in-files, specifying file name, search for ]*> file-type=binary, regular expression, all matching lines; copy the list to another file; remove the line-number info [replace ^[^:]: with nil].)
2. Supply artificial page numbers for tags without "N" values, using decimal values (770.1, etc.).
3. Reduce to bare numbers (replace to nil).
4. Save list of page numbers to a file ("pages.txt").

(2) Use random number generator to generate sample

5. Open Excel, import the text file.
6. Note the number of cells occupied by the list, i.e. the total number of pages. Calculate the number of samples needed (total pages x 0.05). Use the Excel "sampling" option (under Tools/Data analysis) to generate a random sample containing the necessary number of pages. Sort it. Look for duplicates. Replace duplicates with arbitrarily selected pages from the original set.

(3) Copy sample pages from .sgm file

7. Open the .sgm file in a text editor.
8. Replace xxx.test.html
13. copy \mecode\perl\cmeA\striptag.pl
    perl striptag.pl xxx.test.sgm > xxx.stripped.txt
    (This creates a "stripped" version in order to establish a character count against which to evaluate transcriptional accuracy.)
14. Load the html file in a browser; if there are obvious uglies or inadequacies, revise the perl script and/or make specific changes in the .html file, e.g. adding missing closing tags. (Look especially for characters that do not display satisfactorily.) Otherwise, print the file.
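Sketch only: the page-list and sampling steps in (1) and (2) above could also be scripted instead of done by hand in TextPad and Excel. The Python below is one possible way, not the procedure's own tooling. Assumptions, none of them from the notes above: the page-break tag is named "PB" (a placeholder; pass the real tag name), an unnumbered tag takes the last numbered page plus .1, .2, etc., the 5% figure is rounded up, and the function names are made up for illustration. The sample is drawn without replacement, so the look-for-duplicates step is not needed here.

```python
import random
import re


def page_numbers(sgml_text, tag="PB"):
    """Return one page label per page-break tag, in document order.

    "PB" is an assumed tag name; pass whatever the file actually uses.
    """
    tag_re = re.compile(r"<%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    n_re = re.compile(r'\bN\s*=\s*"?([^"\s>]+)', re.IGNORECASE)
    pages = []
    last, sub = "0", 0
    for match in tag_re.finditer(sgml_text):
        n_attr = n_re.search(match.group(0))
        if n_attr:
            # Tag carries an N value: use it as the page label.
            last, sub = n_attr.group(1), 0
            pages.append(last)
        else:
            # No N value: supply an artificial decimal label (770.1, 770.2, ...).
            sub += 1
            pages.append("%s.%d" % (last, sub))
    return pages


def draw_sample(pages, percent=5, seed=None):
    """Pick percent% of the pages (rounded up), without duplicates,
    returned in document order."""
    count = (len(pages) * percent + 99) // 100  # integer round-up
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(pages)), count))
    return [pages[i] for i in chosen]


if __name__ == "__main__":
    demo = '<PB N="1">text<PB>text<PB N="2">text<pb n=3>'
    labels = page_numbers(demo)
    print(labels)  # ['1', '1.1', '2', '3']
    print(draw_sample(labels, percent=50))
```

To use it on a real file, read the .sgm file into a string, call page_numbers() on it, write the result to pages.txt one label per line, and pass the same list to draw_sample().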

(5) Proofread the print against the original, noting errors or omissions of transcription.

B.