CME FAQ Sheet #3: Page Breaks and Milestones

To check the presence and accuracy of page breaks, use the "Search/Find In Files" command (hotkey: Ctrl + F5) in TextPad.

Specifying the file and folder names, for example:

In files: *.sgm
In folder: C:\Work Docs\mecorp\text\aa\toproof

search for

<PB[^>]*>

with Text, Regular expression, All matching lines, and Binary files all checked.

The results will appear in a Search Results file, which may be examined for missing numbers, duplications, etc. Any discrepancies may be checked against the original text and/or the Author/Editor file.
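
For long texts, the examination of the Search Results can also be partly automated outside TextPad. The following is only a minimal sketch in Python, not part of the TextPad procedure. It assumes that page numbers are given in a quoted N attribute, as in <PB N="103">. The function names page_numbers and report_anomalies are hypothetical.

```python
import re

# Same idea as the TextPad pattern <PB[^>]*>; N_ATTR then pulls out the N value.
PB_TAG = re.compile(r'<PB\b[^>]*>', re.IGNORECASE)
N_ATTR = re.compile(r'\bN\s*=\s*"([^"]*)"', re.IGNORECASE)  # assumes quoted values

def page_numbers(sgml_text):
    """Return the N values of all <PB> tags in document order (None if no N)."""
    values = []
    for tag in PB_TAG.findall(sgml_text):
        m = N_ATTR.search(tag)
        values.append(m.group(1) if m else None)
    return values

def report_anomalies(values):
    """List duplicates, gaps, and non-numeric values among the page numbers."""
    problems = []
    seen = set()
    previous = None
    for v in values:
        if v is None or not v.isdecimal():
            # Roman numerals, missing N attributes, etc. need checking by eye.
            problems.append(("non-numeric", v))
            continue
        n = int(v)
        if n in seen:
            problems.append(("duplicate", n))
        elif previous is not None and n != previous + 1:
            problems.append(("gap", (previous, n)))
        seen.add(n)
        previous = n
    return problems
```

Anything the sketch reports still needs to be checked against the original text, since a gap may be legitimate (for example, unnumbered plates).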

Similar searches may be made in order to examine the presence and accuracy of

<MILESTONE>s (<MILESTONE[^>]*>),
<DIV>s of various sizes (<DIV[^>]*>),
<LG>s (<LG[^>]*>), and
<L>s (<L[^>]*>).

In each case the search is a useful tool for spotting anomalies, particularly in numbers and attributes. Note that the pattern for <L>s also lists <LG> tags, since [^>]* can match the "G".
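
The same searches can be sketched in Python for use outside TextPad. This is an illustration only, and the function name find_tags and its allow_suffix parameter are hypothetical. By default it requires the element name to be followed by a space or ">", so that a search for L does not also return LG. Setting allow_suffix=True mimics the looser TextPad patterns above, which is useful for <DIV>s of various sizes.

```python
import re

def find_tags(sgml_text, element, allow_suffix=False):
    """Return all start tags for `element`, in document order.

    allow_suffix=False: the name must end at whitespace or '>' (L does not match LG).
    allow_suffix=True:  behaves like <NAME[^>]*> (DIV also matches DIV1, DIV2, ...).
    """
    tail = r'[^>]*' if allow_suffix else r'(?=[\s>])[^>]*'
    pattern = re.compile('<' + re.escape(element) + tail + '>', re.IGNORECASE)
    return pattern.findall(sgml_text)
```

For example, find_tags(text, "DIV", allow_suffix=True) lists the <DIV> tags of every size, while find_tags(text, "L") lists only the <L>s.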

NOTE:

As indicated in FAQ sheet #4, Author/Editor sometimes introduces unwanted carriage returns into the .sgm files that it exports, often in the middle of tags (between the element name and the attribute name):

<PB
N="103">

If so, the searches above will fail to list those tags in the Search Results. To remove the carriage returns that Author/Editor has inserted in the middle of tags, in TextPad, replace:

<\([^>]+\)\n\([^>]+\)>
with
<\1 \2>

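Each pass removes only one carriage return per tag, so repeat the replacement until no further matches are found. Then run the "Search/Find In Files" again.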
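
The same repair can be sketched in Python, where a single pass can rejoin a tag broken over any number of lines. This is a rough equivalent of the TextPad replacement, not an exact translation of it. The function name join_split_tags is hypothetical, and the sketch assumes that no attribute value contains a literal ">".

```python
import re

# In Python, [^<>]* also matches line breaks, so a tag split over
# several lines is found as a single match.
TAG = re.compile(r'<[^<>]*>')

def join_split_tags(sgml_text):
    """Replace line breaks inside tags with a single space; leave other text alone."""
    def fix(match):
        # Collapse each line break, plus any whitespace around it, to one space.
        return re.sub(r'\s*[\r\n]+\s*', ' ', match.group(0))
    return TAG.sub(fix, sgml_text)
```

Line breaks outside tags are left untouched, so the line structure of the text itself is preserved.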

One further test for the presence of unwanted carriage returns is to use "Search/Find" (hotkey: F5) to find any lines not beginning with "<". Use ^[^<].
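
As an illustration, the same test can be sketched in Python. The function name suspect_lines is hypothetical. Like ^[^<], it skips empty lines and reports every other line whose first character is not "<".

```python
import re

def suspect_lines(sgml_text):
    """Return (line number, line) pairs for non-empty lines not starting with '<'."""
    return [
        (number, line)
        for number, line in enumerate(sgml_text.splitlines(), start=1)
        if re.match(r'[^<]', line)  # re.match anchors at the start, like ^
    ]
```

Not every line reported is an error, since ordinary text may legitimately begin a line. Lines that look like the tail of a tag, such as N="103">, are the ones to fix.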

For now, consider page breaks as occurring at the "bottom" of each page. This means that the total number of <PB>s found will always be one less than the number of pages in the text.
