CME FAQ Sheet #4: .ae and .sgm

Author/Editor (A/E) is not always the best tool for the job. You may, if you like, work with the raw .sgm file to accomplish particular ends. This is purely optional. If you do, be careful:

  1. It is quite possible to do things to the .sgm file that will make it difficult or impossible to re-import it into A/E. It is also quite possible to make global changes that inadvertently wipe out whole swaths of text. (I've done both of these things, more than once.)
  2. So make sure that you save, make backups under other names, whatever it takes to ensure that if you do something fatal, you can always recover from it by using an unaffected file.
  3. If you do move back and forth between the .ae and .sgm files, make sure that you don't create a "version" problem, where you make changes in one file that don't get incorporated in the other. The easiest way to avoid that is to treat the .ae file as the authoritative one: save the .ae, export an .sgm file to do some particular task, do what you need to, then re-import it into A/E immediately.
  4. Bear in mind always that the .ae files are versions of the .sgm file that are "imported" into A/E's native binary format. So one "opens" and "saves" .ae files; but one "imports" and "exports" .sgm files.
  5. Bear in mind that A/E is notorious for introducing carriage returns into the .sgm files that it exports, often in the middle of tags (between the element name and the attribute name):

    <DIV2
    TYPE="chapter">

    This means that some searches and some replacements that you try may fail unless you restore some predictability to the location of the carriage returns.
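
One way to restore that predictability is to join every tag back onto a single line before searching. The following is a minimal Python sketch, not part of A/E or of any tool listed on this sheet; the function names and the ".bak" backup suffix are made up for illustration. It assumes that no attribute value contains a literal "<" or ">", and it deliberately leaves declarations, comments and processing instructions (anything starting with "<!" or "<?") alone. It also makes a backup copy before touching the file, in the spirit of point 2.

```python
import re
import shutil

# One complete start- or end-tag, even if it spans several lines.
# Assumption: no literal "<" or ">" inside attribute values.
# Tags beginning "<!" or "<?" (declarations, comments, PIs) are skipped.
TAG = re.compile(r"<[^<>!?][^<>]*>")


def join_broken_tags(text):
    """Collapse any line break inside a tag (plus surrounding spaces) to one space."""
    def fix(match):
        return re.sub(r"\s*[\r\n]+\s*", " ", match.group(0))
    return TAG.sub(fix, text)


def normalize_file(path):
    """Copy path to path + '.bak' (hypothetical name), then rewrite path in place."""
    shutil.copyfile(path, path + ".bak")
    # latin-1 round-trips every byte; newline="" keeps the file's own line endings.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(join_broken_tags(text))


sample = '<DIV2\nTYPE="chapter">\n<P>Some\ntext.</P>'
print(join_broken_tags(sample))
```

Line breaks in the running text are left where they are; only the ones inside tags are removed, so a search for <DIV2 TYPE="chapter"> has a chance of matching afterwards.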

Some reasons to work with the .sgm:

  1. Better searching, especially of attributes.
  2. Extraction. You can search for patterns and create a list of matches.
  3. Better find-and-replace.
  4. Better validation, using NSGMLS.
  5. Ability to run search-and-replace on a selected portion of the text instead of the whole thing (e.g. change all the <DIV2>s in this section to <DIV3>s).
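
To make points 1, 2 and 5 concrete, here is a rough Python sketch of an attribute search that produces a list of matches, and of a replacement confined to one stretch of the file. It is only an illustration: the function names, the marker strings and the sample markup are invented, attribute values are assumed to be double-quoted, and the same job could be done with TextPad or Perl.

```python
import re


def attribute_values(text, element, attribute):
    """List every value of `attribute` on `element` start-tags, in file order."""
    pattern = re.compile(
        rf'<{re.escape(element)}\b[^<>]*?\b{re.escape(attribute)}\s*=\s*"([^"]*)"',
        re.IGNORECASE,
    )
    return pattern.findall(text)


def retag_between(text, start_marker, end_marker, old, new):
    """Rename element `old` to `new`, but only from start_marker through end_marker."""
    start = text.index(start_marker)          # raises ValueError if not found
    end = text.index(end_marker, start) + len(end_marker)
    tag = re.compile(rf"(</?){re.escape(old)}\b", re.IGNORECASE)
    changed = tag.sub(lambda m: m.group(1) + new, text[start:end])
    return text[:start] + changed + text[end:]


# Invented sample; the second DIV2 has a line break inside the tag.
sample = (
    '<DIV1 ID="a"><DIV2 TYPE="chapter"><P>One</P></DIV2></DIV1>\n'
    '<DIV1 ID="b"><DIV2\nTYPE="appendix"><P>Two</P></DIV2></DIV1>\n'
)

print(attribute_values(sample, "DIV2", "TYPE"))
print(retag_between(sample, '<DIV1 ID="b">', "</DIV1>", "DIV2", "DIV3"))
```

The search tolerates a line break between the element name and the attribute, so it works on a file straight out of A/E. The replacement changes both start- and end-tags inside the chosen stretch and leaves the rest of the file untouched; as always, run it on a copy first.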

Some available tools:

  1. TextPad : a good basic text editor with good support of regular expressions (pattern-matching) in both find and search-and-replace modes. Able to extract lists of matches using the "find-in-files" feature (ctrl-F5).
  2. Other basic editors, such as Windows NotePad. Most of these are weak in features, but they will allow you to (for example) cut out pieces of the DocType declaration that A/E won't allow you to touch. Make sure that whatever editor it is, it saves as plain (ASCII) text.
  3. Emacs : a venerable and powerful text editor with a very difficult interface, good regexp support, and available integration with an interactive parser (PSGML) and validator (NSGMLS).
  4. Perl. A scripting language with very powerful abilities to manipulate text.
  5. NSGMLS. A command-line validator that will often give more useful error messages than A/E does.

(3), (4) and (5) are not for the fainthearted, and are not installed on many (or any?) of the machines. If we get to a point where we need them, I'll try to install them. In the meantime, I will try to install at least TextPad on all the HTI machines.
