( Title: Backus-Naur Automation File: bnaut.fs Author: David N. Williams Version: 0.5.6 License: LGPL Last revision: August 23, 2008 ) \ Copyright (C) 1996, 2002, 2004, 2006-2008 by David N. Williams ( This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or at your option any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. If you take advantage of the option in the LGPL to put a particular version of this library under the GPL, the author would regard it as polite if you would put any direct modifications under the LGPL as well, and include a copy of this request near the beginning of the modified library source. A "direct modification" is one that enhances or extends the library in line with its original concept, as opposed to developing a distinct application or library which might use it. ) \ *** PREAMBLE ( This text translator implements the Backus-Naur form [bnf] to Forth code scheme discussed in expr.html. The syntax for generating examples of the code templates in the reference above is illustrated in the EXAMPLES section at the end of this file. Uncomment that section and run this file to output the code to the file bnaut.out. The generator includes full nesting of bnf forms, and a fair amount of error reporting. It is written for version 0.7.3 of either the pfe DSTRINGS-EXT environment [Dynamic-Strings word set], or its ANS Forth implementation, dstrings.fs. There is an environmental dependence on case sensitivity, [UNDEFINED], and one-character line ends. While the generator itself is ANS compatible, the generated code is not, because it uses Wil Baden's COND construction with short circuit iterated IF's or NIF's, along with our ELSES, inspired by his THENS. It also uses NUNTIL. See expr.html for pfe and gforth implementations of COND and ELSES. NIF and NUNTIL are trivially implemented in ANS Forth -- see the GENERAL UTILITIES section of this file. Because of the one-shot nature of a translation, this code does not strive to avoid possibly redundant string copying. ) decimal \ true constant PFE [UNDEFINED] PFE [IF] false constant PFE [THEN] PFE [IF] loadm dstrings loadm parsing loadm execution loadm xstacks requires xstacks-pfe.fs cr .( Using pfe dstrings.) [ELSE] \ ANS Forth s" parsing.fs" included s" dstrings.fs" included true constant XSTACK-DEFINING-WORDS s" xstacks.fs" included cr .( Using ANS dstrings.fs.) [THEN] \ multiline comment separator : --- ( -- ) s" ---" |s|-seek-instream drop ; immediate \ *** PUBLIC WORDS --- OPEN-OUTSTREAM CLOSE-OUTSTREAM _\ _" _\N _HEAD _HEAD" _TAIL _TAIL\N SET-NORMAL-SPAT SET-UPTO-SPAT { }ORS }ANDS }ANDS&UPTO }ANY-UPTO }1-UPTO }N-UPTO }MERGE --- \ *** USER PARAMETERS \ These can be predefined before loading this file: [UNDEFINED] MAX-LEVELS [IF] 8 constant MAX-LEVELS \ template and merge nesting [THEN] [UNDEFINED] /LINES-BUF [IF] 1 cells 500 + constant /LINES-BUF \ see INDENTATION section [THEN] \ These can be changed after loading this file: 2 value MAX-UNROLL \ see }N-UPTO 2 value LEAD-INDENT \ first line of template code 2 value HANG-INDENT \ extra for argument lines \ *** TABLE OF CONTENTS --- 1 TERMINOLOGY AND NOTATION 2 BACKTRACKING 3 GENERAL UTILITIES 4 STRING UTILITIES 5 INDENTATION 6 ARGUMENTS 7 OUTPUT 8 BACKUS-NAUR TEMPLATES 8.1 NORMAL GENERATORS 8.2 UPTO GENERATORS 8.3 UPTO MERGES 9 EXAMPLES --- \ *** 1 TERMINOLOGY AND NOTATION --- fstring: An ANS Forth string represented on the data stack as ( addr len). "s", or ".s" suffix: Short for the ( addr len) pair of an fstring on the data stack. measured string, mstring: A string stored in memory as a cell-sized count followed by a sequence of characters. dynamic string, dstring: A string in a dynamic string buffer handled by the Dynamic-Strings word set. It is bound either to the dynamic string stack or to a dynamic string variable, and is stored in memory as a measured string. "+" suffix on words: Does string concatenation into the dynamic string buffer. "$": Indicates a dstring or dstring operation. --- \ *** 2 BACKTRACKING --- The application needs to supply customized versions of the backtracking words SAVE-SEM ("save semantics"), 'TIS, and 'TAINT. In the simplest case, where the bnf's are used only to recognize syntax with no semantic accumulation, they can be defined like this: : save-sem ( s -- s s ) 2dup ; : 'tis ( s after.s -- after.s true ) 2swap 2drop true ; : 'taint ( s after.s -- s false ) 2drop false ; In any case, the stack patterns must be those above. See Sections 5.1 and 5.3 of expr.html for more discussion of backtracking. --- \ *** 3 GENERAL UTILITIES 0 [IF] \ DEBUGGING [UNDEFINED] \\ [IF] : \\ -1 parse 2drop BEGIN refill 0= UNTIL ; [THEN] : .# ( n -- ) cr ." GOT TO #" . ; [THEN] [UNDEFINED] rdrop [IF] : rdrop ( r: x -- ) postpone r> postpone drop ; immediate [THEN] [UNDEFINED] NIF [IF] : NIF ( -- ) postpone 0= postpone IF ; immediate \ : NUNTIL ( -- ) postpone 0= postpone UNTIL ; immediate [THEN] \n$ $s> over c@ constant eol 2constant \n-s \ *** 4 STRING UTILITIES : unum>s ( u -- s ) --- Leave the unsigned number as an fstring. --- 0 <# #s #> ; \ : $+ ( -- ) cat ; \ : s+ ( s -- ) s-cat ; \ : +" ( -- ) postpone cat" ; immediate : unum+ ( u -- ) unum>s s+ ; : space+ ( -- ) s" " s+ ; : spaces+ ( u -- ) ( u) 0 ?DO space+ LOOP ; : \n+ ( -- ) \n$ $+ ; \ *** 5 INDENTATION --- We have found indentation to be tricky and very time-consuming to code and debug. To track nesting visually in the output text, we think it essential that each argument start its own line, with the same indentation at a given level of nesting, and deeper indentation for deeper levels. The first line generated for a form does not contain arguments, but serves as a visual delimiter for the form, with subsequent argument lines indented further. When forms are nested, a deeper form generates a composite argument for the next outward level. The first line of the deeper form has the same indentation as the arguments at the next outward level. For a few special forms, the first argument line would be the first line. In those cases we delimit the form by supplying an empty "\" comment line as the first line, indented like the first lines of more ordinary forms. We don't think it matters if this scheme conflicts with some indentation policy for control structures, because they inevitably get too complicated to follow anyway, a big motivation for automatic source code generation in the first place. In particular, we concatenate control structure cleanup code at the end of a form onto its last argument line. This exacerbates the usual problem of long lines with several levels of nesting and indentation. The burden is put on the user to hand edit the output to break lines that are too long. --- : lead-indent+ ( -- ) LEAD-INDENT spaces+ ; : separate-line ( s -- after.s line.s true | s false ) --- Scan s for the next eol character. If found, leave true and the first part of the string, including the eol as line.s, and the rest after.s. If not found, leave false and the input string. This implementation assumes the Forth eol sequence to be one byte only. --- eol separate ( after.s before.s true | s false ) IF 1+ true ELSE false THEN ; create lines-buf /LINES-BUF allot \ for multiline indentation : indent-lines+ ( u -- ) --- The mstring in LINES-BUF is assumed to consist of one or more text lines, the last of which has no eol character. Concatenate the string line by line, inserting n blanks ahead of each, unless it is empty or just the eol character. --- ( u) >r lines-buf mcount ( s) BEGIN ( s) separate-line ( after.s line.s true | s false) WHILE ( len) dup 1 = NIF r@ ( u) spaces+ THEN ( line.s) s+ REPEAT ( len) dup IF r@ ( u) spaces+ THEN s+ rdrop ; : composite? ( -- flag ) --- If the mstring in LINES-BUF contains an eol character, leave a nonzero flag, else leave zero. --- lines-buf mcount eol scan ( addr len) nip ; : merged? ( -- flag ) --- The mstring in LINES-BUF is the immediate result of argument merging iff it is terminated by an eol character. If it is, leave true and remove the terminating eol; else leave false. --- lines-buf mcount end-c@ eol = dup IF -1 lines-buf +! THEN ; : lth-arg ( l $: [$frame] i*x$ -- [$frame] i*x$ argl$ ) --- Leave the l-th argument in the topmost string frame, where l starts with zero for the deepest argument. --- #$args 1- swap - th-$arg ; : lth-indented+ ( l $: [$frame] i*x$ -- [$frame] i*x$ ) --- Concatenate the lth argument in the topmost string frame with extra indentation. If the argument is composite, it was generated at a next deeper level. Each of its lines already has leading indentation but each gets extra hanging indentation. If it is not composite or is merged, it gets both leading and hanging indentation, including all lines of a merged argument. --- lth-arg $s> lines-buf (m!) merged? 0= composite? and IF 0 ELSE LEAD-INDENT THEN HANG-INDENT + indent-lines+ ; \ *** 6 ARGUMENTS --- Nesting of syntax elements is handled by bracketed arguments with a dedicated string-depth stack. --- MAX-LEVELS make-xstack constant $depth-stack $depth-stack >x: push-$depth $depth-stack x>: pop-$depth $depth-stack xdepth: nest-depth \ nesting count for indentation : { ( -- ) --- Push the current string stack depth onto $DEPTH-STACK, so it can be retrieved by the next invocation of }$#. --- $depth push-$depth ; : }$# ( -- ) --- Pop $DEPTH-STACK and make a dynamic string frame out of the net number of items left on the string stack since the last invocation of {. --- $depth pop-$depth - $frame ; \ *** 7 OUTPUT 0 value outstream : open-outstream ( filename.s -- ) r/w create-file ( fid ior) throw to outstream ; : close-outstream ( -- ) outstream close-file ( ior) throw ; : _s ( s -- ) dup IF outstream write-file throw ELSE 2drop THEN ; : _$ ($: $ -- ) $s> _s ; : _\n ( -- ) \n-s _s ; : _unum ( n -- ) unum>s _s ; : _" ( -- ) --- Write the rest of the current line to the output stream. Note that no trailing quote mark is needed. --- source dup >r >in @ /string _s r> >in ! _\n ; : _\ ( -- ) --- Write the rest of the current line to the output stream as a comment. --- s" \ " _s _" ; : ?_$ ($: $ -- $ | ) --- Write the input string to the output stream iff there is no nesting of string arguments. --- nest-depth NIF _$ THEN ; $variable spat$ : set-normal-spat ( -- ) $" ( s -- after.s true | s false )" spat$ $! ; : set-upto-spat ( -- ) $" ( s -- after.s | s )" spat$ $! ; : spat+ ( -- ) spat$ $@ $+ ; : _head ( name.s -- ) $+" : " ( name.s) s+ spat+ \n+ ENDCAT _$ ; : _head" ( "name" -- ) [char] " parse>s _head ; : _tail ( -- ) $+" ;" \n+ ENDCAT _$ ; : _tail\n ( -- ) _tail _\n ; \ *** 8 BACKUS-NAUR TEMPLATES --- As described in expr.html, the code generators for bnf templates use two kinds of basic or composite syntax elements as arguments, "normal", labeled with b's, and "upto", labeled with u's. They have stack patterns: b <--> ( s -- after.s true | s false ) u <--> ( s -- after.s | s ) (both branches implicitly true) Code generator arguments are dstrings, either the names of Forth syntax-element words, or the text of Forth code that obeys one of the two stack patterns. Templates can be nested because the code they correspond to does obey the patterns. There are two special bnf "and" subsequences which do not require the full logic of a sequence of normal elements. They are built just by stringing together Forth code for syntax elements obeying the above stack patterns: u1 ... un u1 ... un b The pure upto sequence automatically produces an upto stack pattern. The trailing normal sequence may fail to produce a normal stack pattern when the b element fails, because of parsing motion by the u's, and needs extra backtracking logic. We call the first an "upto merge" and the second a "seminormal merge". We do not associate bnf templates with these, but there is a code generator, }MERGE, in the UPTO MERGES section, for using them as template arguments. We use "w" to label arguments that can be either normal or seminormal. We never use "or" sequences with upto elements. All of the argument frame terminating words below, those with "}" prefixes, assume that no dynamic string concatenation is in progress upon entry. --- : ?_CODE ($: -- arg$ | ) --- Used at the end of template code generators to terminate code concatenation for the current frame and either emit the code, with final line termination, if nesting is finished, or leave it as an argument, without line termination, for the next outer frame. --- drop-$frame ENDCAT nest-depth NIF _$ _\n THEN ; \ *** 8.1 NORMAL GENERATORS --- These words use normal, upto, or seminormal text arguments, as indicated, to produce the text of code that obeys the normal syntax element stack pattern: ( s -- after.s true | s false ) Such code text is called "normal text". --- : }ors ($: b1$ ... bn$ -- code$ | ) \ syntax: { b1$ ... bn$ }ors --- Produce normal text for an "or" sequence of normal elements. --- }$# #$args 2 < ABORT" ***Need 2 or more normal $args." lead-indent+ $+" COND" \n+ #$args 1- 0 ?DO i lth-indented+ $+" NIF" \n+ LOOP #$args 1- lth-indented+ $+" ELSES true THEN" ?_CODE ; : }ands ($: w1$ ... wn$ -- code$ | ) \ syntax: { w1$ ... wn$ }ands --- Produce normal text for an "and" sequence of normal elements. --- }$# #$args 1 < ABORT" ***Need 1 or more normal or seminormal $args." lead-indent+ $+" COND save-sem" \n+ #$args 1- 0 ?DO i lth-indented+ $+" IF" \n+ LOOP #$args 1- lth-indented+ $+" IF 'tis ELSES 'taint THEN" ?_CODE ; : }ands&upto ($: w1$ ... wn$ u$ -- code$ | ) \ syntax: { w1$ ... wn$ u$ }ands&upto --- Produce normal text for an "and" sequence of normal or seminormal elements with a trailing upto element. --- }$# #$args 2 < ABORT" ***Need at least 1 normal or seminormal followed by 1 upto $arg." lead-indent+ $+" COND save-sem" \n+ #$args 1- 0 DO i lth-indented+ $+" IF" \n+ LOOP #$args 1- lth-indented+ $+" 'tis ELSES 'taint THEN" ?_CODE ; \ *** 8.2 UPTO GENERATORS --- These words use normal text arguments to produce the text of code that obeys the upto syntax element stack pattern: ( s -- after.s | s ) We call such code text "upto text". --- : }any-upto ($: b$ -- code$ | ) \ syntax: { b$ }any-upto --- Produce upto text for zero or any number of occurrences of a normal syntax element. --- }$# #$args 1 <> ABORT" ***Need exactly 1 normal $arg." lead-indent+ $+" BEGIN " \n+ 0 lth-indented+ $+" NUNTIL" ?_CODE ; : }1-upto ($: b$ -- code$ | ) \ syntax: { b$ }1-upto --- Produce upto text for zero or one occurrence of a normal syntax element. --- }$# #$args 1 <> ABORT" ***Need exactly 1 normal $arg." lead-indent+ $+" \" \n+ 0 lth-indented+ $+" drop" ?_CODE ; : }n-upto ($: b$ -- code$ | ) ( n -- ) \ suggested syntax: n { b$ }n-upto --- Produce upto text for zero or up to n occurrences of a normal syntax element. If n <= MAX-UNROLL, the code loop will be generated as unrolled. --- }$# #$args 1 <> ABORT" ***Need exactly 1 normal $arg." ( n) dup 2 < ABORT" ***Numerical argument must be at least 2." ( n) dup MAX-UNROLL <= IF \ unrolled loop lead-indent+ $+" \" \n+ ( n) 1- 0 DO 0 lth-indented+ $+" drop" \n+ LOOP 0 lth-indented+ $+" drop" ELSE \ DO-loop lead-indent+ ( n) unum+ space+ 0 unum+ $+" DO" \n+ 0 lth-indented+ $+" drop LOOP" THEN ?_CODE ; \ *** 8.3 UPTO MERGES --- Merging is an optimization that simply strings "and" sequences of upto elements, or upto elements trailed by a single normal element, together into a single upto, respectively, seminormal argument string. Merges are not bnf templates in the same sense as normal and upto templates. They are to be used as direct template arguments, and their own direct arguments may be nested templates containing merged arguments. Merges should not be used as direct arguments of merges, nor is there any reason to do so. --- : #leading ( $: a$ -- s: u ) --- Leave the number of leading whitespace characters in a$. --- $s> dup >r bl skip nip r> swap - ; : unlead-lines+ ($: lines$ -- ) --- The input string is assumed to consist of one or more text lines separated by a single eol character, each of which has at least as many leading blanks as the first line, and the last of which has no eol character. Concatenate the string line by line, removing the number of leading blanks in the first line from each. --- $dup #leading ( u) >r $s> lines-buf (m!) lines-buf mcount ( s) BEGIN ( s) separate-line ( after.s line.s true | s false) WHILE ( len) dup 1 = NIF r@ ( u) /string THEN ( line.s) s+ REPEAT ( len) dup IF r@ ( u) /string THEN s+ rdrop ; : }merge ($: u1$ ... un$ | u1$ ... un$ b$ -- code$ ) \ syntax: { u1$ ... un$ }merge (produces upto argument) \ { u1$ ... un$ b$ }merge (produces seminormal argument) --- String the elements together as an "and" sequence separated by eol's without additional logic, for use as an argument in normal or upto templates. Leading indentation of lines in composite elements is removed first. Although the resulting code$ is to be used as an argument, it is terminated by an eol to distinguish it from a multiline template argument. That eol gets removed when code$ is processed as a template argument. The only legal syntaxes are the two above, but there is no check for that. --- }$# #$args 2 < ABORT" *** Need 2 or more $args." #$args 0 DO i lth-arg unlead-lines+ \n+ LOOP drop-$frame ENDCAT ; \ *** 9 EXAMPLES --- Uncomment this section and run this file to output the examples to the file bnaut.out. If the examples were to be processed from a separate file, bnaut.fs would have to be loaded before or at the beginning of the file. See aexpr.fs for a real example that generates code for the address expression bnf's in expr.html. Some sort of brace matching format convention is indispensible when there is any nesting. Below, the basic rule is that it should be totally clear which brace closes an opening brace. Sometimes this means using a vertical format for the arguments at a given level. --- 0 [IF] s" bnaut.out" open-outstream _\ UNNESTED BACKUS-NAUR FORMS _\n \ normal arguments $" b" $constant b$ $" b1" $constant b1$ $" b2" $constant b2$ $" b3" $constant b3$ $" b4" $constant b4$ \ upto arguments $" u" $constant u$ $" u1" $constant u1$ $" u2" $constant u2$ $" u3" $constant u3$ \ normal or seminormal arguments $" w" $constant w$ $" w1" $constant w1$ $" w2" $constant w2$ $" w3" $constant w3$ $" w4" $constant w4$ _\ NORMAL STACK PATTERN _\n set-normal-spat _head" w" { w$ }ands _tail\n _head" w1&w2" { w1$ w2$ }ands _tail\n _head" w1&w2&w3&w4" { w1$ w2$ w3$ w4$ }ands _tail\n _head" b1|b2" { b1$ b2$ }ors _tail\n _head" b1|b2|b3" { b1$ b2$ b3$ }ors _tail\n _head" w&u" { w$ u$ }ands&upto _tail\n _head" w1&w2&u" { w1$ w2$ u$ }ands&upto _tail\n _head" w1&w2&w3&u" { w1$ w2$ w3$ u$ }ands&upto _tail\n _\ UPTO STACK PATTERN _\ These usually appear as unnamed string arguments nested in _\ templates, not as colon definitions like those below. _\n set-upto-spat _head" {b}" { b$ }any-upto _tail\n _head" [b]" { b$ }1-upto _tail\n $" \ The value of MAX-UNROLL is " _$ MAX-UNROLL _unum $" ." _$ _\n _head" {2:b}" { b$ 2 }n-upto _tail\n _head" {3:b}" { b$ 3 }n-upto _tail\n _\ NESTED EXAMPLES _\n set-normal-spat _head" (w1&w2)|b" { { w1$ w2$ }ands b$ }ors _tail\n _head" ((w1&w2)|b)&w3" { { { w1$ w2$ }ands b$ }ors w3$ }ands _tail\n _head" b|(w1&w2)" { b$ { w1$ w2$ }ands }ors _tail\n _\ The last argument line here is long enough to be _\ a candidate for hand-editing. _head" w1&(b|(w2&w3))" { w1$ { b$ { w2$ w3$ }ands }ors }ands _tail\n _head" b1|(w1&w2)|b2" { b1$ { w1$ w2$ }ands b2$ }ors _tail\n _head" (b1|b2)&(b3|b4)" { { b1$ b2$ }ors { b3$ b4$ }ors }ands _tail\n _head" (b1|b2)&{b3}" { { b1$ b2$ }ors { b3$ }any-upto }ands&upto _tail\n _head" w&{b}" { w$ { b$ }any-upto }ands&upto _tail\n _head" w&[b]" { w$ { b$ }1-upto }ands&upto _tail\n _head" w&{2:b}" { w$ 2 { b$ }n-upto }ands&upto _tail\n _head" u1&u2&b1|b2" { { { u1$ u2$ b1$ }merge }ands b2$ }ors _tail\n _head" u&b&w" { { u$ b$ }merge w$ }ands _tail\n _head" u1&b&u2" { { u1$ b$ }merge u2$ }ands&upto _tail\n _head" u&{b1}&b2&w" { { u1$ { b1$ }any-upto b2$ }merge w$ }ands _tail\n _head" w1&w2&{b1}&u&[b2]" { w1$ w2$ { { b1$ }any-upto u$ { b2$ }1-upto }merge }ands&upto _tail\n _head" (u1&{w1&u2&u3}&w2)|b" { { { u1$ { { w1$ { u2$ u3$ }merge }ands&upto }any-upto w2$ }merge }ands b$ }ors _tail\n close-outstream [THEN]