( Title: Dynamic-Strings Words File: dstrings.fs Author: David N. Williams Version: 0.7.6 License: LGPL Test file: dstrings-test.fs Log file: dstrings.log Last revision: November 20, 2008 ) \ Copyright (C) 2001-2004, 2006, 2008 David N. Williams ( This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or at your option any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. If you take advantage of the option in the LGPL to put a particular version of this library under the GPL, the author would regard it as polite if you would put any direct modifications under the LGPL as well, and include a copy of this request near the beginning of the modified library source. A "direct modification" is one that enhances or extends the library in line with its original concept, as opposed to developing a distinct application or library which might use it. This library is an ANS Forth implementation of our dynamic-strings word list, a direct port of our C implementation, version 0.6.29 of February 25, 2003, for pfe. This version is in sync with version 0.7.6 of the pfe external module dstrings-ext.c. ANS Forth with dependences: - case insensitivity - zero is not an address - Core word set - File-Access word set - Memory-Allocation word set - Locals word set - Programming-Tools word set - String word set - Extensions word sets for the above The intent is that this code should be character clean. NOTATION AND TERMINOLOGY The number of address units for n characters is c[n], the result of CHARS. In stack and other comments, "s" or "something.s" stands for an ANS Forth string [c-addr len], where len is the number of characters in the string body at c-addr. An ANS Forth string is called an "fstring". We say that addr is the beginning of s, even when len=0. We say that addr+c[len] is "off the end" of s, or "just after" s, even when len=0. We say that s' is a substring of s if addr <= addr' <= addr+c[len] and addr'+c[len'] <= addr+c[len]. Note that the empty string has len+1 representations as a substring of s. We say that a substring of s is a "word token", or just a "token", if all of the following are true: 1. it is not empty; 2. it contains no whitespace characters; 3. it is preceded by a whitespace character or is at the beginning of s; 4. it is followed by a whitespace character or is at the end of s. This library uses a "measured string" representation for storage of strings in memory. A measured string is the same as a Forth counted string, except that instead of being restricted to one character, the size of the count field is defined by the implementation, and is aligned. In this implementation, the string body is zero filled to trailing alignment. "Mstring" is short for "measured string". The "MSA" is the measured string address, the same as the count field address. In versions prior to 0.7.6, two choices were allowed for the size of a measured string count field, one character or one cell, by making CHAR-COUNT-FIELD in the PARAMETERS section false or true. That remains possible, but is strongly deprecated. Leave it false for the default cell-sized count field. That and other parameters that the user can choose are on lines with a "USER-CONFIG" comment. ) decimal \ *** WORDS ( Please see the file dstrings-words.txt for a more or less functional listing of the words in this library, including separation into user and system categories. The user category has subdivisions for the Dynamic-Strings word set and extra Forth-string words. Fairly complete word specifications are included in this file, but please see dstrings.html for the official Dynamic-Strings word list specification. ) \ *** UTILITY WORDS s" [UNDEFINED]" pad c! pad char+ pad c@ move pad find nip 0= [IF] : [UNDEFINED] ( "name" -- flag ) bl word find nip 0= ; immediate [THEN] 0 [IF] \ debugging \ loadm dstrings \ pfe [UNDEFINED] \\ [IF] : \\ -1 parse BEGIN refill 0= UNTIL [THEN] [UNDEFINED] bp? [IF] false value bp? \ true turns on breakpoints : bp ( n -- ) bp? IF cr ." #" . .s ELSE drop THEN ; [THEN] [THEN] [UNDEFINED] u<= [IF] : u<= ( u1 u2 -- flag ) 1+ u< ; [THEN] [UNDEFINED] u>= [IF] : u>= ( u1 u2 -- flag ) 1- u> ; [THEN] [UNDEFINED] -rot [IF] : -rot ( a b c -- c a b ) rot rot ; [THEN] [UNDEFINED] rdrop [IF] : rdrop ( r: a -- ) s" r> drop" evaluate ; immediate [THEN] [UNDEFINED] 2rdrop [IF] : 2rdrop ( r: a b -- ) s" 2r> 2drop" evaluate ; immediate [THEN] [UNDEFINED] cell [IF] 1 cells constant cell [THEN] [UNDEFINED] -cell [IF] -1 cells constant -cell [THEN] [UNDEFINED] 2cells [IF] 2 cells constant 2cells [THEN] [UNDEFINED] cell- [IF] : cell- ( u -- u-cell ) cell - ; [THEN] [UNDEFINED] cell/ [IF] : cell/ ( cell*[x] -- x ) [ 1 cells ] literal / ; [THEN] [UNDEFINED] char/ [IF] : char/ ( c[x] -- x ) [ 1 chars ] literal / ; [THEN] [UNDEFINED] chars+ [IF] : chars+ ( x n -- x+c[n] ) chars + ; [THEN] [UNDEFINED] num>s [IF] : num>s ( n -- n.s ) <# dup >r abs s>d #s r> sign #> ; [THEN] [UNDEFINED] [n] [IF] : [n] \ compile: ( n -- ) \ run: ( -- n ) ( This word is intended for embedding in an immediate word, to be executed in compilation mode. It does, however, have interpretation semantics: [ n -- n ] Based on PARSE-N in Leo Wong's TOOLS.F. ) num>s evaluate ; [THEN] \ *** TABLE OF CONTENTS ( 0 PARAMETERS 1 VARIABLES 2 STRUCTURES 3 FSTRING AND MSTRING EXTENSIONS 4 COMMENTS 5 STRING SPACE 6 STRING STACK SUPPORT 7 LOAD AND STORE 8 STRING STACK 9 CONCATENATION 10 ARGUMENTS 10.1 FRAMES 10.2 SCASE CONTROL STRUCTURE 10.3 PARSING LIBRARY WORDS 10.4 ARGUMENT INTERPRETER ) \ *** 0 PARAMETERS 4096 cells constant DEFAULT-/$BUF \ USER-CONFIG 4 constant DEFAULT-MAX-#$FRAMES \ USER-CONFIG \ TRUE is DEPRECATED: false constant CHAR-COUNT-FIELD immediate \ USER-CONFIG s" MAX-U" environment? 0= [IF] cr .( S" MAX-U" ENVIRONMENT? returns FALSE.) ABORT [THEN] constant UCELL-MAX CHAR-COUNT-FIELD [IF] \ char size 1 chars ( /mcount) s" /COUNTED-STRING" environment? 0= [IF] cr .( S" /COUNTED-STRING" ENVIRONMENT? returns FALSE.) ABORT [THEN] ( /counted-string) [ELSE] \ cell size cell ( /mcount) UCELL-MAX [THEN] constant MAX-MCOUNT constant /MCOUNT \ *** 1 VARIABLES [UNDEFINED] dstrings [IF] variable dstrings 0 dstrings ! [THEN] \ *** 2 STRUCTURES \ Measured string: (count, body, 0-fill to alignment) : >mbody ( msa -- bfa ) [ /MCOUNT ] literal + ; \ Dynamic string: (binding address, measured string) : >dbody ( dsa -- bfa ) [ /MCOUNT cell+ ] literal + ; \ String frame: (frame address, number of elements) 2 cells constant /$frame-item \ String space structure ( The major structure is the string space structure. Except for structure instance creation, the code always operates on the current instance of that structure, whose address is held in the variable DSTRINGS. The instance data words use DSTRINGS explicitly, rather than a base address passed on the data stack. WARNING! The /$BF and MAX-#$F fields may be assumed elsewhere to be at offsets of zero and one cell, respectively. Also note that the flags $GBG, $GC-LOCK, and $ARGS? fetch as chars, not as cell-extended flags. When uncommented, the debugging lines adjust the structure, including alignment, for direct comparison of dumps with the pfe/gcc version, at least with the alignment in MacOS X. The $ARGS? field is not used in this implementation. ) \ : STRUCT ( -- 0 ) 0 ; \ gforth already defines this another way : /STRUCT: ( <"sizename"> u -- ) aligned constant ; : +DS-FIELD ( offset size <"name"> -- offset+size ) \ ex-200x create over , + DOES> ( -- [dstrings]+offset ) @ dstrings @ + ; : ds-field: ( offset <"name"> -- offset+1cell ) \ ex-200x \ Exec: ( -- [dstrings]+offset ) aligned 1 cells +DS-FIELD ; : ds-cfield: ( offset <"name"> -- offset+1char ) \ ex-200x \ Exec: ( -- [dstrings]+offset ) 1 chars +DS-FIELD ; \ STRUCT \ avoid gforth "redefined" message 0 ds-field: /$bf \ size of string buffer with stack ds-field: max-#$f ds-field: $bufp \ string buffer pointer ds-field: $brkp \ next available for strings ds-field: $sp \ string stack pointer ds-field: $sp0 \ initial string stack pointer ds-field: $fbrkp \ frame stack top limit pointer ds-field: $fsp \ frame stack pointer ds-field: $fsp0 \ initial frame stack pointer ds-field: cat$p \ msa of last string in the \ buffer, if concatenation is \ in progress, else NULL ds-cfield: $gbg \ true when there is garbage \ char+ \ DEBUG ds-cfield: $gc-lock \ true when collection not permitted \ char+ \ DEBUG \ ds-cfield: $args? \ DEBUG \ char+ \ DEBUG /STRUCT: /$space-header \ frame stack continuation /$space-header ds-field: $fbrk \ frame stack break drop : ds-field@: ( xt -- ) ( Define a word that returns the contents of a cell-sized field of the current string space stucture. These words are intended for debugging and testing and should not be used in normal dstring word definitions. Use the structure field words for that instead. ) create , DOES> ( -- x ) @ execute @ ; ' $brkp ds-field@: $breakp@ ' $bufp ds-field@: $bufp@ ' $fbrkp ds-field@: $fbreakp@ ' $fsp0 ds-field@: $fsp0@ ' $fsp ds-field@: $fsp@ ' $sp0 ds-field@: $sp0@ ' $sp ds-field@: $sp@ ' cat$p ds-field@: cat$p@ \ *** 3 FSTRING AND MSTRING EXTENSIONS ( The following words have essentially the same functionality as words of the same name in the mstrings.fs library: ) \ (M!) M,S MCOUNT -MCOUNT ( The difference is that this implementation does a zero fill to trailing alignment when storing an mstring, and it compiles with count fields of size either one character or one cell, depending on whether CHAR-COUNT-FIELD is true. ) : 0fill-aligned ( addr -- addr' ) ( addr) dup aligned ( addr') dup >r ( addr addr') over - char/ ( addr len) 0 fill r> ; CHAR-COUNT-FIELD [IF] \ char size : mcount@ ( msa -- len ) c@ ; : mcount! ( len msa -- ) c! ; [ELSE] \ cell size : mcount@ ( msa -- len ) @ ; : mcount! ( len msa -- ) ! ; [THEN] : (m!) ( s msa -- ) ( This is the same as Wil Baden's PLACE, except it assumes the buffer address msa to be aligned, and stores s as a measured string, zero-filled to trailing alignment. As with PLACE, it is assumed that the mstring copy does not clobber the old string, and there is no check for room starting at msa. ) 2dup 2>r /MCOUNT + swap move 2r@ + /MCOUNT + 0fill-aligned ( offend) drop 2r> mcount! ; : m,s ( addr len -- addr' len ) ( ALLOT room and store the fstring into aligned data space as an mstring, leaving data space zero-filled to alignment; and leave the length and new body address. It is assumed that len is unsigned. NOTE: M,S differs from STRING, in Wil Baden's Tool Belt in that it stores a measured string instead of a counted string, and it leaves the fstring representation of the stored string instead of nothing. ) CHAR-COUNT-FIELD [IF] ( len) dup MAX-MCOUNT u>= ABORT" ***string count too large" [THEN] align here ( addr len addr') 2dup 2>r over /MCOUNT + aligned allot (m!) r> /MCOUNT + r> ; ( The following words convert between the fstring representation of an mstring and its mstring address. ) : mcount ( msa -- addr len ) dup mcount@ swap /MCOUNT + swap ; : -mcount ( addr len -- msa ) drop /MCOUNT - ; : parse>s \ compile: ( "ccc" char -- ) \ run: ( -- ccc.s ) \ interpret: ( "ccc" char -- ccc.s ) ( Parse the parse area up to the first occurrence of char, which is parsed away. If executing in compilation mode, append run-time semantics to the current definition that leaves the fstring representation on the stack. In interpretation mode, leave the Forth string representation for a stored copy, which may be transient in the style of S". In either mode, the format of the stored string is implementation dependent. NOTE: The interpreted copy is not transient in this implementation, and both copies are mstrings. ) ( char) parse ( ccc.s) state @ IF 2>r postpone AHEAD 2r> m,s 2>r postpone THEN 2r> postpone 2literal ELSE m,s THEN ; : s` ( "ccc<`>" -- ) \ see PARSE>S for STATE [char] ` parse>s ; immediate create &mempty 0 , \ *** 4 COMMENTS : ($: ( -- ) ( A synonym for "(" including any immediacy, and allowing multiple lines before the terminating right parenthesis, as specified for "(" in the File-Access word set. ) postpone ( ; immediate \ "paren-string-colon" \ *** 5 STRING SPACE : /$frame-stack ( -- /stack ) max-#$f @ /$frame-item * ; : 0$space ( &space -- ) ( Clear the string buffer, string stack, and string frame stack in the string space starting at &space. Any string variables holding strings in the string buffer are left pointing into limbo. This may be executed with the string space in an invalid state, as long as the /$BF and MAX-#$F fields of its string space structure are intact. NOTE: This word does not zero fill the string buffer. ) dstrings @ >r ( &space) dstrings ! /$space-header /$frame-stack + ( $buf.offset) dstrings @ + ( &$buf) dup $bufp ! dup $brkp ! dup $fsp ! $fsp0 ! $fbrk $fbrkp ! 0 \ for string stack underflow guard $bufp @ /$bf @ + ( offend) dup $sp ! dup $sp0 ! ( 0 offend) ! \ (unused) string stack underflow guard 0 cat$p ! false $gbg c! false $gc-lock c! \ false to-$args? \ not used in this implementation r> dstrings ! ; : make-$space ( /buf #frames -- addr ) ( Allocate and initialize a string space with /buf address units available for the string buffer including the string stack, and with a string frame stack for frame description entries holding up to #frames. The size is rounded up to address alignment, and the buffer begins and ends aligned. Return addr, the address of the string space. The standard word FREE with addr as input can be used to release the space. ) swap ( #frames /buf) aligned LOCALS| /buf #frames | #frames /$frame-item * /$space-header + /buf + cell+ allocate \ extra guard cell ( &$space ior) ABORT" ***MAKE-$SPACE can't ALLOCATE." ( &$space) /buf over ! #frames over cell+ ! dup 0$space ; ( Uncomment the DEBUG lines for use with pfe dstrings loaded. ) \ dstrings @ 0= [IF] \ DEBUG DEFAULT-/$BUF DEFAULT-MAX-#$FRAMES make-$space dstrings ! \ [THEN] \ DEBUG : collect-$garbage ( -- collected? ) ( The garbage flag in the current string space structure is tested, and if there is garbage, it is collected, unless garbage collection is locked. Garbage strings are marked by null backward links. Nongarbage strings are bound by their backward links, pointing either to a string variable data field address or to an entry on the string stack [the deepest if there are several identical references]. Garbage collection fills the gaps occupied by garbage strings by moving any nongarbage strings to lower memory one at a time. The backward link of a string that is moved does not change, but the forward links, in at most one string variable and/or possibly several string stack entries, are updated to point to the new MSA. This algorithm is "fast" because the backward links make it unnecessary to scan a list of string variables, and because no string is moved more than once. It does, however, require a scan of the string stack for each string that moves, unless it is the current concatenation string, which is guaranteed not to be on the string stack. When there is no garbage to collect, this word returns false. If there is garbage, it throws an error when garbage collection is locked. Otherwise it collects the garbage and returns true. We have not thought it worth the effort to factor this long word, but maybe it would be. ) $gbg c@ 0= IF false EXIT THEN $gc-lock c@ ABORT" ***string garbage locked" false $gbg c! $sp @ $sp0 @ $brkp @ $bufp @ 0 LOCALS| target next break $sp0v $spv | \ locate first garbage hole (no need to check off end, because \ we know there is a hole) BEGIN next @ ( backlink) WHILE \ not garbage, skip to next next >dbody next cell+ mcount@ + aligned to next REPEAT next to target \ garbage hole found BEGIN next break u< WHILE \ skip over garbage BEGIN next @ ( backlink) 0= next break u< and WHILE next >dbody next cell+ mcount@ + aligned to next REPEAT \ move and update until next garbage BEGIN next @ ( backlink) next break u< and WHILE \ not garbage and not off end \ We always update the forward link pointed to by the \ backward link. The backward link points to one of: a \ string variable, a deepest string stack entry, or CAT$. target cell+ next @ ( backlink) ! \ Unless the backward link points to CAT$, we scan the \ string stack and update copies of the old forward link. \ Note that if the backward link points into the string \ stack, we've already updated the deepest reference, which \ won't be found in the scan. next @ ( backlink) cat$p @ <> IF $sp0v $spv ?DO i @ next cell+ = IF target cell+ i ! THEN cell +LOOP THEN \ move dynamic string, backlink thru alignment pad, to hole next cell+ mcount@ ( $len) >dbody aligned ( total.len) next target 2 pick cmove next over + to next target + to target REPEAT \ move and update REPEAT \ garbage hole found target $brkp ! true ; : 0strings ( -- ) ( Set all string variables holding bound string values in string space to the empty string, and clear string space, including the string buffer, string stack, and string stack frames. NOTE: This word does not zero fill the string buffer. NOTE: If used for under-the-hood development, this word must be executed only when string space is in a valid state. ) $bufp @ ( next) $brkp @ LOCALS| break next | BEGIN next break u< WHILE next @ ( backlink) IF &mempty next @ ( $variable) ! THEN next >dbody next cell+ mcount@ + aligned to next REPEAT dstrings @ 0$space ; : $garbage? ( -- flag ) ( Leave true if there is garbage in the current string space. Not normally used by applications, since garbage collection is transparent. ) $gbg c@ 0= invert ; : $gc-off ( -- ) ( Disable garbage collection in the current string space. An error will be thrown if garbage collection is attempted. ) true $gc-lock c! ; : $gc-on ( -- ) ( Enable garbage collection in the current string space. This is the default. ) false $gc-lock c! ; : $gc-lock@ ( -- flag ) ( Fetch the dstring garbage collection "off" state. Intended for saving the off state for later restoration after a usage of $GC-ON or $GC-OFF. ) $gc-lock c@ 0= invert ; : $gc-lock! ( flag -- ) ( Set the dstring garbage collection "off" state according to flag. Intended for restoring the off state previously fetched by GC-LOCK@. ) $gc-lock c! ; : $unused ( -- u ) ( Leave the number of address units available for dynamic strings and string stack entries in the string buffer. ) $sp @ $brkp @ - ; : /$buf ( -- u ) ( Leave the size in address units allocated for the current string buffer. ) /$bf @ ; : max-#$frames ( -- u ) ( Leave the number of string frames allowed on the string frame stack for the current string space. ) max-#$f @ ; : $room? ( size -- flag ) ( Leave true if there is enough aligned room in string space for size address units starting at the break. ) $brkp @ + aligned $sp @ u<= ; : ?$room ( size -- ) ( Abort if there is not enough aligned room in string space for size address units starting at the break, even after garbage collection. ) dup $room? 0= IF collect-$garbage ( collected?) drop $room? 0= ABORT" ***string space overflow" ELSE ( size) drop THEN ; : ?cat ( -- ) ( Abort if there is a concatenation in progress in string space. ) cat$p @ ABORT" ***cat lock preventing string copy" ; : in-$buffer? ( msa -- flag ) ( Leave true if the mstring is in the string buffer. ) $bufp @ $brkp @ within ; \ *** 6 STRING STACK SUPPORT : $sp-- ( -- ) -cell $sp +! ; : $sp++ ( -- ) cell $sp +! ; : --$sp@ ( -- [$sp]-[cell] ) $sp-- $sp @ ; : $sp@++ ( -- $sp ) $sp @ $sp++ ; : /$stack ( -- #au ) $sp0 @ $sp @ - ; : ?$s-under ( #au -- ) /$stack swap u< ABORT" ***string stack underflow" ; : ($push) ( msa -- $: msa ) ( Pop an mstring address from the data stack and push it onto the string stack, with no check for stack room. A check that could trigger garbage collection must be done before using this word. If msa is not external, it must be supplied, after the check, from the string stack or a string variable, so it remains valid after a garbage collection. ) $sp @ cell- dup $sp ! ! ; : $push-ext ( msa -- $: msa ) ( Pop an external mstring address from the data stack and push it onto the string stack after checking for room, invoking garbage collection if necessary. Not to be used with a dynamic string because a garbage collection can invalidate its address. ) cell ?$room ($push) ; : $pop ($: msa -- s: msa) ( Abort if the string stack would underflow when popped. Otherwise pop the top of the string stack and push it onto the data stack. If the string is in the current string space and initially bound to the top of the string stack, mark it as garbage by setting its back link to NULL and set the garbage flag. WARNING: If msa is the address of a bound string, it may become invalid at the next garbage collection. This can be avoided by sandwiching sections of code where this could occur between $GC-OFF and $GC-ON. ) cell ?$s-under $sp@++ dup @ dup in-$buffer? ( $sp a$ bound?) IF dup cell- rot ( a$ blfa $sp) over @ = IF ( a$ blfa) 0 swap ! true $gbg c! ELSE ( a$ blfa) drop THEN ELSE nip THEN ; \ *** 7 LOAD AND STORE : $constant ( "name" $: ext$ -- ) create $pop , DOES> ($: -- ext$ ) @ $push-ext ; : parse>$ \ compile: ( "ccc" char -- ) \ run: ($: -- ccc$ ) \ interpret: ( "ccc" char -- $: ccc$ ) ( Parse the parse area up to the first occurrence of char, which is parsed away, and store the string as an external measured string. If executing in compilation mode, append run-time semantics to the current definition that leaves the MSA on the string stack. In interpretation mode, leave the MSA on the string stack, where the stored copy, unlike PARSE>S, is required to be nontransient. ) ( char) parse ( ccc.s) state @ IF 2>r postpone AHEAD 2r> m,s -mcount >r postpone THEN r> postpone literal postpone $push-ext ELSE m,s -mcount $push-ext THEN ; \ see PARSE>$ for STATE: : $" ( "ccc<">" -- $: c$ ) [char] " parse>$ ; immediate : $` ( "ccc<`>" -- $: c$ ) [char] ` parse>$ ; immediate $" " $constant empty$ align here /MCOUNT allot 10 c, 1 over mcount! $push-ext $constant \n$ : $variable ( "name" -- ) \ name does ( -- addr ) create empty$ $pop , ; : $@ ( $var.dfa -- $: a$ ) ( Leave the MSA of the string held by the string variable. ) cell ?$room @ ($push) ; : $! ( $var.dfa $: a$ -- ) ( Store the string whose MSA is on the string stack in the variable whose DFA is on the parameter stack. NOTES: The only situation in which $! copies the string value is when it is a bound string already stored in another variable. In that case, the new copy is the one that is stored in the variable. In particular, external strings are not copied. If the string value held by the string variable on entry is a bound string that is also referenced deeper on the string stack, its back link is reset to point to the deepest string stack reference. If it is a bound string not deeper on the string stack and not identical to the input string, its back link is set to zero, making it garbage. If it is an external string, its MSA in the variable is simply written over by that popped from the string stack. ) cell ?$s-under ( var) dup @ ( old$) dup in-$buffer? 0= ( oldext?) $sp @ @ ( new$) dup in-$buffer? 0= ( newext?) 0 0 LOCALS| /body blfa newext? new$ oldext? old$ var | oldext? newext? and 0= oldext? 0= new$ old$ = and 0= and IF \ new string not bound to variable \ Do old string first; it might make a garbage hole, more \ room in case new string is copied. oldext? 0= IF \ old string is bound 0 ( next) $sp0 @ $sp @ cell+ ?DO i @ old$ = IF ( prev.next) drop i ( next) THEN cell +LOOP ( next) dup old$ cell- ( blfa) ! ( next) 0= IF true $gbg c! THEN THEN newext? 0= IF \ new string is bound new$ cell- dup to blfa ( blfa) @ $sp @ $sp0 @ within 0= IF \ new string bound to different variable, copy it ?cat new$ mcount@ ( len) dup chars dup to /body ( /body) >dbody ?$room \ maybe gc $brkp @ to blfa ( len) blfa cell+ mcount! $sp @ @ ( src$) \ in case there was gc ( src$) >mbody blfa >dbody /body cmove blfa cell+ to new$ \ for storing into variable blfa >dbody /body + ( offend) dup aligned ( newbreak) dup $brkp ! ( offend newbreak) over - char/ 0 fill THEN var blfa ! THEN THEN new$ var ! $sp++ ; \ *** 8 STRING STACK : $drop ($: a$ -- ) ( Abort if the string stack would underflow when popped. Otherwise increment the string stack pointer, thus popping the string stack. If the input string is in the current string space and initially bound to the top of the string stack, mark it as garbage by setting its back link to NULL and set the garbage flag. ) $pop drop ; : $2drop ($: a$ b$ -- ) ( Drop the two topmost string stack entries, marking them as garbage if appropriate. ) $drop $drop ; : $2dup ($: a$ b$ -- a$ b$ a$ b$ ) ( Leave copies of the two topmost string stack entries. The string values are not copied. ) 2cells ?$s-under 2cells ?$room $sp @ dup @ swap cell+ @ --$sp@ ! --$sp@ ! ; : $depth ( -- u ) ( Leave the number of items on the string stack. ) /$stack cell/ ; : $dup ($: a$ -- a$ a$ ) ( Leave a copy of the topmost string stack entry. The string value is not copied. ) cell ?$s-under $sp @ $@ ; : $swap ($: a$ b$ -- b$ a$ ) ( Exchange the two most accessible strings on the string stack. Throw an error if there are less than two strings on the stack. Neither string value is copied. ) 2cells ?$s-under $sp @ ( &top) dup @ ( top$) over cell+ ( &bot) dup @ ( bot$) LOCALS| bot$ &bot top$ &top | \ don't change blinks to $vars, nor to deeper than &deep top$ bot$ <> IF top$ &bot ! bot$ &top ! bot$ in-$buffer? IF bot$ cell- @ ( blink) &bot = IF &top bot$ cell- ( bla) ! THEN THEN top$ in-$buffer? IF top$ cell- @ ( blink) &top = IF &bot top$ cell- ( bla) ! THEN THEN THEN ; : $nip ($: a$ b$ -- b$ ) ( Drop the next to top item from the string stack. ) $swap $drop ; : $over ($: a$ b$ -- a$ b$ a$ ) ( Leave a copy of the next most accessible string stack entry on top of the string stack. The string value is not copied. ) 2cells ?$s-under $sp @ cell+ $@ ; : $tuck ($: a$ b$ -- b$ a$ b$ ) ( Copy the top string stack item just below the second item. The string value is not copied. ) $swap $over ; : $pick ( u $: au$ ... a0$ -- au$ ... a0$ au$ ) ( Copy the u-th string stack entry to the top of the string stack. The string value is not copied. Throw an error if the input string stack does not have at least u+1 items. ) ( u) cells dup cell+ ?$s-under $sp @ + $@ ; : $exchange ( i j -- ) ($: maxth$ ... minth$ ... -- minth$ ... maxth$ ... ) ( Exchange the ith and jth strings on the string stack, where the top is the 0th. Throw an error if there are not at least max[i,j] + 1 strings on the stack. Neither string value is copied. ) cells swap cells 2dup u> IF swap THEN ( min.cells max.cells) dup cell+ ?$s-under 2dup = IF 2drop EXIT THEN ( max.cells) $sp @ + ( &deep) dup @ rot ( &deep deep$ min.cells) $sp @ + ( &shal) dup @ LOCALS| shal$ &shal deep$ &deep | shal$ deep$ = IF EXIT THEN shal$ &deep ! deep$ &shal ! ( Back links to $vars or to deeper than &deep do not change. ) shal$ in-$buffer? IF shal$ cell- @ ( blink) &shal &deep within IF &deep shal$ cell- ( blfa) ! THEN THEN deep$ in-$buffer? IF deep$ cell- @ ( blink) &deep = \ was deepest copy IF ( It was the deepest copy. Scan for the deepest shallower copy, including that already stored at &shal. Note that neg +LOOP includes limits. ) &shal &deep cell- DO i @ deep$ = IF i LEAVE THEN -cell +LOOP ( &tween) deep$ cell- ( blfa) ! THEN THEN ; : $s> ($: a$ -- s: a.s ) ( Drop a$ from the string stack and leave it as an fstring a.s, without copying. WARNING: If a$ is a bound string, it may move or disappear at the next garbage collection, making a.str invalid. This can be avoided by sandwiching sections of code where this could occur between $GC-OFF and $GC-ON. ) $pop mcount ; : $,s ($: a$ -- s: a.s ) ( Drop a$ from the string stack, copy it into data space as a measured string, and leave it as an fstring. ) $pop mcount m,s ; : $s@ ($: a$ -- a$ s: a.s ) ( Leave the string stack unchanged, and leave the string body address and length on the data stack. NOTE: In earlier versions this was call $S@S. The trailing "S" is superfluous if it is understood that the only string format that usually appears on the data stack is the fstring format. WARNING: If a$ is a bound string, it may move at the next garbage collection, making a.s invalid. This can be avoided by sandwiching sections of code where this could occur between $GC-OFF and $GC-ON. ) cell ?$s-under $sp @ @ mcount ; : >$s ( a.s -- $: a$ ) ( Push the external fstring a.s onto the string stack, without copying the string value into the string buffer. It is an unchecked error if the Forth string a.s is not stored as an external measured string. WARNING: If the string value of a.s is actually in the string buffer and not external, the push operation may generate a garbage collection that invalidates its MSA. ) -mcount $push-ext ; : >$s-copy ( a.s -- $: a$ ) ( Copy the external string value whose body address and count are on the parameter stack into the string buffer and push it onto the string stack. Errors are thrown if there is an mcount overflow, if there is not enough room in string space, even after garbage collection, or if there is an unterminated string concatenation. The input external string need not exist as a measured string. WARNING: This word should not be used when the input string is a bound string because the copy operation may generate a garbage collection which invalidates its MSA. ) ?cat CHAR-COUNT-FIELD [IF] ( len) dup MAX-MCOUNT u> ABORT" ***string count too large" [THEN] 0 ( blf) \ fill in back link field addr when gc-safe LOCALS| blf len addr | len >dbody cell+ ?$room $brkp @ to blf blf cell+ ($push) \ forward link $sp @ blf ! \ back link len blf cell+ mcount! blf >dbody addr over len cmove ( &body) len chars+ 0fill-aligned ( new.break) $brkp ! ; : $. ($: a$ -- ) ( Display the string on the terminal. If the system implementation of TYPE has its output vectored, $. uses the same vector. ) $s> type ; : $type ($: a$ -- ) $. ; \ synonym, deprecated in favor of $. \ *** 9 CONCATENATION ( When there is a string concatenation in progress, concatenation is the only basic string operation that can copy a string into the string buffer. This implementation leaves string space in a valid state after each piece is concatenated, with the break aligned after the concatenation so far, and a 0 fill between the end of the concatenation and the break. The concatenating string, called the cat$, is the last dynamic string in the string buffer. It is bound to a hidden string variable in the string space structure, and remains valid if a garbage collection occurs before concatenation ends. ) : cat$-$! ($: a$ -- ) cat$p $! ; : onto-cat$ ( a.s -- ) ( Assume a concatenation is in progress, and that there is room in string space to append the body of a.s to the cat$. Here a.s may be external and not assumed to exist as an mstring, or it may be the fstring representation of a bound mstring. Append the body and update the cat$ count and the string break. An error is thrown if there is an mcount overflow. ) cat$p @ LOCALS| cat$ delta source | cat$ mcount ( addr len) dup delta + CHAR-COUNT-FIELD [IF] ( len') dup MAX-MCOUNT u> ABORT" ***string count too large" [THEN] ( len') cat$ mcount! ( addr len) chars+ ( target) source over delta cmove ( target) delta chars+ 0fill-aligned $brkp ! ; : $+ ($: a$ -- ) ( If a$ is the empty string, drop it and do nothing else. In particular, do not start a new concatenation, which would lock string space against new nonconcatenating copies. Otherwise append the string body to the end of the current cat$. If there is none, start one. An error is thrown if there is not enough room in string space even after a garbage collection, or if there is an mcount overflow. If garbage collection occurs, a$ remains valid even when it is in the string buffer. ) $s@ nip 0= IF $drop EXIT THEN cat$p @ IF \ continue concatenation $s@ ( addr len) chars ?$room \ any overestimate due to break ( addr) drop \ alignment stays < 2 cells $s> ( source delta) onto-cat$ ELSE \ start concatenation $s@ ( addr len) chars ?$room drop $s> >$s-copy cat$-$! THEN ; : s+ ( a.s -- ) ( If a.s is the empty string, drop it and do nothing else. Otherwise append the fstring body to the end of the current cat$. If there is none, start one. An error is thrown if there is not enough room in string space even after a garbage collection, or if there is an mcount overflow. S+ is most commonly used on external strings, not assumed to exist as mstrings. In contrast to $+, garbage collection could invalidate a.s if it is a dynamic string in the string buffer. S+ can be used in that situation if garbage collection is turned off with $GC-OFF. ) dup 0= IF 2drop EXIT THEN cat$p @ IF \ continue concatenation ( delta) dup chars ?$room \ any overestimate due to break ( s) onto-cat$ \ alignment stays < 2 cells ELSE \ start concatenation ( a.s) >$s-copy cat$-$! THEN ; : ENDCAT ($: -- cat$ | empty$ ) ( If there is no cat$, do nothing but leave the empty string. If there is, leave it as a string bound to the top of the string stack, and terminate concatenation, permitting normal copies into the string buffer. Concatenation can be interrupted and resumed like this: ENDCAT ... $+ where the intervening action must ensure that the string pushed by ENDCAT is on top of the string stack for $+. There may be overhead from a recopy of the interrupted concatenation, if any new strings have been produced in string space, even if marked as garbage. If the intervening action leaves a new concatenation in progress, the old is concatenated to the new [unless there was no old concatenation in progress]. ) cat$p @ IF cell ?$room cat$p @ dup ($push) ( cat$) cell- ( lfa) $sp @ swap ! 0 cat$p ! ELSE empty$ THEN ; : parse-s+ \ compile: ( "ccc" char -- ) \ run: ( -- ) \ interpret: ( "ccc" char -- ) ( Parse the parse area up to the first occurrence of char, which is parsed away. If executing in compilation mode, append run-time semantics to the current definition that concatenates the characters parsed from the string. Otherwise concatenate the characters. ) ( char) parse state @ IF postpone sliteral postpone s+ ELSE s+ THEN ; \ See PARSE-S+ for STATE: : $+" ( "ccc<">" -- ) [char] " parse-s+ ; immediate : $+` ( "ccc<`>" -- ) [char] ` parse-s+ ; immediate \ *** 10 ARGUMENTS ( This entire section can be deleted if there is no need for string stack frames or macro arguments. ) \ *** 10.1 FRAMES : $frame-depth ( -- u ) ( Leave the number of string frames currently on the string frame stack. ) $fsp0 @ $fsp @ - /$frame-item / ; : $frame ( u -- ) ( Push the description of a string stack frame starting at the top of the string stack and containing u entries onto the string frame stack. Errors are thrown if the frame stack would overflow or if the depth of the string stack above the top frame, if there is one, is less than u. The value u = 0 is allowed. NOTE: The current implementation pushes u and the string stack pointer onto the frame stack, even if u = 0. NOTE: It is illegal to make changes to the string stack in or below the string frame before it is dropped, except for those induced by garbage collection. There is no check for that. ) $fsp @ $fbrkp @ = ABORT" ***too many string frames" $frame-depth IF $fsp @ @ ELSE $sp0 @ THEN $sp @ - ( u avail.size) over cells u< ABORT" ***not enough strings for frame" $fsp @ cell- dup >r ! \ push u r> cell- $sp @ over ! \ push $sp $fsp ! ; : ?$frame-under ( -- ) $frame-depth 0= ABORT" ***string frame stack underflow" ; : drop-$frame ($: frame*$ i*$ -- i*s ) ( Drop the topmost string frame from the string frame stack, and the corresponding strings, frame*$, from the string stack. An error is thrown if either stack would underflow. The cases where the frame has zero entries on the string stack and/or there are zero or more items on the string stack above the top frame item are handled properly. ) ?$frame-under $fsp @ ( &fr.top) dup [ 2 cells ] literal + $fsp ! ( &fr.top) 2@ ( #fr fr.top) over 0= IF 2drop EXIT THEN LOCALS| fr.top #fr | fr.top $sp @ ( top) - cell/ ( #above) ?dup IF ( #above) 1- dup #fr + swap ( fr.bot.index above.bot.index) 0 swap DO dup i $exchange 1- -1 +LOOP drop THEN #fr 0 DO $drop LOOP ; : #$args ( -- u ) ( Leave the number of entries in the topmost string frame. Throw an error if the frame stack is empty. ) ?$frame-under $fsp @ cell+ @ ; : find-$arg ( s -- u true | false ) ( Assume there is at least one string frame on the string frame stack. The frame may contain zero items. If the fstring matches an element in the top string frame, leave its index u and true, else leave only false. The index of the top element is zero. ) $fsp @ dup @ ( first) swap cell+ @ ( first #items) 2dup cells + ( first #items limit) LOCALS| limit #items next len addr | false BEGIN next limit <> WHILE addr len next @ mcount compare WHILE \ no match next cell+ to next REPEAT ( false) drop #items limit next - cell/ - ( i) true THEN ; : th-$arg ( u -- $: arg_u$ ) ( Leave the u-th string in the topmost string frame, where the index u of the top element is zero. Throw an error if the frame stack is empty or if the top frame contains less than u+1 strings. ) ?$frame-under ( u) dup #$args u>= ABORT" ***not enough strings in top frame" ( u) cells $fsp @ @ + $@ ; \ *** 10.2 SCASE CONTROL STRUCTURE ( These words are replacements for CASE, OF, ENDOF, and ENDCASE where the input and the cases to be matched are fstrings of the form [addr len]. The SCASE structure is supposed to behave in strict analogy to the normal CASE structure. The imsplementation here is a direct imitation of the example in ANSI X3.215-1994, section A.3.2.3.2. As in that example, the stack notation acts as if the control-flow stack is the data stack, with which the code has to be compatible. ) 0 constant SCASE immediate \ zero OF count : SOF ( #of -- orig #of+1 / s s' -- | s ) 1+ ( #of+1) >r \ get out of the way of possible control-flow postpone 2over postpone compare postpone 0= postpone IF ( C: orig) postpone 2drop \ discard case string r> ; immediate : ENDSOF ( orig1 #of -- orig2 #of ) >r postpone ELSE r> ; immediate : ENDSCASE ( orig1 ... orign #of -- / s -- ) postpone 2drop ( #of) 0 ?DO postpone THEN LOOP ; immediate \ *** 10.3 PARSING LIBRARY WORDS [UNDEFINED] PARSING-LIB-S [IF] [UNDEFINED] PARSE-NAME [IF] \ http://www.forth200x.org/reference-implementations/parse-name.fs : isspace? ( c -- f ) bl 1+ u< ; : isnotspace? ( c -- f ) isspace? 0= ; : xt-skip ( addr1 n1 xt -- addr2 n2 ) \ gforth \ skip all characters satisfying xt ( c -- f ) >r BEGIN dup WHILE over c@ r@ execute WHILE 1 /string REPEAT THEN r> drop ; : parse-name ( "name" -- c-addr u ) source >in @ /string ['] isspace? xt-skip over >r ['] isnotspace? xt-skip ( end-word restlen r: start-word ) 2dup 1 min + source drop - >in ! drop r> tuck - ; [THEN] \ PARSE-NAME \ These words are copied and pasted from parsing.fs 0.8.3. : preparse-name ( "" -- word.s ) ( Get the next word in the parse area without advancing the input stream. Leo Wong's PREPARSE. ) >in @ >r parse-name r> >in ! ; : parse-name-away ( "" -- ) ( When the parse area is not all white space, parse away the next word. Otherwise empty the parse area. A logical companion to PREPARSE-NAME. ) parse-name 2drop ; : ?emit-cr ( -- ) source-id 0= IF cr THEN ; : next-instream-name ( "" -- word.s ) \ Wil Baden's NEXT-WORD ( Parse the next word from the input stream across lines. If word.s is empty, at most whitespace was found. A parsing implementation of Wil Baden's NEXT-WORD, plus conditional echoing of CR's. ) BEGIN parse-name dup IF EXIT THEN ?emit-cr refill WHILE 2drop REPEAT ; [THEN] \ PARSING-LIB-S \ *** 10.4 ARGUMENT INTERPRETER : cat-$arg ( u -- ) th-$arg $+ ; : [cat-$arg] ( u -- ) [n] postpone cat-$arg ; \ just to save typing : skip&postpone postpone parse-name-away postpone postpone ; immediate : compile-with-$args ( -- ) ( A factor in $ARGS{, to be executed only in compilation mode, with an argument string frame already formed. It works around the absence of a standard, patchable INTERPRET. We got the basic idea from Leo Wong's implementation of MANY: in TOOLS.F. The method is to examine each word in the input stream, across lines up to a terminating ";", to intercept named string arguments and compile code that handles the run-time argument. Nonargument words are to be compiled as usual. For nonparsing words, this can be done by EVALUATEing [in compilation mode] the string found for the word. That doesn't work for parsing words because the parse area terminates at the end of the word; so the word doesn't see what it's supposed to parse. By POSTPONEing it instead, the interpreter executes it without switching the parse area. This has the disadvantage that every parsing word allowed to occur must be handled explicitly, most easily with case sensitivity. This word leaves the terminating semicolon as the next word token in the parse area, ready to be handled by the host system interpreter. It compiles code to drop the run-time argument frame, but does not drop the compile-time argument frame. NOTE: The C-primitive version of $ARGS{ in the pfe DSTRINGS-EXT environment uses pfe's built-in methods for patching the interpreter, and does not suffer the disadvantages mentioned above. ) BEGIN preparse-name 2dup 2>r s" ;" compare WHILE r@ ( len) 0= IF 2r> ( empty.s) 2drop refill 0= ABORT" *** missing ;" ELSE SCASE 2r@ ( word.s) s" (" SOF skip&postpone ( ENDSOF s" ($:" SOF skip&postpone ($: ENDSOF s" \" SOF skip&postpone \ ENDSOF s` $+"` SOF skip&postpone $+" ENDSOF s" $+`" SOF skip&postpone $+` ENDSOF s` s"` SOF skip&postpone s" ENDSOF s" s`" SOF skip&postpone s` ENDSOF s` $"` SOF skip&postpone $" ENDSOF s" $`" SOF skip&postpone $` ENDSOF s` ."` SOF skip&postpone ." ENDSOF s` ABORT"` SOF skip&postpone ABORT" ENDSOF parse-name-away ( word.s) find-$arg IF ( index) [cat-$arg] ELSE 2r@ ( word.s) evaluate THEN 0 0 ( dummy.word.s) ENDSCASE 2rdrop THEN REPEAT 2r> ( ";".s) 2drop \ ";" remains in the input stream postpone drop-$frame ; : args>$frame ( " ... }" delim.s -- $: arg1$ ... argn$ ) ( Copy word tokens from the input stream into string space across lines, up to but not including the first occurrence of the body of delim.s as a word token. Leave the input stream positioned with delim.s parsed away. ) $depth >r ( delim.s) 2>r BEGIN next-instream-name dup 0= ABORT" ***$ARGS{ not terminated." 2dup 2r@ compare WHILE >$s-copy REPEAT 2drop 2rdrop $depth r> - $frame ; : $ARGS{ \ compile: ( "arg1 arg2 ... argn <}>" -- ) \ run: ($: arg1'$ arg2'$ ... argn'$ -- ) ( Immediate and compilation-only. Copy the argument strings across lines to the string buffer, push them onto the string stack with argn the most accessible, and make them into the top compile-time string stack frame. Compile the run-time code to make an argument frame out of the n most accessible run-time string stack entries. Start intercepting the input stream across lines. For any argument word token, compile run-time code that concatenates the corresponding string in the run-time frame. For any nonargument word token, compile its normal run-time semantics. At the semicolon terminating the definition, drop the compile-time argument frame, compile code to drop the run-time argument frame, perform the normal system semicolon semantics, and stop intercepting the input stream. Syntax for defining a string macro GEORGE: ) \ : george ($: a$ b$ c$ -- cat$ ) \ $ARGS{ arg1 arg2 arg3 } \ $+" This is arg1: " arg1 $+" ." ENDCAT ; ( The blank following the last argument is required. For a macro with no arguments, $ARGS{...} does nothing but add useless overhead and should be omitted. Two of the arguments in this example are ignored and could have been left out. It is also normal to use a $ARGS{...} word without ENDCAT before the colon as a step in a concatenation that is terminated elsewhere. Sample syntax using the string macro GEORGE: $" bill" $" sue" $" marie" george $. The resulting display is: This is arg1: bill. NOTE: The code between $ARGS{...} and the terminating semicolon is allowed to produce a net increase in the runtime string stack depth, but is not allowed to change the part of the string stack present just after $ARGS{...} [which includes the frame], beyond what is induced by garbage collection. NOTE: Macro argument labels must be distinct from each other and from any local labels that appear in the same definition, and there is no check for that. NOTE: At the moment the semantics of $ARGS{ is undefined before DOES>. ) state @ 0= ABORT" ***Use only in definitions." s" }" args>$frame #$args [n] postpone $frame \ true to-args? \ NOT USED in this implementation compile-with-$args \ false to-args? \ NOT USED in this implementation drop-$frame ; immediate