( Title: Dynamic-Strings Words File: dstrings.fs Author: David N. Williams Version: 0.7.3 License: LGPL Starting date: October 6, 2006 Last revision: October 9, 2006 Version 0.7.2 Starting date: April 16, 2004 Last revision: December 22, 2004 Version 0.7.1 Starting date: June 12, 2003 Last revision: June 14, 2003 Version 0.7.0 Starting date: March 19, 2003 Last revision: June 12, 2003 ) \ Copyright (C) 2001-2004, 2006 David N. Williams ( This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or at your option any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. If you take advantage of the option in the LGPL to put a particular version of this library part under the GPL, the author would regard it as polite if you would put any direct modifications under the LGPL as well, and include a copy of this request near the beginning of the modified library source. A "direct modification" is one that enhances or extends the library in line with its original concept, as opposed to developing a distinct application or library which might use it. This code is an ANS Forth implementation of our dynamic-strings word list, a direct port of our C implementation, version 0.6.29 of February 25, 2003, for pfe. ANS Forth with dependences: - case insensitivity - zero is not an address - Core word set - File-Access word set - Memory-Allocation word set - Locals word set - Programming-Tools word set - String word set - Extensions word sets for the above NOTATION AND TERMINOLOGY In stack and other comments, "s" or "something.s" stands for an ANS Forth string [addr len]. We say that addr is the beginning of s, even when len=0. We say that addr+len is "off the end" of s, or "just after" s, even when len=0. We say that s' is a substring of s if addr <= addr' <= addr+len and addr'+len' <= addr+len. Note that the empty string has len+1 representations as a substring of s. We say that a substring of s is a "word token", or just a "token", if all of the following are true: 1. it is not empty; 2. it contains no whitespace characters; 3. it is preceded by a whitespace character or is at the beginning of s; 4. it is followed by a whitespace character or is at the end of s. This package generally uses a "measured string" representation for strings in memory. A measured string is the same as a Forth counted string, except that instead of being restricted to one character, the size of the count field is defined by the implementation. We default to one cell as the count field size. "Mstring" is short for "measured string". The "MSA" is the measured string address, the same as the count field address. We allow only two choices for the size of a measured string count field, one character or one cell. Set CHAR-COUNT-FIELD? to true to make measured strings the same as normal Forth counted strings. Set it to false for a cell-sized count field. This and other parameters that the user can choose are on lines with a "USER-CONFIG" comment. WORDS Please see the file dstrings-words.txt for a more or less functional listing of the words in this package, including separation into user and system categories. The user category has subdivisions for the Dynamic-Strings word set and extra Forth-string words. Fairly complete word specifications are included in this file, but please see dstrings.html for the official Dynamic-Strings word list specification. ) decimal false constant CHAR-COUNT-FIELD? immediate \ USER-CONFIG s" [UNDEFINED]" pad c! pad char+ pad c@ move pad find nip 0= [IF] : [UNDEFINED] ( "name" -- flag ) bl word find nip 0= ; immediate [THEN] 0 [IF] \ debugging \ loadm dstrings \ pfe [UNDEFINED] \\ [IF] : \\ BEGIN -1 parse 2drop refill 0= UNTIL ; [THEN] [UNDEFINED] bp? [IF] false value bp? \ true turns on breakpoints : bp ( n -- ) bp? IF cr ." #" . .s ELSE drop THEN ; [THEN] [THEN] [UNDEFINED] u<= [IF] : u<= ( u1 u2 -- flag ) 1+ u< ; [THEN] [UNDEFINED] u>= [IF] : u>= ( u1 u2 -- flag ) 1- u> ; [THEN] [UNDEFINED] -rot [IF] : -rot ( a b c -- c a b ) rot rot ; [THEN] \ *** PARAMETERS AND DATA 16384 constant DEFAULT-/$BUF \ USER-CONFIG 4 constant DEFAULT-MAX-#$FRAMES \ USER-CONFIG 1 cells constant cell -1 cells constant -cell 2 cells constant 2cells : cell- ( u -- u-cell ) cell - ; s" MAX-U" environment? 0= [IF] cr .( S" MAX-U" ENVIRONMENT? returns FALSE.) ABORT [THEN] constant UCELL-MAX CHAR-COUNT-FIELD? [IF] \ char size : mcount@ ( msa -- len ) c@ ; : mcount! ( len msa -- ) c! ; 1 chars ( /mcount) s" /COUNTED-STRING" environment? 0= [IF] cr .( S" /COUNTED-STRING" ENVIRONMENT? returns FALSE.) ABORT [THEN] ( /counted-string) [ELSE] \ cell size : mcount@ ( msa -- len ) @ ; : mcount! ( len msa -- ) ! ; cell ( /mcount) UCELL-MAX [THEN] constant MAX-MCOUNT constant /MCOUNT \ DEBUG \ : dump-mstring ( msa -- ) \ dup mcount@ /MCOUNT + align dump ; [UNDEFINED] dstrings [IF] variable dstrings 0 dstrings ! [THEN] create &mempty 0 , 0 , \ sufficient for empty mstring \ *** ANS FORTH STRING EXTENSIONS: I : 0fill-aligned ( addr -- addr' ) ( addr) dup aligned ( addr') dup >r ( addr addr') over - ( addr len) 0 fill r> ; : mplace ( s msa -- ) ( MPLACE is the same as Wil Baden's PLACE, except it assumes the buffer address msa to be aligned, and stores s as a measured string, zero-filled to trailing alignment. As with PLACE, it is assumed that the mstring copy does not clobber the old string, and there is no check for room starting at msa. ) 2dup 2>r /MCOUNT + swap move 2r@ + /MCOUNT + 0fill-aligned ( offend) drop 2r> mcount! ; : sm, ( addr len -- addr' len ) ( ALLOT room and store the ANS Forth string into aligned data space as an mstring, leaving data space zero-filled to alignment; and leave the length and new body address. It is assumed that len is unsigned. NOTE: SM, differs from STRING, in Wil Baden's Tool Belt in that it stores an aligned, zero-filled, measured string instead of a counted string, and it leaves the ANS Forth representation of the stored string. ) CHAR-COUNT-FIELD? [IF] ( len) dup MAX-MCOUNT u>= ABORT" ***string count too large" [THEN] align here ( addr len addr') 2dup 2>r over /MCOUNT + aligned allot mplace r> /MCOUNT + r> ; ( The following words convert between the ANS Forth string representation of an mstring and its mstring address. They are intended mainly for system use, as mstring addresses don't normally appear on the data stack. ) : mcount ( msa -- addr len ) dup mcount@ swap /MCOUNT + swap ; : -mcount ( addr len -- msa ) drop /MCOUNT - ; : embed-m$ ( addr len buf -- buf len' ) ( Store the input ANS Forth string [addr len] as a measured string at buf, zero-filled to trailing alignment, and return it as a Forth string with len' = len + /MCOUNT + alignment. NOTE: Used, e.g., by PARSE>$, to trick SLITERAL into compiling a measured string. ) dup >r mplace r> dup mcount@ /MCOUNT + aligned ; : parse>s \ compile: ( "ccc" char -- ) \ run: ( -- ccc.s ) \ interpret: ( "ccc" char -- ccc.s ) ( Parse the input stream up to the first occurrence of char, which is parsed away. If executing in compilation mode, append run-time semantics to the current definition that leaves the ANS Forth string representation on the stack. In interpretation mode, leave the Forth string representation for a stored copy, which may be transient in the style of S". In either mode, the format of the stored string is implementation dependent. NOTE: The interpreted copy is not transient in this implementation, and both copies are mstrings. ) ( char) parse state @ IF pad embed-m$ postpone sliteral postpone drop postpone mcount ELSE sm, THEN ; : s` ( "ccc<`>" -- ) \ see PARSE>S for STATE [char] ` parse>s ; immediate \ *** STRUCTURES \ Measured string: (count, body, 0-fill to alignment) : >mbody ( msa -- bfa ) [ /MCOUNT ] literal + ; \ Dynamic string: (binding address, measured string) : >dbody ( dsa -- bfa ) [ /MCOUNT cell+ ] literal + ; \ String frame: (frame address, number of elements) 2 cells constant /$frame-item ( The major structure is the string space structure. Except for structure instance creation, the code always operates on the current instance of that structure, whose address is held in the variable DSTRINGS. The instance data words use DSTRINGS explicitly, rather than a base address passed on the data stack. The following structure words are a variation on qdstruct.fs. ) 0 value /members \ running total offset 1 cells value /member \ current member size : char-members ( -- ) 1 chars to /member ; : cell-members ( -- ) 1 cells to /member ; : create-member ( "name" -- ) create /members dup , /member + to /members ; : same-member ( -- ) /members /member - to /members ; : align-members ( -- ) /members aligned to /members ; : &member: ( "name" -- ) create-member DOES> ( -- addr ) @ ( offset) dstrings @ + ; : @member: ( "name" -- ) create-member DOES> ( -- n ) @ ( offset) dstrings @ + @ ; : !member: ( "name" -- ) create-member DOES> ( n -- ) @ ( offset) dstrings @ + ! ; : c@member: ( "name" -- ) create-member DOES> ( -- n ) @ ( offset) dstrings @ + c@ ; : c!member: ( "name" -- ) create-member DOES> ( n -- ) @ ( offset) dstrings @ + c! ; : /struct: ( "name" -- ) /members constant ; \ String space: ( WARNING! The /$BUF and MAX-#$FRAMES fields are assumed elsewhere to be at offsets of zero and one cell, respectively. Also note that the flags $GARBAGE?, $GC-LOCK?, and $ARGS? fetch as chars, not as cell-extended flags. When uncommented, the debugging lines adjust the structure, including alignment, for direct comparison of dumps with the pfe/gcc version, at least with the alignment in MacOS X. The $ARGS? field is not used in this implementation. ) 0 to /members cell-members @member: /$buf \ size of string buffer with stack @member: max-#$frames @member: $bufp@ \ string buffer pointer same-member !member: $bufp! @member: $breakp@ \ next available for strings same-member !member: $breakp! @member: $sp@ \ string stack pointer same-member !member: $sp! same-member &member: $sp @member: $sp0@ \ initial string stack pointer same-member !member: $sp0! @member: $fbreakp@ \ frame stack top limit pointer same-member !member: $fbreakp! @member: $fsp@ \ frame stack pointer same-member !member: $fsp! @member: $fsp0@ \ initial frame stack pointer same-member !member: $fsp0! &member: cat$p \ msa of last string in the same-member @member: cat$@ \ buffer, if concatenation is same-member !member: cat$! \ in progress, else NULL char-members c@member: $garbage? \ true when there is garbage same-member c!member: to-$garbage? \ /members 1+ to /members \ DEBUG c@member: $gc-lock? \ true when collection not permitted same-member c!member: to-$gc-lock? \ /members 1+ to /members \ DEBUG \ c@member: $args? \ DEBUG \ same-member c!member: to-$args? \ DEBUG \ /members 1+ to /members \ DEBUG align-members /struct: /$space-header &member: $fbreak \ (continuation) frame stack break \ *** COMMENTS : ($: ( -- ) ( A synonym for "(" including any immediacy, and allowing multiple lines before the terminating right parenthesis, as specified for "(" in the File-Access word set. ) postpone ( ; immediate \ "paren-string-colon" \ *** STRING SPACE : /$frame-stack ( -- /stack ) max-#$frames /$frame-item * ; : 0$space ( &space -- ) ( Clear the string buffer, string stack, and string frame stack in the string space starting at &space. Any string variables holding strings in the string buffer are left pointing into limbo. This may be executed with the string space in an invalid state, as long as the /$BUF and MAX-#$FRAMES fields of its string space structure are intact. NOTE: This word does not zero fill the string buffer. ) dstrings @ >r ( &space) dstrings ! /$space-header /$frame-stack + ( $buf.offset) dstrings @ + ( &$buf) dup $bufp! dup $breakp! dup $fsp! $fsp0! $fbreak $fbreakp! 0 \ for string stack underflow guard $bufp@ /$buf + ( offend) dup $sp! dup $sp0! ( 0 offend) ! \ (unused) string stack underflow guard 0 cat$! false to-$garbage? false to-$gc-lock? \ false to-$args? \ not used in this implementation r> dstrings ! ; : make-$space ( buf.size #frames -- addr ) ( Allocate and initialize a string space with buf.size bytes available for the string buffer including the string stack, and with a string frame stack for frame description entries holding up to #frames. The size is rounded up to address alignment, and the buffer begins and ends with cell alignment. Return addr, the address of the string space. The standard word FREE with addr as input can be used to release the space. ) swap ( #frames buf.size) aligned locals| buf.size #frames | #frames /$frame-item * /$space-header + buf.size + cell+ allocate \ extra guard cell ( &$space ior) ABORT" ***MAKE-$SPACE can't ALLOCATE." ( &$space) buf.size over ! #frames over cell+ ! dup 0$space ; ( Uncomment the DEBUG lines for use with pfe dstrings loaded. ) \ dstrings @ 0= [IF] \ DEBUG DEFAULT-/$BUF DEFAULT-MAX-#$FRAMES make-$space dstrings ! \ [THEN] \ DEBUG : collect-$garbage ( -- collected? ) ( The garbage flag in the current string space structure is tested, and if there is garbage, it is collected, unless garbage collection is locked. Garbage strings are marked by null backward links. Nongarbage strings are bound by their backward links, pointing either to a string variable data field address or to an entry on the string stack [the deepest if there are several identical references]. Garbage collection fills the gaps occupied by garbage strings by moving any nongarbage strings to lower memory one at a time. The backward link of a string that is moved does not change, but the forward links, in at most one string variable and/or possibly several string stack entries, are updated to point to the new MSA. This algorithm is "fast" because the backward links make it unnecessary to scan a list of string variables, and because no string is moved more than once. It does, however, require a scan of the string stack for each string that moves, unless it is the current concatenation string, which is guaranteed not to be on the string stack. When there is no garbage to collect, this word returns false. If there is garbage, it throws an error when garbage collection is locked. Otherwise it collects the garbage and returns true. We have not thought it worth the effort to factor this long word, but maybe it would be. ) $garbage? 0= IF false EXIT THEN $gc-lock? ABORT" ***string garbage locked" false to-$garbage? $sp@ $sp0@ $breakp@ $bufp@ 0 locals| target next break $sp0 $sp | \ locate first garbage hole (no need to check off end, because \ we know there is a hole) BEGIN next @ ( backlink) WHILE \ not garbage, skip to next next >dbody next cell+ mcount@ + aligned to next REPEAT next to target \ garbage hole found BEGIN next break u< WHILE \ skip over garbage BEGIN next @ ( backlink) 0= next break u< and WHILE next >dbody next cell+ mcount@ + aligned to next REPEAT \ move and update until next garbage BEGIN next @ ( backlink) next break u< and WHILE \ not garbage and not off end \ We always update the forward link pointed to by the \ backward link. The backward link points to one of: a \ string variable, a deepest string stack entry, or CAT$. target cell+ next @ ( backlink) ! \ Unless the backward link points to CAT$, we scan the \ string stack and update copies of the old forward link. \ Note that if the backward link points into the string \ stack, we've already updated the deepest reference, which \ won't be found in the scan. next @ ( backlink) cat$@ <> IF $sp0 $sp ?DO i @ next cell+ = IF target cell+ i ! THEN cell +LOOP THEN \ move dynamic string, backlink thru alignment pad, to hole next cell+ mcount@ ( $len) >dbody aligned ( total.len) next target 2 pick cmove next over + to next target + to target REPEAT \ move and update REPEAT \ garbage hole found target $breakp! true ; : 0strings ( -- ) ( Set all string variables holding bound string values in string space to the empty string, and clear string space, including the string buffer, string stack, and string stack frames. NOTE: This word does not zero fill the string buffer. NOTE: If used for under-the-hood development, this word must be executed only when string space is in a valid state. ) $bufp@ ( next) $breakp@ locals| break next | BEGIN next break u< WHILE next @ ( backlink) IF &mempty next @ ( $variable) ! THEN next >dbody next cell+ mcount@ + aligned to next REPEAT dstrings @ 0$space ; \ : $garbage? ( -- cflag ) ( Leave true if there is garbage in the current string space. Not normally used by applications, since garbage collection is transparent. ) \ This word is defined in the STRUCTURES section. : $gc-off ( -- ) ( Disable garbage collection in the current string space. An error will be thrown if garbage collection is attempted. ) true to-$gc-lock? ; : $gc-on ( -- ) ( Enable garbage collection in the current string space. This is the default. ) false to-$gc-lock? ; : $gc-lock@ ( -- flag ) ( Fetch the dstring garbage collection "off" state. Intended for saving the off state for later restoration after a usage of $GC-ON or $GC-OFF. ) $gc-lock? 0= invert ; : $gc-lock! ( flag -- ) ( Set the dstring garbage collection "off" state according to flag. Intended for restoring the off state previously fetched by GC-LOCK@. ) to-$gc-lock? ; : $unused ( -- u ) ( Leave the number of bytes available for dynamic strings and string stack entries in the string buffer. ) $sp@ $breakp@ - ; : $room? ( size -- flag ) ( Leave true if there is enough aligned room in string space for size bytes starting at the break. ) $breakp@ + aligned $sp@ u<= ; : ?$room ( size -- ) ( Abort if there is not enough aligned room in string space for size bytes starting at the break, even after garbage collection. ) dup $room? 0= IF collect-$garbage ( collected?) drop $room? 0= ABORT" ***string space overflow" ELSE ( size) drop THEN ; : ?cat ( -- ) ( Abort if there is a concatenation in progress in string space. ) cat$@ ABORT" ***cat lock preventing string copy" ; : in-$buffer? ( msa -- flag ) ( Leave true if the mstring is in the string buffer. ) $bufp@ $breakp@ within ; \ *** STRING STACK SUPPORT : $sp-- ( -- ) -cell $sp +! ; : $sp++ ( -- ) cell $sp +! ; : --$sp@ ( -- [$sp]-[cell] ) $sp-- $sp@ ; : $sp@++ ( -- $sp ) $sp@ $sp++ ; : /$stack ( -- #bytes ) $sp0@ $sp@ - ; : ?$s-under ( #bytes -- ) /$stack swap u< ABORT" ***string stack underflow" ; : ($push) ( msa -- $: msa ) ( Pop an mstring address from the data stack and push it onto the string stack, with no check for stack room. A check that could trigger garbage collection must be done before using this word. If msa is not external, it must be supplied, after the check, from the string stack or a string variable, so it remains valid after a garbage collection. ) $sp@ cell- dup $sp! ! ; : $push-ext ( msa -- $: msa ) ( Pop an external mstring address from the data stack and push it onto the string stack after checking for room, invoking garbage collection if necessary. Not to be used with a dynamic string because a garbage collection can invalidate its address. NOTE: This word violates the normal rule that only ANS Forth strings should appear on the data stack. ) cell ?$room ($push) ; : $pop ($: msa -- s: msa) ( Abort if the string stack would underflow when popped. Otherwise pop the top of the string stack and push it onto the data stack. If the string is in the current string space and initially bound to the top of the string stack, mark it as garbage by setting its back link to NULL and set the garbage flag. NOTE: This word violates the normal rule that only ANS Forth strings should appear on the data stack. WARNING: If msa is the address of a bound string, it may become invalid at the next garbage collection. This can be avoided by sandwiching sections of code where this could occur between $GC-OFF and $GC-ON. ) cell ?$s-under $sp@++ dup @ dup in-$buffer? ( $sp a$ bound?) IF dup cell- rot ( a$ blfa $sp) over @ = IF ( a$ blfa) 0 swap ! true to-$garbage? ELSE ( a$ blfa) drop THEN ELSE nip THEN ; \ *** LOAD AND STORE : $constant ( "name" $: ext$ -- ) create $pop , DOES> ($: -- ext$ ) @ $push-ext ; : parse>$ \ compile: ( "ccc" char -- ) \ run: ($: -- ccc$ ) \ interpret: ( "ccc" char -- $: ccc$ ) ( Parse the input stream up to the first occurrence of char, which is parsed away, and store the string as an external measured string. If executing in compilation mode, append run-time semantics to the current definition that leaves the MSA on the string stack. In interpretation mode, leave the MSA on the string stack, where the stored copy, unlike PARSE>S, is required to be nontransient. ) ( char) parse state @ IF pad embed-m$ postpone sliteral postpone drop postpone $push-ext ELSE sm, -mcount $push-ext THEN ; \ see PARSE>$ for STATE: : $" ( "ccc<">" -- $: c$ ) [char] " parse>$ ; immediate : $` ( "ccc<`>" -- $: c$ ) [char] ` parse>$ ; immediate $" " $constant empty$ align here /MCOUNT allot 10 c, 1 over mcount! $push-ext $constant \n$ : $variable ( "name" -- ) \ name does ( -- addr ) create empty$ $pop , ; : $@ ( $var.dfa -- $: a$ ) ( Leave the MSA of the string held by the string variable. ) cell ?$room @ ($push) ; : $! ( $var.dfa $: a$ -- ) ( Store the string whose MSA is on the string stack in the variable whose DFA is on the parameter stack. NOTES: The only situation in which $! copies the string value is when it is a bound string already stored in another variable. In that case, the new copy is the one that is stored in the variable. In particular, external strings are not copied. If the string value held by the string variable on entry is a bound string that is also referenced deeper on the string stack, its back link is reset to point to the deepest string stack reference. If it is a bound string not deeper on the string stack and not identical to the input string, its back link is set to zero, making it garbage. If it is an external string, its MSA in the variable is simply written over by that popped from the string stack. ) cell ?$s-under ( var) dup @ ( old$) dup in-$buffer? 0= ( oldext?) $sp@ @ ( new$) dup in-$buffer? 0= ( newext?) 0 ( blfa) locals| blfa newext? new$ oldext? old$ var | oldext? newext? and 0= oldext? 0= new$ old$ = and 0= and IF \ new string not bound to variable \ Do old string first; it might make a garbage hole, more \ room in case new string is copied. oldext? 0= IF \ old string is bound 0 ( next) $sp0@ $sp@ cell+ ?DO i @ old$ = IF ( prev.next) drop i ( next) THEN cell +LOOP ( next) dup old$ cell- ( blfa) ! ( next) 0= IF true to-$garbage? THEN THEN newext? 0= IF \ new string is bound new$ cell- dup to blfa ( blfa) @ $sp@ $sp0@ within 0= IF \ new string bound to different variable, copy it ?cat new$ mcount@ ( len) dup >dbody ?$room \ maybe gc $breakp@ to blfa ( len) blfa cell+ mcount! $sp@ @ dup to new$ \ in case there was gc ( new$) >mbody blfa >dbody new$ mcount@ dup >r cmove blfa cell+ to new$ \ for storing into variable blfa >dbody r> ( len) + ( offend) dup aligned ( newbreak) dup >r ( offend newbreak) over - 0 fill r> ( newbreak) $breakp! THEN var blfa ! THEN THEN new$ var ! $sp++ ; \ *** STRING STACK : $drop ($: a$ -- ) ( Abort if the string stack would underflow when popped. Otherwise increment the string stack pointer, thus popping the string stack. If the input string is in the current string space and initially bound to the top of the string stack, mark it as garbage by setting its back link to NULL and set the garbage flag. ) $pop drop ; : $2drop ($: a$ b$ -- ) ( Drop the two topmost string stack entries, marking them as garbage if appropriate. ) $drop $drop ; : $2dup ($: a$ b$ -- a$ b$ a$ b$ ) ( Leave copies of the two topmost string stack entries. The string values are not copied. ) 2cells ?$s-under 2cells ?$room $sp@ dup @ swap cell+ @ --$sp@ ! --$sp@ ! ; : $depth ( -- u ) ( Leave the number of items on the string stack. ) /$stack cell / ; : $dup ($: a$ -- a$ a$ ) ( Leave a copy of the topmost string stack entry. The string value is not copied. ) cell ?$s-under $sp@ $@ ; : $swap ($: a$ b$ -- b$ a$ ) ( Exchange the two most accessible strings on the string stack. Throw an error if there are less than two strings on the stack. Neither string value is copied. ) 2cells ?$s-under $sp@ ( &s0) dup @ ( $0) over cell+ ( &s1) dup @ ( $1) locals| $1 &s1 $0 &s0 | $0 $1 <> IF $0 &s1 ! $1 &s0 ! $1 in-$buffer? IF $1 cell- ( backlink) @ &s1 = IF &s0 $1 cell- ( backlink) ! THEN THEN $2 in-$buffer? IF $0 cell- @ ( backlink) &s0 = IF &s1 $0 cell- ( backlink) ! THEN THEN THEN ; : $nip ($: a$ b$ -- b$ ) ( Drop the next to top item from the string stack. ) $swap $drop ; : $over ($: a$ b$ -- a$ b$ a$ ) ( Leave a copy of the next most accessible string stack entry on top of the string stack. The string value is not copied. ) 2cells ?$s-under $sp@ cell+ $@ ; : $tuck ($: a$ b$ -- b$ a$ b$ ) ( Copy the top string stack item just below the second item. The string value is not copied. ) $swap $over ; : $pick ( u $: au$ ... a0$ -- au$ ... a0$ au$ ) ( Copy the u-th string stack entry to the top of the string stack. The string value is not copied. Throw an error if the input string stack does not have at least u+1 items. ) ( u) cells dup cell+ ?$s-under $sp@ + $@ ; : $s> ($: a$ -- s: a.s ) ( Drop a$ from the string stack and leave it as an ANS Forth string a.s, without copying. WARNING: If a$ is a bound string, it may move or disappear at the next garbage collection, making a.str invalid. This can be avoided by sandwiching sections of code where this could occur between $GC-OFF and $GC-ON. ) $pop mcount ; : $s>-copy ($: a$ -- s: a.s ) ( Drop a$ from the string stack, copy it into data space as a measured string, and leave it as an ANS Forth string a.s. ) $pop mcount sm, ; : $s@ ($: a$ -- a$ s: a.s ) ( Leave the string stack unchanged, and leave the string body address and length on the data stack. NOTE: In earlier versions this was call $S@S. The trailing "S" is superfluous if it is understood that the only string format that usually appears on the data stack is the ANS Forth string format. WARNING: If a$ is a bound string, it may move at the next garbage collection, making a.s invalid. This can be avoided by sandwiching sections of code where this could occur between $GC-OFF and $GC-ON. ) cell ?$s-under $sp@ @ mcount ; : >$s ( a.s -- $: a$ ) ( Push the external ANS Forth string a.s onto the string stack, without copying the string value into the string buffer. It is an unchecked error if the Forth string a.s is not stored as an external measured string. WARNING: If the string value of a.s is actually in the string buffer and not external, the push operation may generate a garbage collection that invalidates its MSA. ) -mcount $push-ext ; : >$s-copy ( a.s -- $: a$ ) ( Copy the external string value whose body address and count are on the parameter stack into the string buffer and push it onto the string stack. Errors are thrown if there is an mcount overflow, if there is not enough room in string space, even after garbage collection, or if there is an unterminated string concatenation. The input external string need not exist as a measured string. WARNING: This word should not be used when the input string is a bound string because the copy operation may generate a garbage collection which invalidates its MSA. ) ?cat CHAR-COUNT-FIELD? [IF] ( len) dup MAX-MCOUNT u> ABORT" ***string count too large" [THEN] 0 ( blf) \ fill in back link field addr when gc-safe locals| blf len addr | len >dbody cell+ ?$room $breakp@ to blf blf cell+ ($push) \ forward link $sp@ blf ! \ back link len blf cell+ mcount! blf >dbody addr over len cmove ( &body) len + 0fill-aligned ( new.break) $breakp! ; : $. ($: a$ -- ) ( Display the string on the terminal. If the system implementation of TYPE has its output vectored, $. uses the same vector. ) $s> type ; : $type ($: a$ -- ) $. ; \ synonym, deprecated in favor of $. \ *** CONCATENATION ( When there is a string concatenation in progress, concatenation is the only basic string operation that can copy a string into the string buffer. This implementation leaves string space in a valid state after each piece is concatenated, with the break aligned after the concatenation so far, and a 0 fill between the end of the concatenation and the break. The concatenating string, called the cat$, is the last dynamic string in the string buffer. It is bound to a hidden string variable in the string space structure, and remains valid if a garbage collection occurs before concatenation ends. ) : cat$-$! ($: a$ -- ) cat$p $! ; : onto-cat$ ( a.s -- ) ( Assume a concatenation is in progress, and that there is room in string space to append the body of a.s to the cat$. Here a.s may be external and not assumed to exist as an mstring, or it may be the ANS Forth string representation of a bound mstring. Append the body and update the cat$ count and the string break. An error is thrown if there is an mcount overflow. ) cat$@ locals| cat$ delta source | cat$ mcount ( addr len) dup delta + CHAR-COUNT-FIELD? [IF] ( len') dup MAX-MCOUNT u> ABORT" ***string count too large" [THEN] ( len') cat$ mcount! ( addr len) + ( target) source over delta cmove ( target) delta + 0fill-aligned $breakp! ; : cat ($: a$ -- ) ( Append the string body to the end of the current cat$. If there is none, start one. An error is thrown if there is not enough room in string space even after a garbage collection, or if there is an mcount overflow. If garbage collection occurs, a$ remains valid even when it is in the string buffer. ) cat$@ IF \ continue concatenation $s@ ( addr len) ?$room \ any overestimate due to break ( addr) drop \ alignment stays < 2 cells $s> ( source delta) onto-cat$ ELSE \ start concatenation $s@ -mcount in-$buffer? 0= IF $s> >$s-copy THEN cat$-$! THEN ; : s-cat ( a.s -- ) ( Append the ANS Forth string body to the end of the current cat$. If there is none, start one. An error is thrown if there is not enough room in string space even after a garbage collection, or if there is an mcount overflow. S-CAT is most commonly used on external strings, not assumed to exist as mstrings. In contrast to CAT, garbage collection could invalidate a.s if it is a dynamic string in the string buffer. S-CAT can be used in that situation if garbage collection is turned off with $GC-OFF. ) cat$@ IF \ continue concatenation ( delta) dup ?$room \ any overestimate due to break ( s) onto-cat$ \ alignment stays < 2 cells ELSE \ start concatenation ( a.s) >$s-copy cat$-$! THEN ; : ENDCAT ($: -- cat$ | empty$ ) ( If there is no cat$, do nothing but leave the empty string. If there is, leave it as a string bound to the top of the string stack, and terminate concatenation, permitting normal copies into the string buffer. ) cat$@ IF cell ?$room cat$@ dup ($push) ( cat$) cell- ( lfa) $sp@ swap ! 0 cat$! ELSE empty$ THEN ; : parse-cat \ compile: ( "ccc" char -- ) \ run: ( -- ) \ interpret: ( "ccc" char -- ) ( Parse the input stream up to the first occurrence of char, which is parsed away. If executing in compilation mode, append run-time semantics to the current definition that concatenates the characters parsed from the string. Otherwise concatenate the characters. ) ( char) parse state @ IF postpone sliteral postpone s-cat ELSE s-cat THEN ; \ See PARSE-CAT for STATE: : cat" ( "ccc<">" -- ) [char] " parse-cat ; immediate : cat` ( "ccc<`>" -- ) [char] ` parse-cat ; immediate \ *** ARGUMENTS: I : $frame-depth ( -- u ) ( Leave the number of string frames currently on the string frame stack. ) $fsp0@ $fsp@ - /$frame-item / ; : $frame ( u -- ) ( Push the description of a string stack frame starting at the top of the string stack and containing u entries onto the string frame stack. Errors are thrown if the frame stack would overflow or if the depth of the string stack above the top frame, if there is one, is less than u. The value u = 0 is allowed. NOTE: The current implementation pushes u and the string stack pointer onto the frame stack, even if u = 0. ) $fsp@ $fbreakp@ = ABORT" ***too many string frames" $frame-depth IF $fsp@ @ ELSE $sp0@ THEN $sp@ - ( u avail.size) over cells u< ABORT" ***not enough strings for frame" $fsp@ cell- dup >r ! \ push u r> cell- $sp@ over ! \ push $sp $fsp! ; : ?$frame-under ( -- ) $frame-depth 0= ABORT" ***string frame stack underflow" ; : drop-$frame ( -- ) ( Drop the topmost string frame from the string frame stack and string stack. Errors are thrown if either stack would underflow or if the string frame does not begin at the top of the string stack. The case where the frame has zero entries on the string stack is handled properly. ) ?$frame-under $fsp@ dup @ ( frame-top) $sp@ <> ABORT" ***string frame not at top of string stack" ( $fsp) cell+ ( '#items) dup cell+ $fsp! @ ( #items) 0 ?DO $drop LOOP ; : #$args ( -- u ) ( Leave the number of entries in the topmost string frame. Throw an error if the frame stack is empty. ) \ $frame-depth 0= ABORT" ***string frame stack underflow" ?$frame-under $fsp@ cell+ @ ; : find-$arg ( s -- u true | false ) ( Assume there is at least one string frame on the string frame stack. The frame may contain zero items. If the ANS Forth string matches an element in the top string frame, leave its index u and true, else leave only false. The index of the top element is zero. ) $fsp@ dup @ ( first) swap cell+ @ ( first #items) 2dup cells + ( first #items limit) locals| limit #items next len addr | false BEGIN next limit <> WHILE addr len next @ mcount compare WHILE \ no match next cell+ to next REPEAT ( false) drop #items limit next - cell / - ( i) true THEN ; : th-$arg ( u -- $: arg$ ) ( Leave the u-th string in the topmost string frame, where the index u of the top element is zero. Throw an error if the frame stack is empty or if the top frame contains less than u+1 strings. ) ?$frame-under ( u) dup #$args u>= ABORT" ***not enough strings in top frame" ( u) cells $fsp@ @ + $@ ; \ *** ANS FORTH STRING EXTENSIONS: II : parse-area@ ( -- unparsed.s ) source ( &inbuf #inbuf) >in @ ( #parsed) /string ; : parse-area! ( unparsed.s -- ) ( addr len) source nip ( #tib) swap - >in ! ( addr) drop ; : bl-skip ( addr len -- addr+i len-i ) \ Wil Baden BEGIN dup WHILE over c@ bl <= WHILE 1 /string REPEAT THEN ; : bl-scan ( addr len -- addr+i len-i ) \ Wil Baden BEGIN dup WHILE over c@ bl > WHILE 1 /string REPEAT THEN ; : sfirst-word ( s -- word.s ) ( Leave the first word in the string s = [addr len]. If there is none, word.s = [addr+len,0]. ) bl-skip 2dup bl-scan nip - ; : safter ( s s' -- after.s) ( Assume s' to be a substring of s. Leave after.s, the substring of s that follows s'. The calculation is after.s = [addr'+len',len-len'-addr'+addr]. The commented definitions below have also been tested. ) \ + ( addr'+len') >r + ( addr+len) r@ - r> swap ; \ ( len') >r -rot + over - r> ( addr' addr+len-addr' len') /string ; \ + 2 pick - ( addr len addr'+len'-addr) /string ; + -rot + ( addr'+len' addr+len) over - ; : sparse-word ( s -- after.s word.s ) ( Leave word.s as a substring of s, the first word token in s. Leave after.s as the substring of s following word.s, skipping the first of any leading whitespace characters, in line with the ANS Forth notion of parsing. If after.s is empty, its address is just after the end of s. If word.s is empty, its address is also just after the end of s. ) 2dup sfirst-word 2dup 2>r safter ( after.len) dup IF 1 /string THEN 2r> ; : parse-word ( -- word.s ) parse-area@ sparse-word 2swap parse-area! ; : preparse ( -- word.s ) \ Leo Wong ( Get the next word without advancing the input stream. ) >in @ >r parse-word r> >in ! ; : skip-word ( -- ) ( When the parse area is not all white space, parse away the next word. Otherwise empty the parse area. ) parse-word 2drop ; : next-word ( -- word.s ) \ Wil Baden ( Parse the next word from the input stream across lines. If word.s is empty, at most whitespace was found. A parsing implementation of Wil Baden's word. NOTE: The implementation does not echo CR's when the lines in question are being entered interactively at the terminal. ) BEGIN parse-word dup IF EXIT THEN refill WHILE 2drop REPEAT ; : num>s ( n -- n.s ) ( Convert the number according to the current BASE to a transient ANS Forth string. The following code is pretty traditional. ) <# dup >r abs s>d #s r> sign #> ; : [n] \ compile: ( n -- ) \ run: ( -- n ) ( This word is intended for embedding in an immediate word, to be executed in compilation mode. It does, however, have interpretation semantics: [ n -- n ] Based on PARSE-N in Leo Wong's TOOLS.F. ) num>s evaluate ; \ *** SCASE, SOF, ENDSOF, ENDSCASE ( These words are replacements for CASE, OF, ENDOF, and ENDCASE where the input and the cases to be matched are ANS Forth strings of the form [addr len]. The SCASE structure is supposed to behave in strict analogy to the normal CASE structure. The implementation here is a direct imitation of the example in ANSI X3.215-1994, section A.3.2.3.2. As in that example, the stack notation acts as if the control-flow stack is the data stack, with which the code has to be compatible. ) 0 constant SCASE immediate \ zero OF count : SOF ( #of -- orig #of+1 / s s' -- | s ) 1+ ( #of+1) >r \ get out of the way of possible control-flow postpone 2over postpone compare postpone 0= postpone IF ( C: orig) postpone 2drop \ discard case string r> ; immediate : ENDSOF ( orig1 #of -- orig2 #of ) >r postpone ELSE r> ; immediate : ENDSCASE ( orig1 ... orign #of -- / s -- ) postpone 2drop ( #of) 0 ?DO postpone THEN LOOP ; immediate \ *** ARGUMENT INTERPRETER : cat-$arg ( u -- ) th-$arg cat ; : [cat-$arg] ( u -- ) [n] postpone cat-$arg ; \ just to save typing : skip&postpone postpone skip-word postpone postpone ; immediate : compile-with-$args ( -- ) ( A factor in $ARGS{, to be executed only in compilation mode, with an argument string frame already formed. It works around the absence of a standard, patchable INTERPRET. We got the basic idea from Leo Wong's implementation of MANY: in TOOLS.F. The method is to examine each word in the input stream, across lines up to a terminating ";", to intercept named string arguments and compile code that handles the run-time argument. Nonargument words are to be compiled as usual. For nonparsing words, this can be done by EVALUATEing [in compilation mode] the string found for the word. That doesn't work for parsing words because the parse area terminates at the end of the word; so the word doesn't see what it's supposed to parse. By POSTPONEing it instead, the interpreter executes it without switching the parse area. This has the disadvantage that every parsing word allowed to occur must be handled explicitly, most easily with case sensitivity. This word leaves the terminating semicolon as the next word token in the parse area, ready to be handled by the host system interpreter. It compiles code to drop the run-time argument frame, but does not drop the compile-time argument frame. NOTE: The C-primitive version of $ARGS{ in the pfe DSTRINGS-EXT environment uses pfe's built-in methods for patching the interpreter, and does not suffer the disadvantages mentioned above. ) BEGIN preparse 2dup 2>r s" ;" compare WHILE r@ ( len) 0= IF 2r> ( empty.s) 2drop refill 0= ABORT" *** missing ;" ELSE SCASE 2r@ ( word.s) s" (" SOF skip&postpone ( ENDSOF s" ($:" SOF skip&postpone ($: ENDSOF s" \" SOF skip&postpone \ ENDSOF s` cat"` SOF skip&postpone cat" ENDSOF s" cat`" SOF skip&postpone cat` ENDSOF s` s"` SOF skip&postpone s" ENDSOF s" s`" SOF skip&postpone s` ENDSOF s` $"` SOF skip&postpone $" ENDSOF s" $`" SOF skip&postpone $` ENDSOF s` ."` SOF skip&postpone ." ENDSOF s` ABORT"` SOF skip&postpone ABORT" ENDSOF skip-word ( word.s) find-$arg IF ( index) [cat-$arg] ELSE 2r@ ( word.s) evaluate THEN 0. ( dummy.word.s) ENDSCASE 2r> 2drop THEN REPEAT 2r> ( ";".s) 2drop \ ";" remains in the input stream postpone drop-$frame ; : args>$frame ( " ... }" delim.s -- $: arg1$ ... argn$ ) ( Copy word tokens from the input stream into string space across lines, up to but not including the first occurrence of the body of delim.s as a word token. Leave the input stream positioned with delim.s parsed away. ) $depth >r ( delim.s) 2>r BEGIN next-word dup 0= ABORT" ***$ARGS{ not terminated." 2dup 2r@ compare WHILE >$s-copy REPEAT 2drop 2r> 2drop $depth r> - $frame ; \ *** ARGUMENTS: II : $ARGS{ \ compile: ( "arg1 arg2 ... argn <}>" -- ) \ run: ($: arg1'$ arg2'$ ... argn'$ -- ) ( Immediate and compilation-only. Copy the argument strings across lines to the string buffer, push them onto the string stack with argn the most accessible, and make them into the top compile-time string stack frame. Compile the run-time code to make an argument frame out of the n most accessible run-time string stack entries. Start intercepting the input stream across lines. For any argument word token, compile run-time code that concatenates the corresponding string in the run-time frame. For any nonargument word token, compile its normal run-time semantics. At the semicolon terminating the definition, drop the compile-time argument frame, compile code to drop the run-time argument frame, perform the normal system semicolon semantics, and stop intercepting the input stream. The code between $ARGS{ ... } and the terminating semicolon is not allowed to make a net change in the string stack depth, because that would interfere with the automatic dropping of the string argument frame at the semicolon. Syntax for defining a string macro GEORGE: ) \ : george ($: a$ b$ c$ -- ) \ $ARGS{ arg1 arg2 arg3 } \ cat" This is arg1: " arg1 cat" ." ENDCAT $. ; ( The blank following the last argument is required. For a macro with no arguments, $ARGS{ } does nothing but add useless overhead and should be omitted. Two of the arguments in this example are ignored and could have been left out. Note that ENDCAT would not be legal in this word without something like $. to remove the concatenated string from the string stack before the terminating semicolon. It is normal to use a $ARGS{ } word as a step in a concatenation that is terminated elsewhere. Sample syntax using the string macro GEORGE: $" bill" $" sue" $" marie" george The resulting display is: This is arg1: bill. NOTE: Macro argument labels must be distinct from each other and from any local labels that appear in the same definition, and there is no check for that. NOTE: At the moment the semantics of $ARGS{ is undefined before DOES>. ) state @ 0= ABORT" ***Use only in definitions." s" }" args>$frame #$args [n] postpone $frame \ true to-args? \ NOT USED in this implementation compile-with-$args \ false to-args? \ NOT USED in this implementation drop-$frame ; immediate