( Title: Measured strings File: mstrings.fs Author: David N. Williams License: LGPL Version: 0.8.5 Test file: mstrings-test.fs Change log: mstrings.log Revised: February 27, 2011 Some of this file is derived from Wil Baden's ToolBelt, which we believe to be in the public domain. For the sake of the LGPL, any part of this file not under another copyright is Copyright [C] 2002, 2007, 2008, 2010, 2011 David N. Williams This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or at your option any later version. This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. If you take advantage of the option in the LGPL to put a particular version of this library under the GPL, the author[s] would regard it as polite if you would put any direct modifications under the LGPL as well, and include a copy of this request near the beginning of the modified library source. A "direct modification" is one that enhances or extends the library in line with its original concept, as opposed to developing a distinct application or library which might use it. This library is consolidated from the mstrings collection. It is mostly ANS Forth compatible up to case dependence. The code is intended to be character clean. Please see the file mstrings.html for terminology and more word specifications. ) decimal s" parsing.fs" INCLUDED \ *** TABLE OF CONTENTS (* 0 UTILITY WORDS 1 STACK REPRESENTATION mcount -mcount /m 2 M+ WORDS m+room? m+ 3 M! WORDS mroom? m! 4 COMMA WORDS m, m,s ?mcontig-data m+, 5 NULL TERMINATION null-m+ null-m+, 6 MULTILINE INPUT eol-s upto-m! upto-m+ m!" m!` m+" m+` upto-m,s m,s" m,s` m"" m" m` s-upto-m+, s-upto-m,s |s|-upto-m,s NOT IMPLEMENTED {}upto-m! {}upto-m+ s-upto-m! s-upto-m+ upto-m+, m+," m+,` {}upto-m,s {}upto-m+, s-upto-m+, |s|-upto-m+, *) \ *** 0 UTILITY WORDS [UNDEFINED] cell- [IF] : cell- ( addr -- addr-cell ) [ 1 cells ] literal - ; [THEN] [UNDEFINED] u>= [IF] : u>= ( u1 u2 -- flag ) u< 0= ; [THEN] [UNDEFINED] 3drop [IF] : 3drop ( x y z -- ) 2drop drop ; [THEN] [UNDEFINED] chars+ [IF] : chars+ ( x n -- x+c[n] ) chars + ; [THEN] [UNDEFINED] c-allot [IF] : c-allot ( n -- ) chars allot ; [THEN] \ *** 1 STACK REPRESENTATION (* Note that -MCOUNT, the inverse of MCOUNT, assumes that s is stored as an mstring. *) \ These will be redefined if dstrings is loaded later. [UNDEFINED] /mcount [IF] \ dstrings not loaded : mcount ( m -- s ) dup @ swap cell+ swap ; : -mcount ( s -- m ) drop cell- ; [THEN] : /m ( len -- c[len]+cell ) chars cell+ ; \ *** 2 M+ WORDS (* Note that a sequence of mstring concatenations can be initialized by invoking 0 !. *) : ( s m -- ) (* Append the fstring to the mstring, no checks. *) 2dup 2>r mcount chars+ swap cmove 2r> +! ; : m+room? ( len mbuf /mbuf -- len mbuf flag ) (* The flag is the value of the predicate: c[len + count] + cell <= /mbuf. *) over @ ( /mbuf count) 3 pick + /m u>= ; : m+ ( s mbuf /mbuf -- flag ) (* Append the fstring body to the mstring in the mbuffer, with overflow checking. Note that there is an ANS Forth Double-Number word with the same name. *) m+room? IF true ELSE 3drop false THEN ; \ *** 3 M! WORDS [UNDEFINED] [IF] \ dstrings not loaded : ( s a-addr -- ) (* Store the fstring as an mstring at a-addr, without checking for room. *) 0 over ! ; [THEN] : mroom? ( len mbuf /mbuf -- len mbuf flag ) (* The flag is the value of the predicate: c[len] + cell <= /mbuf *) 2 pick /m u>= ; : m! ( s mbuf /mbuf -- flag ) (* Store the fstring as an mstring with overflow checking. *) mroom? IF true ELSE 3drop false THEN ; \ *** 4 COMMA WORDS [UNDEFINED] m,s [IF] \ dstrings not loaded : m,s ( s -- s' ) (* Store the fstring as an mstring in data space. *) align here >r ( len) dup /m allot r@ r> mcount ; [THEN] : m, ( s -- ) m,s 2drop ; : ( s m -- ) (* Same as M+, below, no contiguity check. *) over c-allot ; : ?mcontig-data ( m -- ) mcount chars+ here <> ABORT" ***data space not contiguous with mstring" ; : m+, ( s m -- ) (* Append the fstring to the mstring in data space after checking for contiguity. *) dup ?mcontig-data ; \ *** 5 NULL TERMINATION (* These words append a null character to an mstring, without changing its count. Assuming no other embedded nulls, the resulting string body is a C string when the C size of a char is the same as 1 CHARS. The action of NULL-M+, agrees with its name when the next available slot in data space is contiguous with the end of an mstring, and that is not checked. *) : ( m -- ) mcount chars+ 0 swap c! ; : null-m+ ( mbuf /mbuf -- flag ) over @ /m u> IF mcount chars+ 0 swap c! true ELSE drop false THEN ; : null-m+, ( -- ) 0 c, ; \ *** 6 MULTILINE INPUT (* The constant EOL-S leaves the fstring consisting of the host eol. It is stored in memory as an mstring. Run the file mstrings-test.fs to test whether the value below is correct. *) 1 , here 10 c, 1 \ unix \ 1 , here 13 c, 1 \ mac \ 2 , here 13 c, 10 c, 2 \ dos 2constant eol-s \ fstring for host eol : upto-m+ ( "" mbuf /mbuf char -- room? found? ) (* Append the input stream across lines, up to the first occurrence of char, to the mstring in the mbuffer, including the host eol for line ends. The "room" flag is true when the mbuffer is sufficient, and the "found" flag is true when the delimiting character is found in the input stream before a REFILL failure. When both flags are true, the input stream is left with the delimiting character parsed away. When one or both of the flags is false, the input stream is positioned just after the last character stored, if that was within a line, or at the beginning of the next line, if that was the last character of the host eol sequence. Only full fragments within a line are stored, up to the delimiting character or the end of the line, whichever comes first. *) LOCALS| delim /mbuf mbuf | BEGIN parse-area@ delim separate 0= WHILE \ delim not found mbuf /mbuf m+ 0= IF false false EXIT THEN eol-s mbuf /mbuf m+ 0= IF empty-parse-area false false EXIT THEN ?emit-cr refill 0= IF true false EXIT THEN REPEAT mbuf /mbuf m+ IF parse-area! true true ELSE 2drop false true THEN ; : upto-m! ( "" mbuf /mbuf char -- room? found? ) 0 3 pick ! upto-m+ ; ( "" mbuf /mbuf -- flag ) : m!" [char] " upto-m! and ; : m!` [char] ` upto-m! and ; : m+" [char] " upto-m+ and ; : m+` [char] ` upto-m+ and ; : upto-m,s ( "" char -- s ) (* Store the string across lines, up to the first occurrence of char, as an mstring in data space, including the host eol for line ends. Leave its fstring representation. Leave the input stream positioned just after the trailing char. Throw an error if char is not found. *) align here ( m) >r 0 ( len) , BEGIN ( char) dup parse ( len) dup c-allot r@ parse-area-empty? WHILE \ char not found eol-s ( len) dup c-allot ( eol.s) r@ ?emit-cr refill 0= ABORT" ***terminating character not found" REPEAT ( char) drop r> mcount ; : m,s" ( "<">" -- s ) [char] " upto-m,s ; : m,s` ( "<`>" -- s ) [char] ` upto-m,s ; : m"" \ compile: ( "" char -- ) \ run: ( -- s ) (* Throw an error if not compiling. The interpretive version of this word is not implemented. When compiling, store the input stream across lines into data space as an mstring according to the specification for UPTO-M,S, and append code to the definition which leaves its fstring representation on the data stack. Throw an error if the delimiting character is not found in the input stream. The code below is based on the idea in the gforth code for SLITERAL. *) state @ 0= IF -14 throw THEN >r postpone AHEAD r> upto-m,s 2>r align postpone THEN 2r> postpone 2literal ; : m" \ compile: ( "<">" -- ) \ run: ( -- s ) [char] " m"" ; immediate : m` \ compile: ( "<`>" -- ) \ run: ( -- s ) [char] ` m"" ; immediate : s-upto-m+, ( "" pat.s m -- ) (* Assume that pat.s is not empty. Concatenate the input stream across lines, up to the first occurrence of the characters in pat.body, onto the data space mstring m, including line ends consisting of the host eol. Leave the input stream positioned just after the trailing character of the copy of pat.body found in the input stream. Abort if the next available data space is not contiguous with m, or if the pattern is not found *) LOCALS| m #pat &pat | m ?mcontig-data BEGIN parse-area@ &pat #pat s-separate ( pa.s false | after.s before.s true) 0= WHILE \ s not found ( pa.s) m eol-s m ?emit-cr refill 0= ABORT" ***terminating string not found" REPEAT ( before.s) m ( after.s) parse-area! ; : s-upto-m,s ( "" pat.s -- lines.s ) (* Assume that pat.s is not empty. Store the input stream across lines, up to the first occurrence of the pattern body, as an mstring in data space, including line ends consisting of the host eol. The fstring representation of the mstring is lines.s. Leave the input stream positioned just after the trailing character of the copy of pat.body found in the input stream. Abort if the pattern is not found. *) align here 0 ( mlen) , dup >r s-upto-m+, r> mcount ; : |s|-upto-m,s ( "" pat.s -- lines.s ) (* Assume that pat.s is nonempty. Store the input stream across lines as an mstring in data space, up to the first occurrence of the pattern body, delimited by whitespace or the beginning or end of a line, minus the last of any whitespace characters immediately preceding it. Line ends consisting of the host eol are included in the stored mstring. The fstring representation of the mstring is lines.s. Leave the input stream positioned just after the pattern and the first of any immediately trailing whitespace characters. Abort if the whitespace delimited pattern is not found. *) align here 0 ( mlen) , LOCALS| m #pat &pat | BEGIN &pat #pat m s-upto-m+, parse-area@ drop #pat chars - #pat ( pat.s') source white-delimited? 0= WHILE &pat 1 m \ 1st pat char >in @ #pat - 1+ >in ! \ restore pat minus 1st char REPEAT >in @ #pat <> m @ 0<> and IF -1 c-allot -1 m +! THEN >in++ m mcount ;