Path: utzoo!utgpu!water!watmath!clyde!att-cb!att-ih!pacbell!ptsfa!ames!nrl-cmf!mailrus!tut.cis.ohio-state.edu!ut-sally!utah-cs!thomson
From: thomson@utah-cs.UUCP (Richard A Thomson)
Newsgroups: comp.lang.forth
Subject: Amiga FORTH Newsletter V1, #2
Message-ID: <5327@utah-cs.UUCP>
Date: 8 Mar 88 07:02:34 GMT
Reply-To: thomson@cs.utah.edu.UUCP (Richard A Thomson)
Organization: University of Utah CS Dept
Lines: 1648

                        Amiga FORTH Newsletter
        Volume 1  Number 2                      March 8th, 1988

[ I'm posting just the contents of AFN V1, #2 along with the first
  document since the total length is 157K.  If you wish to obtain the
  whole thing, you can contact me (Rich Thomson), or Marcus Gabriel.
  Comments to AFN V1 numbers 1 and 2 will appear in AFN V1, #3 -- RT ]

Contents:

    Introduction
    Forth-83.Txt (Offered but not included, 200114 bytes)
    FStrings Package Files:
        FStrings.Txt    (56796 bytes)
        FStrings.Scr    (26432 bytes)
        FSTest.Scr      (28543 bytes)
        MFStrings.Scr   (27381 bytes)
    GStrings Proposal File:
        GStrings.Pro    (10981 bytes)
    WARNING CONCERNING THE MULTI-FORTH ASSEMBLER AND MFSTRINGS.SCR
    THANK YOU

Richard Thomson

Sources:

    marcus@newton.physics.purdue.edu (Marcus D. Gabriel)
    East Coast Forth Board (ECFB) (703-442-8695)
        (SYSOP: Jerry Shifrin, nice PCBoard)

___________________________________________________________________________

Subject: FStrings Package by George T. Hawkins
From: marcus@newton.physics.purdue.edu (Marcus D. Gabriel)

To One and All:

Allow me to begin by explaining the one offered file which is not
included here.  The file Forth-83.Txt is the 89 page "FORTH-83
STANDARD," a publication of the FORTH STANDARDS TEAM, and it is
reproducible in whole or in part.  Their address is

    FORTH STANDARDS TEAM
    P.O. BOX 4545
    MOUNTAIN VIEW, CA 94040
    USA

If you would like this document, please e-mail a note to this effect to
marcus@newton.physics.purdue.edu (Marcus D.
Gabriel), and I will respond either by sending it to you by e-mail, or,
if there is sufficient response and when Richard Thomson has time to
create a completed mailing list, I will propagate it to the group.

Last January I downloaded George T. Hawkins' FSTRINGS package from the
ECFB, adapted it to Multi-Forth that same month, and put some more work
into it in late February.  The file MFStrings.Scr is the result; more on
this file later.

The file FStrings.Txt is George Hawkins' original narrative on his
FSTRINGS package, and this file is highly recommended reading.

The file FStrings.Scr is George Hawkins' original source file (FORTH-83
Standard) for the implementation of the abstract string operators of
FSTRINGS, although I have converted it from a block file to a stream
file.  This file is included for three reasons:

    i)   George Hawkins' code is enjoyable and, I believe, instructive
         to read.
    ii)  For those with JFORTH or MVP Forth, this will give additional
         aid in adapting FSTRINGS.
    iii) You may wish to adapt this package differently than I did, and
         thus you have the original source.

The file FSTest.Scr is George Hawkins' Test/Validation file for
FStrings.Scr or, in this case, for MFStrings.Scr also.  It has been
minimally modified in order to compile under Multi-Forth, and you should
have no difficulty in seeing where I have made my changes, since they
are either documented or obvious.  This file will help if you are
adapting or re-adapting FSTRINGS.

The file MFStrings.Scr is my adaptation of FSTRINGS to Multi-Forth.  I
have coded all of the words that George Hawkins recommended and many
others besides these.  I have defined two words for "dynamically"
allocating a temporary string storage buffer, eliminated some internal
words through the use of local variables, factored some words of generic
interest beyond this package, etc.  If you do not like the way I did
something, please feel free to change it or bring it up for discussion.
I must admit that upon review, in some instances, but not all, I may
have gotten carried away with the use of local variables.  In these
cases, I believe I was trying to make myself understand each and every
word without skimming over them by simply reworking them, even to
excess.  C'est la vie :-).  If you have questions or problems, you can
e-mail me a note if you do not think it is of generic interest to the
Amiga FORTH Newsletter, otherwise please feel free to participate.

The file GStrings.Pro (I gather the pun was intended) was posted to the
ECFB by Rj Brown, and it is a proposed incremental improvement over
FSTRINGS.  The abstract string operators would stay the same; it is the
implementation of these that would change.  In essence, at the ECFB,
they are discussing lists and list processing as applied to strings.
See THE ART OF COMPUTER PROGRAMMING, FUNDAMENTAL ALGORITHMS, by Donald
Knuth.  I am sure that Richard Thomson or others of the Amiga FORTH
Newsletter could supply additional or better references.  I include this
file as an example of where we might take this discussion of strings, as
a group, if we wished, although it is not necessarily specific to the
Amiga.  On the other hand, one does use strings with the Amiga.

WARNING AND SOLUTION
--------------------

The Multi-Forth Assembler has an interesting "feature."  On page 43 of
Chapter 21, see

    CMPM    Aa Ab size CMPM,    test:<(Ab)-(Aa)>

whereas in fact the test <(Aa)-(Ab)> is performed, opposite of CMP, ,
CMPA, , or the Motorola convention.  However note that on page 35 of
Chapter 21, -TEXT is coded correctly, that is, given that CMPM,
functions oppositely from the documentation.  Note that I am referring
to an old manual, but after calling CSI, I gather that the new manual
has the same discrepancy.  At least they, and you, have been notified.
Find CMPM, in the Multi-Forth Assembler and change the original CMPM,

: CMPM, ( Aa\Ab\sz -- MEMORY MEMORY COMPARE)
   B108 ,OP 3 AND !SIZE 6 SCALE SWAP ?AREG OR
   SWAP ?AREG SWAP 9 SCALE OR [...]

   [...] R 2SWAP R>
   B108 ,OP 3 AND !SIZE 6 SCALE SWAP ?AREG OR
   SWAP ?AREG SWAP 9 SCALE OR [...]

[...] character (i.e., ASCII decimal value 13) is encountered.  The two
primitives are:

: $KEY  ( string -- )                          \ "s-key"
\ Reads keyboard characters into "string" until [...].

and

: $$KEY  ( string -- )                         \ "s-s-key"
\ Same as $KEY except input echoed back to console.

The string data type may be compiled directly into the current
dictionary via the use of the "$," word once it is placed upon the stack
via the $LIT word.  Alternatively, the ",$" word may be used to read a
string from the Forth input stream and place it into the current
dictionary.  Thus the two segments of code:

    $LIT "This is a string" $,

and

    ,$ *This is a string*

are equivalent.  The source code description of the above discussed
words follows:

: $,  ( string -- )                            \ "s-comma"
\ Compiles a string into the dictionary.

: ,$  ( -- )                                   \ "comma-s"
\ Compiles the following word string into the dictionary.

A point to keep in mind with $LIT is that only one string literal may be
defined concurrently.  That is, $LIT returns a "standard" string
reference address, so if, for example, "$LIT 'hello' $LIT 'there'" is
typed in, then the two string references left on TOS both "point to" the
string "there".  In general, this type of situation should not occur.

6.2. STRING DEFINITION

Once a string literal capability is defined, then string data types may
be defined.  The following subsections discuss the two possible cases of
string constants and string variables.

6.3. STRING CONSTANTS

No provision is made for implementing string constants.  The primary
reason is that since the string is implemented as an abstract data type,
then a string constant is no different in reference from a string variable.
Thus the only meaning of a string constant is that assignments to string
constants would be caught at compile (or run) time and the corresponding
compilation (or run) aborted.  This would have the negative effect of
introducing additional complexity in the package with a possible run
time speed penalty as well.  Since the author felt this direction to be
counter to his perceived "philosophy" for Forth, the capability was not
implemented.  In a like fashion, no error checking has been provided
either.  The user can, of course, modify this package as he/she desires
to introduce a constant and/or error checking capability if so desired.

In some Forth systems I've seen, it appears that the only difference
between a string constant and a string variable is that the former
allows initialization (i.e., preassigning a value) and the latter does
not.  In such systems a reference to a string constant or a string
variable *both* leave identical stack data types.  Thus in these systems
there is nothing (save the programmer's own good judgment) to prevent
the run time alteration of string constants.  (Note that this is
entirely different from the use of standard Forth scalar constants and
variables where the former leave a value and the latter leave an address
on the stack.)

Such usage of the terms "constant" and "variable" seems, to the author
at least, to be artificial, contrived, and misleading.  That is, a
string constant, in such systems, is not a constant but, rather, an
initialized variable - and this is merely a minor syntactical
distinction.

The implementation given here allows allocating a string variable in
either an initialized or a NULL state with equal facility and avoids the
use of string constants altogether, since (as previously mentioned) this
would invariably exact a performance penalty.
For example, if a string constant were provided, then the appropriate
definition might be something like:

    $LIT "This is a string constant" $CON STR-CON

However this same capability is provided with an initialized variable
via:

    CREATE STR-CON ,$ "This is a string constant"

I fail to see much difference between the two nor any advantage to the
use of a "constant".

6.4. STRING VARIABLES

String variables are "CREATE"d just as are regular Forth variables.
This is due to the implementation of the abstract string data type as a
pointer/address.  For example, to create the string variable SV1 with
the initial value "Hello world!", one could code:

    $LIT "Hello world!" CREATE SV1 $,

or

    CREATE SV1 ,$ "Hello world!"

or

    CREATE SV1 12 $ALLOT
    $LIT "Hello world!" SV1 $$!

As shown, if a string variable is needed, but its initial contents are
not known, then the variable (plus the dictionary space required) can be
defined via the CREATE and $ALLOT words.  For example, to reserve a
string variable of 97 characters, one could code:

    CREATE SV1 97 $ALLOT

or

    97 CREATE SV1 $ALLOT

The use of $ALLOT not only reserves the necessary dictionary space but
initializes the string to NULL (i.e., zero length) as well.  This is
important since a null string is different from an uninitialized string
(and the string package provided here incorporates null strings).  All
strings are initialized with this package.  The source code definition
of $ALLOT is:

: $ALLOT  ( number -- )                        \ "s-allot"
\ Reserves "number" characters in the dictionary for a string
\ and sets the string to NULL.

Although initialization is provided, error checking is not.  This means
that it is the responsibility of the Forth programmer to ensure that the
dynamic (i.e., run time) size of a string never exceeds its initially
(implicitly or explicitly) allocated size.  (Of course, error checking
can be added if so desired, at a performance cost.)

6.5. STRING REFERENCE

Once strings are defined as constants or immediately available as
literals, then they may be referenced.  "Referencing", in this sense,
means an immediate operation with a string without altering its value.
Four basic reference operators are provided:

    1) determining the length of a string,
    2) printing a string,
    3) "importing" a string from another package (e.g., I/O), and
    4) "exporting" a string to another package.

String length is determined via:

: $LEN  ( string -- length )                   \ "s-length"
\ Returns the length of string.

Printing a string is accomplished via:

: $.  ( string -- )                            \ "s-dot"
\ Prints a string.

Their actions should be self-explanatory.

Since almost all Forth packages of which I am aware treat the data
portion of a string via contiguous memory locations and since it may be
necessary to "send/export" a string to some other vendor provided Forth
package (e.g., I/O) and/or to "receive/import" a string from some other
vendor provided package - an import/export ability is provided.  Note
that exporting and importing strings are necessary (but hopefully
temporary) evils.  That is, once the "string data type" has been
established within the framework of the Forth standard and once all
packages recognize (and act appropriately upon) this data type - then
the export/import artifact will no longer be needed.  The definitions of
import and export are:

: $IMPORT  ( addr length string -- )           \ "s-import"
\ Imports a string from addr, length.

and

: $EXPORT  ( string addr -- )                  \ "s-export"
\ Exports a string to addr.

Note that it is the user's responsibility when using $EXPORT to ensure
that enough contiguous memory is available/reserved at "addr" to hold
the data portion of the string.

6.6. BASIC STRING MANIPULATION

The use of the definition "basic string manipulation" (as was the
previous case of string reference) is somewhat arbitrary.  What I am
trying to capture here are those essential actions necessary for *any*
string package.
Two basic string manipulation operators are provided; they are:

: $NULL  ( string -- )                         \ "s-null"
\ Forces a string to NULL.

and

: $$+  ( string1 string2 -- )                  \ "s-s-plus"
\ Adds (concatenates) string1 onto string2.

Again, the actions should be self-explanatory.

6.7. STRING/CHARACTER FETCHES/STORES

Following the Forth fetch/store convention, a number of operators are
provided which perform fetches/stores across strings/characters.  They
are:

: $C!  ( char string index -- )                \ "s-c-store"
\ Stores "char" in "string" at position "index".

: $C@  ( string index -- char )                \ "s-c-fetch"
\ Fetches "char" from position "index" in "string".

: $$!  ( string1 string2 -- )                  \ "s-s-store"
\ Stores string1 in string2.  Original contents of string2 are
\ lost.

and

: $$@  ( string1 string2 index length -- )     \ "s-s-fetch"
\ "string2" is built/fetched using the "string1" substring
\ starting at position "index" for "length" characters.

The $C@ and $LEN primitives provide, essentially, the capability to
"examine" any string in terms of size/content, and the $C! primitive
provides the primary method of altering a string.  The $$@ operator is,
in effect, the "substring" operator used in other languages.  When using
$$@, the original contents of string2 are lost.

6.8. STRING INSERTIONS

Often it is necessary to dynamically "add" information into a string;
the string insertion operators provide this capability, and they are:

: $CINS  ( char string index -- )              \ "s-c-ins"
\ Inserts "char" in "string" with "char" at position "index".
\ Remaining characters, if any, are moved right.

and

: $$INS  ( string1 string2 index -- )          \ "s-s-ins"
\ Inserts "string1" into "string2" starting at position "index"
\ of "string2".  Remaining characters, if any, are moved right.

When a character, or string, is inserted into another string, the
original character(s) - if any - beginning at the point of insertion are
right shifted past the inserted character(s).
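
The right shift on insertion can be pictured with a small sketch.  This
is not the FSTRINGS source; it is a minimal illustration that assumes an
ANS-style, byte-addressed Forth (Gforth, for example), a string laid out
as a cell-sized count followed by its characters, and 0-based index
positions.  The names S-LEN, S-DATA, S-SET, S-CINS, SV, and DEMO are
hypothetical and, like the package itself, do no bounds checking.

```forth
\ Assumed layout (hypothetical): one cell of count, then the characters.
: S-LEN   ( string -- length )  @ ;
: S-DATA  ( string -- addr )    CELL+ ;

\ Overwrite "string" with the characters at c-addr/u.
: S-SET  ( c-addr u string -- )
   2DUP !  S-DATA SWAP CMOVE ;

\ Insert "char" at 0-based "index"; the tail moves right by one place.
: S-CINS  ( char string index -- )
   2DUP SWAP S-LEN SWAP -    \ char string index tail   tail = length - index
   >R                        \ char string index        R: tail
   OVER S-DATA +             \ char string addr         addr = insertion point
   DUP DUP 1+ R> CMOVE>      \ char string addr         tail shifted right
   ROT SWAP C!               \ string                   char stored at addr
   1 SWAP +! ;               \                          count incremented

CREATE SV  0 , 16 CHARS ALLOT     \ empty string, room for 16 characters

: DEMO
   S" Helo" SV S-SET
   [CHAR] l SV 2 S-CINS           \ insert in the middle
   [CHAR] ! SV DUP S-LEN S-CINS   \ insert at index = length
   SV S-DATA SV S-LEN TYPE ;      \ prints Hello!

DEMO
```

Inserting at an index equal to the current length moves no characters
and simply appends, because CMOVE> is given a zero count in that case.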
Note that:

    "string1 string2 DUP $LEN $$INS"

is equivalent to:

    "string1 string2 $$+"

The latter usage is to be preferred since it is simpler to understand
and faster in execution.

6.9. STRING DELETIONS

The converse of string insertion (just presented) is string deletion.
One general purpose and several special purpose string deletion
primitives are provided.  The general purpose string deletion primitive
is:

: $DEL  ( string index number -- )             \ "s-del"
\ Deletes "number" characters from "string" starting at
\ position "index".

Although any of the special purpose string deletion primitives presented
next can be built using $DEL, $C@ and $LEN, they are provided based on
both the perceived frequency of use and in order to standardize naming
conventions.  They are:

: $|TRIM  ( string number -- )                 \ "s-left-trim"
\ Deletes "number" characters from the left/start of "string".

: $TRIM|  ( string number -- )                 \ "s-trim-right"
\ Deletes "number" characters from the right/end of "string".

: $|SPACES  ( string -- )                      \ "s-left-spaces"
\ Trims leading spaces from string.

and

: $SPACES|  ( string -- )                      \ "s-spaces-right"
\ Trims trailing spaces from string.

6.10. STRING REPLACEMENTS

Having now considered string insertions and deletions, string
replacements are now addressed.  A single, general purpose string
replacement primitive is provided; it is:

: $$REP  ( string1 string2 index -- )          \ "s-s-rep"
\ Replaces current substring in "string2" with "string1"
\ starting at position "index" of "string2".

6.11. STRING ROTATIONS

The ability to circularly rotate strings left/right is also provided via
four primitives.  The single shift string primitives are:

: $ROT  ( string -- )                          \ "s-right-rote"
\ Rotates a string right one character.

The multiple shift string primitives are:

: $<>ROT  ( string number -- )                 \ "s-many-right-rote"
\ Rotates "string" right "number" characters.

Note that neither $<>ROT are necessary since $<>ROT can be similarly defined.
The rationale behind providing $<>ROT was simply that of execution time
efficiency.

6.12. STRING COMPARISONS

The determination of string ordinality (i.e., the relative lexicographic
order of two strings) can be satisfied with a single, all purpose,
string comparator (i.e., "$$COMPARE") which compares two strings and
returns -1, 0, or +1 depending on whether the first string is less than,
equal to, or greater than the second string, respectively.  This
implementation treats the null string as having the lowest possible
ordinality (and there is valid mathematical reason for doing so).
Further, the ordinality of a string is based upon the ASCII collating
sequence.

Although the "$$COMPARE" primitive is adequate to resolve all questions
concerning string ordinality, from a coding/programming perspective, it
is often more expedient to ask a specific question (such as: Is string1
less than string2?).  For this reason, and also from the standpoint of
standardizing string comparator terminology, a number of additional,
special purpose string comparators are introduced (e.g., $$=, $$<, $$<=,
etc.).  As might be expected, these comparators are all simple words
invoking the $$COMPARE comparator.  The single general purpose string
comparator is:

: $$COMPARE  ( string1 string2 -- status )     \ "s-s-compare"
\ Returns -1, 0, or +1 depending on whether string1 is
\ lexicographically less than, equal to, or greater than
\ string2, respectively.

And the six special purpose string comparators are:

: $$=  ( string1 string2 -- t | f )            \ "s-s-equal"
\ Returns -1 if string1 = string2, else returns 0.

: $$<  ( string1 string2 -- t | f )            \ "s-s-less-than"
\ Returns -1 if string1 < string2, else returns 0.

: $$<=  ( string1 string2 -- t | f )           \ "s-s-less-than-or-equal"
\ Returns -1 if string1 <= string2, else returns 0.

: $$>  ( string1 string2 -- t | f )            \ "s-s-greater-than"
\ Returns -1 if string1 > string2, else returns 0.

: $$>=  ( string1 string2 -- t | f )           \ "s-s-greater-than-or-equal"
\ Returns -1 if string1 >= string2, else returns 0.

and

: $$<>  ( string1 string2 -- t | f )           \ "s-s-not-equal"
\ Returns -1 if string1 <> string2, else returns 0.

6.13. STRING PATTERN MATCHING

The ability to locate arbitrary characters and substrings within a
string is provided.  The two primitives for locating an arbitrary
character within a string are:

: $CFIND  ( char string -- index | -1 )        \ "s-c-find"
\ Searches for leftmost occurrence of "char" in "string".
\ Returns "index" if found, else returns -1.

and

: $CFIND<  ( char string -- index | -1 )       \ "s-c-find-back"
\ Searches for rightmost occurrence of "char" in "string".
\ Returns "index" if found, else returns -1.

The primitive which searches for a substring (to include the searched
string itself) within a string is:

: $$FIND  ( string1 string2 index length -- index | -1 )   \ "s-s-find"
\ Searches for the first occurrence of the string1 substring
\ starting at "index" for "length" characters in string2.
\ Returns index if found, else returns -1.

Note that "index" and "length" identify the substring within "string2"
to be used for matching purposes.

6.14. STRING SET OPERATIONS

Two set theoretic string primitives are provided.  In both cases a
character is considered a set element and a string is considered a set.
The two set theoretic string primitives are:

For single element set membership:

: $CMEM  ( char string -- t | f )              \ "s-c-mem"
\ Returns -1 if "char" is in "string", else returns 0.

Note that the $CMEM primitive is sufficient (along with other primitives
introduced here such as $NULL, $C@, $$+, etc.) to build any necessary
higher order set theoretic string operators.

For multiple element set membership:

: $$VER  ( string1 string2 -- index | -1 )     \ "s-s-ver"
\ Verifies that string2 contains only those characters in
\ string1 by returning a -1.  Otherwise the index of the first
\ character in string2 not contained in string1 is returned.
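
The stack effects of the two set operators can be illustrated with a
short high-level sketch.  It is not the FSTRINGS implementation: it
assumes an ANS-style, byte-addressed Forth (Gforth, for example), a
string stored as a cell-sized count followed by its characters, and
0-based indices.  The names S-LEN, S-DATA, S-SET, S-CMEM, S-VER, DIGITS,
SAMPLE, and DEMO are hypothetical.

```forth
\ Assumed layout (hypothetical): one cell of count, then the characters.
: S-LEN   ( string -- length )  @ ;
: S-DATA  ( string -- addr )    CELL+ ;
: S-SET   ( c-addr u string -- )  2DUP !  S-DATA SWAP CMOVE ;

\ True (-1) if "char" occurs anywhere in "string", else false (0).
: S-CMEM  ( char string -- flag )
   DUP S-DATA SWAP S-LEN             \ char addr length
   0 ?DO                             \ char addr
      2DUP I + C@ = IF  2DROP TRUE UNLOOP EXIT  THEN
   LOOP
   2DROP FALSE ;

\ -1 if every character of string2 is in string1, else the 0-based
\ index of the first character of string2 that is not in string1.
: S-VER  ( string1 string2 -- index | -1 )
   DUP S-LEN 0 ?DO                   \ string1 string2
      DUP S-DATA I + C@              \ string1 string2 char
      2 PICK S-CMEM 0= IF            \ string1 string2
         2DROP I UNLOOP EXIT
      THEN
   LOOP
   2DROP -1 ;

CREATE DIGITS  0 , 16 CHARS ALLOT
CREATE SAMPLE  0 , 16 CHARS ALLOT

: DEMO
   S" 0123456789" DIGITS S-SET
   S" 1988" SAMPLE S-SET  DIGITS SAMPLE S-VER .    \ -1 : all digits
   S" 19a8" SAMPLE S-SET  DIGITS SAMPLE S-VER . ;  \ 2  : "a" is at index 2

DEMO
```

A null string2 verifies trivially (the loop body never runs), which is
consistent with treating the null string as the empty set.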
The $$VER (verify) operator is borrowed from PL/I and it is more
powerful, and practical, than might at first be expected.  As an example
if we define:

    CREATE NUMBERS ,$ "0123456789"

Then any string can be tested to see if it contains only numbers with
the word:

\ Returns -1 if "string" contains only the ASCII characters 0..9,
\ else returns 0.
: NUMBERS?  ( string -- flag )  NUMBERS SWAP $$VER 0< ;

6.15. STRING TRANSLATION

Two straightforward string translation primitives are provided based,
primarily, on frequency of use.  They translate lower-case characters in
a string to upper-case, and vice-versa.  They are:

: $>UPPER  ( string -- )                       \ "s-to-upper"
\ Converts any lower-case characters in "string" to upper-case.

and

: $>LOWER  ( string -- )                       \ "s-to-lower"
\ Converts any upper-case characters in "string" to lower-case.

6.16. STRING ENCODING/DECODING

String encoding (i.e., translating a number to its ASCII string
representation) and string decoding (i.e., translating an ASCII string
number representation to a number) are handled by three primitives which
are modeled after the currently defined FORTH-83 word set which performs
double number conversion (i.e., "<#", "#", "#>", etc.).

String encoding, or decoding, is initiated via the use of the "$CONVERT"
word.  Its definition is:

: $CONVERT  ( string index -- )                \ "s-convert"
\ Defines a string and index within the string to be used for
\ subsequent string-to-number or number-to-string conversions.

An initial call to $CONVERT, in effect, defines a specific string and a
specific place/index within that string to be used for subsequent
encoding or decoding.

String encoding is accomplished via successive calls to the word "$N".
Its definition is:

: $>N  ( -- number | -1 )                      \ "s-to-n"
\ Converts the character at the string/index position defined
\ by the last call to $CONVERT to a number if possible.  An
\ error (i.e., -1) is returned if the character is non-numeric
\ (i.e., does not lie between 0..BASE-1).
\ The index position established in the call to $CONVERT is always
\ incremented after a call to $>N.

The fact that $>N always increments the index position is important
since this allows $>N to be used to "search" a string for the first
valid numeric character (and then proceed with the decode operations).

The primitives given here are quite simple (more so than the "<#", "#>",
and associated word set) and it is assumed that the user will tailor the
"higher levels" of string encoding/decoding to his/her taste.

7. AREAS NOT COVERED

Since this paper tries to concentrate exclusively on Forth strings, a
number of related topics were purposely excluded.  In all cases the
primary reason for exclusion was that the topic area would necessarily
introduce other major considerations which would lie well beyond the
area of strings and thereby cause a considerable loss of focus.

7.1. INPUT/OUTPUT

Input/output, per se, has not been addressed in this file.  The
reasoning is straightforward.  Forth, itself, provides minimal
input/output facilities.  The Forth machine (upon which the Forth
language itself is based) is a virtual machine with an on-line console
(for interactive input/output) and a block file system for everything
else.  This obviously falls far short of an input/output system in the
traditional sense.  Correspondingly, the string functions given will
work with the console and the block file (i.e., the Forth input stream).
No attempt has been made to extend this concept.

The reasoning is simply that input/output considerations for Forth are a
separate issue, of which string structures are a subcomponent (e.g., any
binary read capability could easily input Forth strings - along with any
other internal Forth data type for a given system).  In truth, as long
as string primitives are provided to read string literals from a
standard Forth block file (as they are in this implementation) - then
this, in itself, handles the issue of portability.
So, in essence, there is no need to augment string primitives beyond
this point (at least from the perspective of portability).

Note: this is not to say that many Forth systems do not have excellent
input/output facilities (well beyond the FORTH-83 standard) - it is
rather to say that this is an independent subject and more properly
treated as input/output instead of strings.

7.2. STRING STACK

The idea and use of a string stack will also be considered outside the
bounds of this discussion.  This, in no way, indicates that I feel the
idea of a string stack is bad (I think it is an excellent idea).  It is
rather because it introduces additional issues and the line has to be
drawn somewhere.  For example, if a string stack were introduced then
string operations could proceed independently of data stack operations
with no confusion between the two.  This is excellent except that it
completely changes the semantics/effect of the same code when string and
data stack operations are intermixed.  For example, if "SR1" is a string
reference, then the code:

    "3 SR1 7 + $."

could be expected to produce entirely different results depending on
whether or not a string stack were implemented.  Of course this same
logic applies if a floating stack (etc.) is introduced.  The point,
however, is that this is really more of a stack (and semantics) issue
than a string manipulation issue.  I have long argued for multiple
stacks in Forth (and even *God forbid* for a type stack), but the
arguments have fallen upon deaf ears.  I'm getting very tired of arguing
at this point.

7.3. PARSING STRINGS

Most C language implementations (for example) provide a healthy serving
of string parsing functions.  This is nice except that often underlying
assumptions must be made (e.g., what characters constitute a
"whitespace") which may not apply in all/most situations - and, of
course, if you don't need it - why have it?
I can think of no parsing functions which cannot be built upon the
primitives provided here if needed - so I've left them out.

Additionally, since the string package provided herein is primarily
targeted toward writing application code (rather than, for example,
parsing Forth code) - no provisions have been made for parsing functions
as they relate to the Forth language itself.

8. THE FORTH SOURCE FILE

The file "FSTRINGS.SCR" contains all of the Forth string primitives
discussed in this document.

9. THE VALIDATION FILE

The file "FSTEST.SCR" contains a set of validation tests to ensure that
the string primitives provided in FSTRINGS.SCR function properly.
Although not exhaustive, the validation tests should give reasonable
confidence that this package will work on your particular Forth system.
(The validation package will also attempt to isolate malfunctions down
to a specific primitive if possible.)

10. FORTH-83 DEVIATIONS

As far as I am able to determine FSTRINGS.SCR is all kosher FORTH-83
Standard.  I use words from the double number extension word set (i.e.,
2DUP, 2DROP, 2SWAP, etc.).  Also I use the BRANCH word from the system
extension word set in [$LIT].  I would suspect any reasonable FORTH-83
implementation would provide these words (along with the "\" word which
is also used).

I have tested the system (although not extensively) under both
MasterFORTH and Laxen & Perry's F83 - no problems.  I would be glad to
assist anyone who encounters difficulty in getting this system to run -
but, remember, the whole idea is just to provide working FORTH-83 code
to demonstrate the functions themselves so that they can then be
rewritten consistent with the underlying Forth system and extended to
your taste.

11. INTERNALS

Some of the key points which may be of assistance in understanding the
code/logic in file FSTRINGS.SCR follow:

11.1. INTERNAL STRING STRUCTURE

Although nothing in the use of the primitives provided here makes any
assumptions about the underlying internal string structure used, this
information is required for the actual implementation itself.

The internal representation selected for strings in this implementation
was that of counted strings.  The initial string *word* contains the
count (which may be 0 or null).  Thus the limit on the working size of a
string word is 0..32,767 bytes due to the use of signed number
arithmetic.

The reserve scratch area "_BUFFER" has been allocated at 1024+2 bytes.
The only words which reference _BUFFER are $LIT, $<>ROT.  Other than
this everything is done "core-to-core" via CMOVEs.  _BUFFER's initial
allocation (and use) should be thought out carefully when implementing
this system since it represents a (potentially) large permanent
allocation of space.  For example, it may be more expedient to
temporarily "grab" unused dictionary space for this purpose and avoid
any permanent loss of space.  How one does this is dependent upon the
particular Forth system being used.

Another point to keep in mind is that only one _BUFFER is provided.
Thus one cannot have two string literals simultaneously defined.
(Also, for example, a call to $<>ROT will translate the "number" to the
modulus of the length of the string given, but that's hardly editing!)

11.4. MEMORY OPERATORS

The string primitives provided herein do most of their real "work"
through the underlying "memory" primitives (i.e., CMOVE, CMOVE>,
COMPARE, CFIND, CFIND<, and -optionally- SAME?).  For this reason alone
it would be advisable to code these memory operators in assembler.
(Still another reason is that these memory operators should also be
useful independent of this string package.)

The "interface" operators (i.e., _CF+, _LEN, and _$>AL) provide the
connection/interface between the high-level string operators and the
low-level memory operators.
They also make it easier to modify this package for stack widths of
other than 16 bits and/or allow using other internal representations
than the one selected here.  Since the use of the interface operators is
pervasive as well, it is recommended that they be coded in assembler
also (in addition they are very short words).

So, in general, the basic modus operandi of these string operators is to
call upon the interface operators; to perform the appropriate "stack
juggling"; and to call upon the appropriate memory operators.

In the file FSTRINGS.SCR, block 1 is the load block for a full FORTH-83
implementation.  Block 2 is the load block for a MasterFORTH system;
i.e., the words COMPARE, CFIND, CFIND<, SAME?, _CF+, _LEN, and _$>AL are
written in assembler for a MasterFORTH system (80XX chip).  Even if you
don't have a MasterFORTH system (in fact does *anyone* besides me!?), I
would strongly recommend you examine the assembler implementation of
these words and rewrite them (in assembler) for your own particular
Forth system (they really make a rather dramatic difference in
performance).  The words are documented and the underlying algorithmic
approach should be valid regardless of the system you're running.

As an aid to understanding the MasterFORTH implementation for the 80XX
series chip (so that you may convert these words):

- The MasterFORTH assembler is based upon the Laxen/Perry F83 public
  domain model with the following major differences:
    - The mode #) is simply );
    - The mode S#) has disappeared;
    - Local labels are available for unconditional jumps;
    - REPZ and REPNZ are REPE and REPNE instead;
    - JZ, JNZ, JC, and JNC are added;
    - The string instructions use BYTE and WORD rather than AL and AX.
- The MasterFORTH implementation uses the following machine resources:
    - The four segment registers must be maintained at the same value (i.e., CS = DS = SS = ES);
    - The following three registers are used internally by MasterFORTH and must be restored prior to returning to Forth:
        - SI -- the Forth instruction pointer (IP)
        - SP -- the Forth parameter stack pointer
        - BP -- the Forth return stack pointer.

The definitions of the memory operators are:

    : COMPARE ( a1 a2 n -- status )
    \ Compares a1-a2, (a1+1)-(a2+1), ... (a1+n-1)-(a2+n-1)
    \ as required returning -1, 0, or +1 depending on lexicographic
    \ order. If n=0, then 0 (i.e., =) is returned.

    : SAME? ( a1 a2 n -- t | f )
    \ Compares a1-a2, (a1+1)-(a2+1), ... (a1+n-1)-(a2+n-1)
    \ as required returning -1 if all "n" bytes match, else returns
    \ 0. Note that this operator is not strictly required since one
    \ may define SAME? as
    \     : SAME? COMPARE 0= ;
    \ It is included primarily for reasons of speed.

    \ Memory character/byte FIND                      GTH 08/22/87
    : CFIND ( c a1 n -- a2 | 0 ) \ "c-find"
    \ Searches a1, a1+1, ..., a1+n-1 for c returning first
    \ address of match or 0 if no match. NOTE: n=0 returns 0.

    : CFIND< ( c a1 n -- a2 | 0 ) \ "c-find-back"
    \ Same as CFIND except search is high-to-low memory.
    \ NOTE: Memory search begins at address: a1+n-1.

The words "CFIND" and "CFIND<" were selected to match the current FORTH-83 words "CMOVE" and "CMOVE>". Also note that no attempt was made to code a general-purpose assembler word which searches memory attempting to locate a substring. The primary reason for this is that this operation, although frequently used, is far more complex (when done efficiently) than is generally realized. The reader is referred to the excellent article "Searching for Strings with Boyer-Moore" by Richard Wiggins and Paul Walberg in the November, 1986 (Volume 3, Number 11) edition of Computer Language (pp. 28-42) just to see how hairy things can get!

11.5. THE ISSUE OF STANDARDS

The string operators provided herein will hopefully be of benefit to some of you in string processing applications. The true test, however, of the utility of any programming language extension is its proven utility over time. There is thus no good measure at present of which string operators provided in FSTRINGS are needed or well defined, or of which are missing. It is more reasonable to expect that FSTRINGS will serve as a catalyst (or perhaps seed) to encourage members of the Forth community to investigate the fundamental nature of and need for string operations in Forth.

In addition to the issue of functionality addressed above, there are as well the matters of stack/data conventions and naming conventions. All of these issues need to be thoroughly investigated and an informal standard developed and used for an extended period before any sort of string standards proposal is warranted (at least in this author's opinion). This has been a rather long-winded way of stating that I am in no way proposing FSTRINGS as a standards extension.

The development of FSTRINGS did, however, provide reasonably good assurance that some more fundamental, underlying operators (upon which any contiguous memory string package could easily be built) could be good candidates as standards extensions. These are just the "memory operators" mentioned earlier. Thus I would like to propose that the words COMPARE, CFIND, CFIND<, and -optionally- SAME? be considered as standards extensions to augment the CMOVE and CMOVE> words.

11.6. NULL STRINGS

This package allows the use of null strings. You may wish to ignore them altogether, but I would caution against this. The null string processing introduces very little overhead for the capability gained.

Null string handling is greatly simplified by the fact that all of the underlying memory operators are defined to accept null (or zero) counts.
For example, in the null case: CMOVE and CMOVE> do not move anything; COMPARE returns 0 (i.e., "equal"); CFIND and CFIND< return 0 (i.e., "not found"); and -optionally- SAME? returns -1 (i.e., "equal"). Thus the majority of the string primitives herein can be written without concern for the special case of null (or zero length) strings.

For those primitives which must explicitly test for the null string case, this action is always performed initially. If the null string case applies, then the return stack is set appropriately and the primitive is immediately EXITed. In all cases, this amounts to a single line of (commented) code at the beginning of the word definition.

11.7. CONVERSION PROBLEMS

As mentioned, this system has been tested with both MasterFORTH and F83. Jerry Shifrin was kind enough to allow me/us to test this package under LMI's Forth as well. The primary difficulty encountered was the underlying implementation of the "BRANCH" word. Both MasterFORTH and F83 use absolute addresses for BRANCH. LMI seems to use a relative offset instead. BRANCH is only used in the definition of [$LIT] as follows:

    : [$LIT] ( -- string ) \ "bracket-s-lit"
    \ Compiled string literal
       COMPILE BRANCH HERE 0 ,    \ forward branch around string
       HERE ,$ HERE ROT !
       [COMPILE] LITERAL ; IMMEDIATE

It may be necessary to modify [$LIT] prior to the "!" in order to adjust the "address/offset" which follows BRANCH in the compiled word. It may also be necessary to perform "aligns".

Alternatively, if your Forth system supports the ">MARK" and ">RESOLVE" words from the system extension word set, then it may be more expedient to simply recode [$LIT] as:

    : [$LIT] ( -- string ) \ "bracket-s-lit"
    \ Compiled string literal
       COMPILE BRANCH >MARK
       HERE ,$ SWAP >RESOLVE
       [COMPILE] LITERAL ; IMMEDIATE

Finally, it may be necessary to modify the "_CF+" and "_$>AL" words to use some value other than "2+" if your Forth system's word size is other than 16 bits.

12. GLOSSARY

A short glossary of all string primitives follows:

    : COMPARE   ( a1 a2 n -- status )
    : SAME?     ( a1 a2 n -- t | f )
    : CFIND     ( c a1 n -- a2 | 0 )  \ "c-find"
    : CFIND<    ( c a1 n -- a2 | 0 )  \ "c-find-back"
    : $LIT      ( -- string )  \ "s-lit"
    : $,        ( string -- )  \ "s-comma"
    : ,$        ( -- )  \ "comma-s"
    : [$LIT]    ( -- string )  \ "bracket-s-lit"
    : $KEY      ( string -- )  \ "s-key"
    : $$KEY     ( string -- )  \ "s-s-key"
    : $ALLOT    ( number -- )  \ "s-allot"
    : $IMPORT   ( addr length string -- )  \ "s-import"
    : $EXPORT   ( string addr -- )  \ "s-export"
    : $LEN      ( string -- length )  \ "s-length"
    : $.        ( string -- )  \ "s-dot"
    : $NULL     ( string -- )  \ "s-null"
    : $$+       ( string1 string2 -- )  \ "s-s-plus"
    : $C!       ( char string index -- )  \ "s-c-store"
    : $C@       ( string index -- char )  \ "s-c-fetch"
    : $$!       ( string1 string2 -- )  \ "s-s-store"
    : $$@       ( string1 string2 index length -- )  \ "s-s-fetch"
    : $CINS     ( char string index -- )  \ "s-c-ins"
    : $$INS     ( string1 string2 index -- )  \ "s-s-ins"
    : $|TRIM    ( string number -- )  \ "s-left-trim"
    : $TRIM|    ( string number -- )  \ "s-trim-right"
    : $|SPACES  ( string -- )  \ "s-left-spaces"
    : $SPACES|  ( string -- )  \ "s-spaces-right"
    : $DEL      ( string index number -- )  \ "s-del"
    : $$REP     ( string1 string2 index -- )  \ "s-s-rep"
    : $ROT      ( string -- )  \ "s-right-rote"
    : $<>ROT    ( string number -- )  \ "s-many-right-rote"
    : $$COMPARE ( string1 string2 -- status )  \ "s-s-compare"
    : $$=       ( string1 string2 -- t | f )  \ "s-s-equal"
    : $$<       ( string1 string2 -- t | f )  \ "s-s-less-than"
    : $$<=      ( string1 string2 -- t | f )  \ "s-s-less-than-or-equal"
    : $$>       ( string1 string2 -- t | f )  \ "s-s-greater-than"
    : $$>=      ( string1 string2 -- t | f )  \ "s-s-greater-than-or-equal"
    : $$<>      ( string1 string2 -- t | f )  \ "s-s-not-equal"
    : $CFIND    ( char string -- index | -1 )  \ "s-c-find"
    : $CFIND<   ( char string -- index | -1 )  \ "s-c-find-back"
    : $$FIND    ( string1 string2 index length -- index | -1 )  \ "s-s-find"
    : $CMEM     ( char string -- t | f )  \ "s-c-mem"
    : $$VER     ( string1 string2 -- index | -1 )  \ "s-s-ver"
    : $>UPPER   ( string -- )  \ "s-to-upper"
    : $>LOWER   ( string -- )  \ "s-to-lower"
    : $CONVERT  ( string index -- )  \ "s-convert"
    : $N        ( -- number | -1 )  \ "s-to-n"
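
For readers without an assembler at hand, the stack comments given for the memory operators in section 11.4, together with the null-count behavior described in section 11.6, are enough to write slow but portable high-level stand-ins. What follows is only a sketch under stated assumptions: it is not taken from FStrings.Scr, it assumes the count n is never negative, and it assumes COMPARE reports -1 when the first differing byte at a1 is the smaller one. The name SAMPLE is purely illustrative. Later Forth standards supply a COMPARE with a different stack effect, so on such a system these definitions would shadow the built-in word.

```forth
\ Hypothetical high-level versions of the memory operators, written
\ from the stack comments in section 11.4. They are NOT the code in
\ FStrings.Scr, and they assume a non-negative count n.

: CFIND ( c a1 n -- a2 | 0 )        \ search low-to-high for byte c
   BEGIN  DUP WHILE                 \ c a n   loop while bytes remain
      >R                            \ c a     count parked on return stack
      OVER OVER C@ = IF             \ does the byte at a match c?
         SWAP DROP  R> DROP  EXIT   \ yes: leave a
      THEN
      1+  R> 1-                     \ no: next address, one byte fewer
   REPEAT
   DROP DROP DROP  0 ;              \ not found, or n was 0

: CFIND< ( c a1 n -- a2 | 0 )       \ search high-to-low for byte c
   BEGIN  DUP WHILE                 \ c a n
      1- >R                         \ c a     R: offset of byte to test
      OVER OVER R@ + C@ = IF        \ does the byte at a+offset match c?
         SWAP DROP  R> +  EXIT      \ yes: leave a+offset
      THEN
      R>                            \ no: c a n-1
   REPEAT
   DROP DROP DROP  0 ;

: COMPARE ( a1 a2 n -- -1 | 0 | 1 ) \ assumed: -1 means a1 sorts first
   BEGIN  DUP WHILE                 \ a1 a2 n
      >R                            \ a1 a2
      OVER C@  OVER C@  -           \ a1 a2 diff
      ?DUP IF                       \ bytes differ
         R> DROP  ROT ROT DROP DROP \ diff
         0< IF -1 ELSE 1 THEN  EXIT
      THEN
      1+ SWAP 1+ SWAP  R> 1-        \ a1+1 a2+1 n-1
   REPEAT
   DROP DROP DROP  0 ;              \ all n bytes equal, or n was 0

: SAME? ( a1 a2 n -- t | f )  COMPARE 0= ;   \ as suggested in 11.4

\ A four-byte test area holding "ABCB"
CREATE SAMPLE  4 ALLOT
65 SAMPLE C!  66 SAMPLE 1+ C!  67 SAMPLE 2 + C!  66 SAMPLE 3 + C!

66 SAMPLE 4 CFIND   SAMPLE - .      \ offset of the first "B"
66 SAMPLE 4 CFIND<  SAMPLE - .      \ offset of the last "B"
66 SAMPLE 0 CFIND   .               \ null count
SAMPLE SAMPLE 1+ 3 COMPARE .        \ "ABC" against "BCB"
```

With these definitions the four test lines should print 1, 3, 0, and -1 in that order. Note that none of the words needs a special test for the null case: a zero count falls straight through the loop to the final 0, which is the behavior the string primitives rely on.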