Path: utzoo!attcan!uunet!ginosko!gem.mps.ohio-state.edu!tut.cis.ohio-state.edu!ucbvax!SUN.COM!wmb
From: wmb@SUN.COM (Mitch Bradley)
Newsgroups: comp.lang.forth
Subject: Re: Forth Implementation Question
Message-ID: <8909221311.AA06054@jade.berkeley.edu>
Date: 22 Sep 89 07:43:42 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: Forth Interest Group International List
Organization: The Internet
Lines: 82

One issue that has not been explicitly raised in all this discussion about garbage collection is the problem of determining which locations have to be relocated and which don't.

The general problem arises in a surprising number of contexts, and none of the Forth standards have dealt with it. Even position-independent encoding does not solve the problem; you still have to know exactly which data structures to store in the position-independent format.

The problem occurs in any one of the following scenarios:

  a) An entire Forth dictionary image is saved and then re-used at a different address (as in a multi-process OS on a machine with no MMU)

  b) Multiple precompiled "overlays" or "modules" are saved, and then re-loaded, more than one at a time.

  c) A "garbage-collecting" or "hole-eliminating" FORGET scheme is used.

  d) A "turnkey compiler", which is capable of eliminating unused code from the application, is used.

  e) Supply your own examples.

Here are some Forth operator usages which are particularly troublesome:

  1) The use of "," to compile an address into a colon definition or word list.

  2) The use of "!" to store an address into a variable.

  3) The use of "," or "!" to store a link, as in a linked list of dictionary words, or a voc-link chain.

  4) The use of "!" to remember the address of a word in an execution variable.

  5) The use of "!" to remember the address of a variable or other data structure.

The basic problem is that Forth "sweeps under the rug" the distinction between numbers and addresses.
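
To make the problem concrete, here is a small sketch in Python rather than Forth. It is only a toy model, not any real system's image format: the image is a list of cells, cell i lives at absolute address base + i, and the names `build_image`, `fetch`, `OLD_BASE`, and `NEW_BASE` are made up for illustration. One cell holds a plain number and another holds an address stored with the equivalent of "!"; once the image is reused at a different base, nothing in the image says which of the two needs adjusting.

```python
# Hypothetical model: an image is a list of cells; cell i lives at base + i.
OLD_BASE = 1000
NEW_BASE = 5000

def build_image(base):
    """Build a 4-cell image as it would look when compiled at `base`."""
    image = [0, 0, 0, 0]
    image[0] = 1002          # a plain number that merely looks like an address
    image[1] = base + 3      # absolute address of cell 3, stored with "!"
    image[2] = 7             # unrelated data
    image[3] = 42            # the data structure that cell 1 points to
    return image

def fetch(image, base, addr):
    """Fetch the cell at absolute address `addr` from an image loaded at `base`."""
    index = addr - base
    if not 0 <= index < len(image):
        raise IndexError(f"address {addr} is outside the image at base {base}")
    return image[index]

saved = build_image(OLD_BASE)        # image saved while running at OLD_BASE

# Reused at the same base, the stored pointer still works.
assert fetch(saved, OLD_BASE, saved[1]) == 42

# Reused at NEW_BASE without relocation, the stored pointer is stale.
try:
    fetch(saved, NEW_BASE, saved[1])
    stale = False
except IndexError:
    stale = True

# A blind fix-up ("add the delta to anything that looks like an address")
# repairs cell 1 but also changes the plain number in cell 0.
delta = NEW_BASE - OLD_BASE
blind = [c + delta if OLD_BASE <= c < OLD_BASE + len(saved) else c
         for c in saved]
```

In this toy, the blind fix-up makes the pointer in cell 1 valid again, but it also rewrites the number in cell 0, which was never an address. Guessing from the value alone is not enough; the system has to record, at store time, which cells hold addresses.
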
Not only do addresses have relocation implications, they also have different arithmetic properties (they are unsigned), granularity of memory access, and alignment restrictions. In the case of the PC, addresses are so bizarre that only IBM could have created a standard around a particular chip which will remain unnamed.

As a start toward addressing the relocation issue, I use a set of 3 "address-class" data types, which I call "tokens", "links", and "addresses".

  token:    The execution address of a Forth word. Returned by "'", can be used by "EXECUTE".

  link:     Represents the successor of a node in a linked list.

  address:  Represents the address of an arbitrary data structure.

Each such type requires a minimum of 3 operators: xxx@ , xxx! , and /xxx (fetch, store, and sizeof).

I use the convention that, when one of these address items appears on the data stack, it is an absolute address. Only when the item is transferred to and from memory does encoding (such as conversion to an offset or a token number, or the setting of a bit in a relocation bitmap) take place. The "absolute address when it's on the stack" convention is not strictly necessary, but it is convenient.

In many implementations, tokens, links, and addresses are encoded in the same way. However, in some implementation schemes, the distinction is necessary. For instance, in a true token-threaded scheme, the token table may represent only executable words, while the addresses of other data structures may be encoded differently. In a memory-limited system, it may be worthwhile to encode links using small variable-length relative offsets.

Observation about links: If links are relocated, what do you use as the list terminator value? 0 is not a good choice, because it can relocate to a non-zero value. I like to use the beginning address of the Forth dictionary as the sentinel value; I call that address "origin". It relocates properly. All tests for "end of list" must then resolve to "origin =" rather than "0=".

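
The "absolute on the stack, encoded in memory" convention and the "origin" sentinel can be sketched as follows. This is a Python toy, not the actual implementation described above: the class `Image`, its methods, and `LINK_SIZE` are hypothetical names, and the encoding chosen here (an offset from origin) is just one of the options mentioned. `link_store` and `link_fetch` stand in for "link!" and "link@", and `LINK_SIZE` for "/link".

```python
LINK_SIZE = 1   # stands in for "/link": cells occupied by one encoded link

class Image:
    """Hypothetical dictionary image: cell i lives at address origin + i."""

    def __init__(self, origin, size):
        self.origin = origin
        self.cells = [0] * size

    def relocated(self, new_origin):
        """Return a raw copy of the cells as if loaded at another address."""
        copy = Image(new_origin, len(self.cells))
        copy.cells = list(self.cells)       # no fix-ups applied
        return copy

    # Plain "!" and "@" for numbers: no encoding.
    def store(self, value, addr):
        self.cells[addr - self.origin] = value

    def fetch(self, addr):
        return self.cells[addr - self.origin]

    # "link!" and "link@": absolute on the stack, origin-relative in memory.
    def link_store(self, link, addr):
        self.store(link - self.origin, addr)

    def link_fetch(self, addr):
        return self.fetch(addr) + self.origin

    def walk(self, head):
        """Follow (link, payload) nodes until the link equals origin."""
        payloads = []
        node = self.link_fetch(head)
        while node != self.origin:          # "origin =" rather than "0="
            payloads.append(self.fetch(node + LINK_SIZE))
            node = self.link_fetch(node)
        return payloads

# Build a two-node list in an image loaded at address 1000.
img = Image(1000, 8)
img.link_store(img.origin, 1004)   # last node: its link is the sentinel
img.store(20, 1005)
img.link_store(1004, 1002)         # first node links to the last node
img.store(10, 1003)
img.link_store(1002, 1001)         # list head lives at origin + 1

# The same raw cells, reused at a different address.
moved = img.relocated(9000)
```

Walking the list from `origin + 1` gives the same payloads in both `img` and `moved`, because every link is decoded relative to the image's current origin and the end-of-list test compares against origin. With this particular offset encoding the sentinel happens to sit in memory as 0, but code that follows the convention never sees that; on the stack it only ever compares against origin.
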
When I converted my system to be relocatable, it was quite a chore. Now that it has been done, I frequently run across beneficial side effects, such as the ability to run on machines with oddball memory maps, the ability to change kernel implementation techniques (to tune for different environments) without affecting user code, and increased ease of porting to different CPU architectures.

Mitch