Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!willett!ForthNet
From: ForthNet@willett.UUCP (ForthNet articles from GEnie)
Newsgroups: comp.lang.forth
Subject: Forth Implementation
Message-ID: <503.UUL1.3#5129@willett.UUCP>
Date: 18 Feb 90 23:42:11 GMT
Organization: Latest link in the ForthNet chain.  (Pgh, PA)
Lines: 97

Category 3,  Topic 24
Message 51        Sat Feb 17, 1990
R.BERKEY [Robert]            at 18:57 PST
 
 
   To: David Albert
   Re:  : (colon) Data Structures
        Threading Tradeoffs

 >    ...I have seen that several implementations of Forth use a small
 > "inner interpreter loop" using DS:SI for example as the
 > instruction pointer.  I chose just to use CALL and RET as the
 > entry and exit to my words.  Therefore, CS:IP is my instruction
 > pointer and word pointer.  Here's the question:  Why do people use
 > the separate inner interpreter loop?  It seems that the call and
 > return are much more flexible and that I can more easily manipulate
 > return addresses since they are just on the stack.  I use BP for my
 > parameter stack pointer.


This gets into the whole issue of varieties of implementations of colon.  To
review, the basic varieties of Forth threading techniques have been called, in
increasing order of abstractness:

 native code compilation
 jsr threaded
 direct threaded
 indirect threaded
 token threaded

Native code compilation is just the usual mix of code that an assembler and an
ordinary compiler produce.  It also goes by other names, such as direct
machine compilation.  A Forth native code compilation may have lots of calls
intermixed with short runs of low level code.  Depending on viewpoint, this
may or may not be considered a threading technique.

What you've implemented sounds like it might be related to the class of JSR
(jump subroutine) threading, where the body of a colon definition contains a
sequence of calls.  JSR threading is related to native code compilation in
that the processor executes both in the same way.  The structural difference
is that a JSR threaded system can comply with the Forth-83 Standard, while a
native code compilation does not.  A Forth-83 implementation could also have
a native code compiler, but it would be there in addition to the : (colon)
compiler.
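
As a rough illustration of the "sequence of calls" idea, here is a minimal
sketch in Python rather than 8086 code.  It assumes the host language's own
call and return can stand in for CALL and RET; the names colon, lit, dup,
add, double and main are hypothetical and come from no particular Forth.

```python
# Sketch of JSR (subroutine) threading: the body of a colon definition is
# nothing but a sequence of calls, and the host's call/return does the work.
# All names are illustrative assumptions, not taken from any real system.

def dup(stack):
    """Primitive: duplicate the top of the parameter stack."""
    stack.append(stack[-1])

def add(stack):
    """Primitive: replace the top two items with their sum."""
    stack.append(stack.pop() + stack.pop())

def lit(n):
    """Build a word that pushes the constant n."""
    def push(stack):
        stack.append(n)
    return push

def colon(*body):
    """Build a colon definition whose body is a sequence of calls."""
    def word(stack):
        for callee in body:   # each entry plays the role of one CALL
            callee(stack)
        # returning from this function plays the role of the final RET
    return word

double = colon(dup, add)           # : DOUBLE  DUP + ;
main = colon(lit(21), double)      # : MAIN  21 DOUBLE ;

stack = []
main(stack)
```

There is no separate inner interpreter here: the position within each body
is kept by the host's own call stack, much as CS:IP and the SP stack do the
job in the system described in the question.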

The names "direct threaded" (DTC) and "indirect threaded" (ITC) were
criticized on technical grounds in an early Forth Dimensions but the names
have stuck.

Direct threading gets a code field added to the body of the colon definition. 
The code field is directly executable, although often one register must be set
before executing the code field.  One key answer to your query is that
compiling a compilation token on an 80188 JSR threaded system takes three
bytes, whereas compiling a compilation token with DTC, ITC, TTC, etc., takes
two bytes, a potential for substantial reduction in code size.
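
The size argument can be put in numbers.  The sketch below uses only the
three-byte and two-byte figures given above; the function names are
hypothetical, and it deliberately ignores headers, code fields and inline
literals, so treat the result as an upper bound on the saving for the body
alone.

```python
# Bytes needed to compile one reference to another word (figures from the
# post).  Everything else here is an illustrative assumption.
JSR_BYTES_PER_TOKEN = 3        # near CALL: 1 opcode byte + 2-byte displacement
THREADED_BYTES_PER_TOKEN = 2   # DTC/ITC/TTC: one 16-bit cell per reference

def body_size(n_tokens, bytes_per_token):
    """Size in bytes of a colon body holding n_tokens compiled references."""
    return n_tokens * bytes_per_token

def saving_percent(n_tokens):
    """Percentage of body bytes saved by threading instead of JSR."""
    jsr = body_size(n_tokens, JSR_BYTES_PER_TOKEN)
    threaded = body_size(n_tokens, THREADED_BYTES_PER_TOKEN)
    return 100.0 * (jsr - threaded) / jsr
```

For a body of ten references this gives 30 bytes against 20, that is, one
third of the body saved, whatever the body length.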

Indirect threading means that the code field, instead of being executable,
contains the address of executable code.  The Forth-79 Standard restricted
implementations of : to indirect threading.
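
To make the extra level of indirection concrete, here is a toy
indirect-threaded inner interpreter, sketched in Python.  It is only a model:
list indices stand in for addresses and Python functions stand in for
machine-code routines.  The names mem, reg, docol, define, execute and run
are hypothetical, as is the HALT word used to stop the loop.

```python
# Toy model of indirect threading.  Each word starts with a code field that
# holds the "address" of executable code; a thread is a list of code field
# addresses.  All names are illustrative assumptions.
mem = []        # simulated memory, one cell per list slot
data = []       # parameter stack
rstack = []     # return stack
reg = {"ip": 0, "running": False}

def docol(w):
    """Code for colon definitions: save IP, then thread through the body."""
    rstack.append(reg["ip"])
    reg["ip"] = w + 1            # the body starts right after the code field

def do_exit(w):
    reg["ip"] = rstack.pop()     # resume the calling thread

def do_lit(w):
    data.append(mem[reg["ip"]])  # the literal sits inline in the thread
    reg["ip"] += 1

def do_dup(w):
    data.append(data[-1])

def do_add(w):
    data.append(data.pop() + data.pop())

def do_halt(w):
    reg["running"] = False

def define(code, *body):
    """Lay down a code field plus body; return the code field address."""
    cfa = len(mem)
    mem.append(code)
    mem.extend(body)
    return cfa

EXIT = define(do_exit)
LIT = define(do_lit)
DUP = define(do_dup)
ADD = define(do_add)
HALT = define(do_halt)
DOUBLE = define(docol, DUP, ADD, EXIT)       # : DOUBLE  DUP + ;
MAIN = define(docol, LIT, 21, DOUBLE, HALT)  # 21 DOUBLE, then stop

def execute(w):
    mem[w](w)   # the indirection: fetch the code address, then jump to it

def run(cfa):
    """Execute the word at cfa and return a copy of the parameter stack."""
    data.clear()
    rstack.clear()
    reg["ip"] = 0
    reg["running"] = True
    execute(cfa)
    while reg["running"]:
        w = mem[reg["ip"]]       # NEXT: fetch the next code field address
        reg["ip"] += 1
        execute(w)
    return list(data)
```

In a direct-threaded variant the code field would itself be executable, so
the lookup inside execute would disappear; that difference is not modelled
here.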

Token threading (TTC) has several variants.  It may add one more level of
indirection through a table of pointers, to a table of pointers to code.  With
token threading, addresses can be completely isolated from the main body of
code, making relocatability easy.
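
A minimal sketch of the token idea, again in Python and with hypothetical
names (run, TABLE, THREAD, op_lit and so on).  It shows only a single table
level, not the table-of-tables variant mentioned above: the thread holds
small token numbers, and the table is the only place that knows where the
code lives.

```python
# Toy token-threaded interpreter.  The thread contains no addresses at all,
# only token numbers that index a table.  All names are illustrative.

def op_lit(stack, thread, ip):
    stack.append(thread[ip])     # the literal follows its token in the thread
    return ip + 1

def op_dup(stack, thread, ip):
    stack.append(stack[-1])
    return ip

def op_add(stack, thread, ip):
    stack.append(stack.pop() + stack.pop())
    return ip

LIT, DUP, ADD = 0, 1, 2              # token numbers
TABLE = [op_lit, op_dup, op_add]     # token -> code, the only "addresses"
THREAD = bytes([LIT, 21, DUP, ADD])  # one byte per token: 21 DUP +

def run(thread, table):
    """Interpret a token thread against a table; return the parameter stack."""
    stack = []
    ip = 0
    while ip < len(thread):
        token = thread[ip]           # fetch a token, not an address
        ip += 1
        ip = table[token](stack, thread, ip)
    return stack
```

Since THREAD is just bytes, moving the code around means rebuilding TABLE
and nothing else, which is the relocatability point made above.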

Specific machine architectures lead to more variations on the above, including
segment threading (SgTC) on the 8086, and a 68000 "token" threading in which
the table is accessed by the architecture and the thread is directly
executable.

It might seem at first glance that these systems would get slower the more
abstract they get.  But then consider that in a JSR system NEXT is RET CALL . 
That's two bytes of opcode, which reads from memory four bytes of addresses,
and writes two bytes, for a total of eight bytes of memory access.  Meanwhile
an 80188 direct threaded system with an inline NEXT of   LODSW   AX JMP   has
three bytes of opcode, which reads two bytes of address, for a total of five
bytes of memory access. Processors including the PDP-11 and HP2100(?) have
single-opcode instructions that can perform an indirect-threaded NEXT .  It's
easy to see that the speed tradeoffs can get interesting.
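
The byte counts above can be tallied as follows.  This is only the
arithmetic from the paragraph written out; the function and constant names
are hypothetical, and bytes of memory traffic are not the same thing as
clock cycles.

```python
# Memory traffic per NEXT, using the figures given in the post.

def bytes_per_next(opcode_bytes, bytes_read, bytes_written):
    """Total bytes of memory access for one NEXT."""
    return opcode_bytes + bytes_read + bytes_written

# JSR threading: NEXT is RET followed by CALL.
#   opcodes: 1 (RET) + 1 (CALL)
#   reads:   2 (return address popped by RET) + 2 (CALL displacement)
#   writes:  2 (return address pushed by CALL)
JSR_NEXT = bytes_per_next(opcode_bytes=2, bytes_read=4, bytes_written=2)

# Direct threading with inline NEXT:  LODSW   AX JMP
#   opcodes: 1 (LODSW) + 2 (JMP AX)
#   reads:   2 (the cell fetched by LODSW)
DTC_NEXT = bytes_per_next(opcode_bytes=3, bytes_read=2, bytes_written=0)
```

That works out to eight bytes for the JSR case against five for the inline
direct-threaded NEXT.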

As you suggest, there are many other tradeoffs.

I've sometimes wondered about the efficiency tradeoffs of making the return
stack the default 80x8x SP stack.  Related to your comment about ease of
manipulating return addresses, one technique that's used is  SP, BP XCHG  to
get at the return stack.  One thing I find interesting about JSR is that it
clarifies that a Forth IP register (DS:SI or whatever) is really a part of
the return stack.

Now as for how all this compares with what one discovers when reading up on
TILs, I wouldn't know, but would be interested.

Robert

-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'