Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbvax!ENG.SUN.COM!Mitch.Bradley
From: Mitch.Bradley@ENG.SUN.COM
Newsgroups: comp.lang.forth
Subject: Threading
Message-ID: <9104221358.AA16566@ucbvax.Berkeley.EDU>
Date: 19 Apr 91 19:09:19 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: Mitch.Bradley%ENG.SUN.COM@SCFVM.GSFC.NASA.GOV
Organization: The Internet
Lines: 29

> Forth code is usually compiled as threaded code, but you can quite
> easily convert it to subroutine-threaded code and even pure machine code.

On most processors, subroutine-threaded code without in-line machine
code expansion is SLOWER than direct-threaded code.  This is because
typical programs thread from code word to code word 8 times more frequently
than they nest and unnest colon definitions.  The "jsr/rts" pair usually
has to push a return address on a stack and pop it again, whereas typical
direct-threaded in-line compiled "NEXT" routines keep "IP" in a register.
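
To make the counting argument concrete, here is a minimal sketch in C.  It
is not taken from any real Forth system: the opcode names, run(), the two
counters and demo_threaded() are all hypothetical.  Portable C cannot express
true direct threading, so a switch-based dispatch loop stands in for "NEXT";
the point is only that IP lives in a local variable and the return stack is
touched solely when a colon definition nests or unnests.  The toy program is
far too small to reproduce the 8-to-1 ratio.

```c
#include <assert.h>

/* Hypothetical opcodes for a toy threaded-code machine. */
enum { OP_LIT, OP_ADD, OP_DUP, OP_MUL, OP_CALL, OP_EXIT, OP_HALT };
enum { STACK_CELLS = 32 };

static long next_count;  /* every fetch-and-dispatch step ("NEXT") */
static long nest_count;  /* every nest or unnest of a colon definition */

/* Run the thread starting at code[entry]; return the top of the data stack. */
static int run(const int *code, int entry)
{
    int data[STACK_CELLS];
    int dp = 0;                    /* data stack depth */
    const int *ret[STACK_CELLS];
    int rp = 0;                    /* return stack depth */
    const int *ip = code + entry;  /* IP: a local, so likely kept in a register */

    for (;;) {
        int op = *ip++;            /* NEXT: fetch the next word, advance IP */
        next_count++;
        switch (op) {
        case OP_LIT:               /* push the in-line operand */
            assert(dp < STACK_CELLS);
            data[dp++] = *ip++;
            break;
        case OP_DUP:
            assert(dp > 0 && dp < STACK_CELLS);
            data[dp] = data[dp - 1];
            dp++;
            break;
        case OP_ADD:
            assert(dp > 1);
            dp--;
            data[dp - 1] += data[dp];
            break;
        case OP_MUL:
            assert(dp > 1);
            dp--;
            data[dp - 1] *= data[dp];
            break;
        case OP_CALL:              /* nest: only here is a return address saved */
            assert(rp < STACK_CELLS);
            ret[rp++] = ip + 1;
            ip = code + *ip;
            nest_count++;
            break;
        case OP_EXIT:              /* unnest: restore the saved IP */
            assert(rp > 0);
            ip = ret[--rp];
            nest_count++;
            break;
        case OP_HALT:
            assert(dp > 0);
            return data[dp - 1];
        default:
            assert(0);
            return 0;
        }
    }
}

/* Toy program: SQUARE is DUP MUL; the main thread computes 3*3 + 4*4. */
int demo_threaded(void)
{
    static const int code[] = {
        OP_DUP, OP_MUL, OP_EXIT,   /* offset 0: SQUARE */
        OP_LIT, 3, OP_CALL, 0,     /* offset 3: main thread */
        OP_LIT, 4, OP_CALL, 0,
        OP_ADD, OP_HALT
    };
    next_count = nest_count = 0;
    return run(code, 3);
}
```

Calling demo_threaded() and then comparing next_count with nest_count shows
how many dispatch steps ran without touching the return stack.  In a
subroutine-threaded system without in-line expansion, each of those steps
would instead cost a call and a return.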

However, subroutine threading opens the door to in-line machine code
expansion.  The tradeoffs in a nutshell:

        * If you don't plan to use in-line expansion of code words,
          don't use subroutine threading.

        * If you really must have the ultimate speed, then use subroutine
          threading with in-line code expansion and peephole optimization.
          (Be honest about this; most applications bottleneck on I/O, and
          most compute-bound applications spend nearly all their time in
          a very few inner loops.  It is often cost-effective to use
          threaded code for most of the application and hand-code a few
          critical words.)

        * Threaded code is easier to debug.  It is possible to decompile
          in-line expanded code, but not easy, especially if peephole
          optimization has been performed.
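
The second and third points can be sketched with the same kind of toy, again
in C and again hypothetical throughout (the opcodes, expand_inline(),
peephole(), optimize() and demo_optimized() are illustrative names only).
It works on word tokens rather than real machine code, assumes that called
words are leaf words containing no further calls, and knows just two
peephole patterns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy threaded-code machine. */
enum { OP_LIT, OP_ADD, OP_DUP, OP_MUL, OP_CALL, OP_EXIT, OP_HALT };

/* Cells occupied by one instruction: LIT and CALL carry an operand. */
static size_t width(int op)
{
    return (op == OP_LIT || op == OP_CALL) ? 2 : 1;
}

/* In-line expansion: copy the thread at code[entry] up to and including
   OP_HALT into out, replacing each CALL with the callee's body (without its
   EXIT).  Assumes callees are leaf words.  Returns the number of cells. */
static size_t expand_inline(const int *code, size_t entry, int *out)
{
    size_t n = 0;
    for (size_t i = entry; ; i += width(code[i])) {
        if (code[i] == OP_CALL) {
            size_t j = (size_t)code[i + 1];
            for (; code[j] != OP_EXIT; j += width(code[j])) {
                out[n++] = code[j];
                if (width(code[j]) == 2)
                    out[n++] = code[j + 1];
            }
        } else {
            out[n++] = code[i];
            if (width(code[i]) == 2)
                out[n++] = code[i + 1];
            if (code[i] == OP_HALT)
                return n;
        }
    }
}

/* One in-place peephole pass.  Returns the new length in cells. */
static size_t peephole(int *c, size_t len)
{
    size_t n = 0, i = 0;
    while (i < len) {
        if (c[i] == OP_LIT && i + 3 < len &&
            c[i + 2] == OP_DUP && c[i + 3] == OP_MUL) {
            int x = c[i + 1];              /* LIT x DUP MUL -> LIT x*x */
            c[n++] = OP_LIT;
            c[n++] = x * x;
            i += 4;
        } else if (c[i] == OP_LIT && i + 4 < len &&
                   c[i + 2] == OP_LIT && c[i + 4] == OP_ADD) {
            int s = c[i + 1] + c[i + 3];   /* LIT a LIT b ADD -> LIT a+b */
            c[n++] = OP_LIT;
            c[n++] = s;
            i += 5;
        } else {
            size_t w = width(c[i]);        /* no match: copy unchanged */
            while (w--)
                c[n++] = c[i++];
        }
    }
    return n;
}

/* Repeat passes until nothing shrinks; every rewrite shortens the code. */
static size_t optimize(int *c, size_t len)
{
    for (;;) {
        size_t n = peephole(c, len);
        if (n == len)
            return n;
        len = n;
    }
}

/* Expand and optimize "3 SQUARE 4 SQUARE +"; returns the final length. */
size_t demo_optimized(int *out)
{
    static const int code[] = {
        OP_DUP, OP_MUL, OP_EXIT,   /* offset 0: SQUARE */
        OP_LIT, 3, OP_CALL, 0,     /* offset 3: main thread */
        OP_LIT, 4, OP_CALL, 0,
        OP_ADD, OP_HALT
    };
    return optimize(out, expand_inline(code, 3, out));
}
```

After both passes nothing in the output refers to SQUARE any more.  A
decompiler looking at the result has no word boundary left to recover, which
is the debugging cost described in the last point.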

Mitch Bradley, wmb@Eng.Sun.COM