Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbvax!ENG.SUN.COM!Mitch.Bradley
From: Mitch.Bradley@ENG.SUN.COM
Newsgroups: comp.lang.forth
Subject: Threading
Message-ID: <9104221358.AA16566@ucbvax.Berkeley.EDU>
Date: 19 Apr 91 19:09:19 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: Mitch.Bradley%ENG.SUN.COM@SCFVM.GSFC.NASA.GOV
Organization: The Internet
Lines: 29

> Forth code is usually compiled as a threaded but you can quite
> easily convert it to subroutine threaded and even pure machine code.

On most processors, subroutine threaded code without in-line machine
code expansion is SLOWER than direct threaded code. This is because
typical programs thread from code word to code word 8 times more
frequently than they nest and unnest colon definitions. The "jsr/rts"
pair usually has to push a return address on a stack, whereas typical
direct-threaded in-line compiled "NEXT" routines keep "IP" in a
register.

However, subroutine threading opens the door to in-line machine code
expansion.

The tradeoffs in a nutshell:

* If you don't plan to use in-line expansion of code words, don't use
  subroutine threading.

* If you really must have the ultimate speed, then use subroutine
  threading with in-line code expansion and peephole optimization.
  (Be honest about this; most applications bottleneck on I/O, and
  most compute-bound applications spend nearly all their time in a
  very few inner loops. It is often cost-effective to use threaded
  code for most of the application and hand-code a few critical
  words.)

* Threaded code is easier to debug. It is possible to decompile
  in-line expanded code, but not easy, especially if peephole
  optimization has been performed.

Mitch Bradley, wmb@Eng.Sun.COM
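
The two dispatch styles contrasted in this post can be sketched with a toy model. This is only an illustration of the control structure, not a real Forth engine and not a benchmark: in Python every code word is still reached through a function call, so it says nothing about speed. All names (EXIT, lit, run_threaded, subroutine, run_subroutine) are hypothetical and chosen for this sketch. In the threaded version an inner interpreter keeps IP in a local variable and touches the return stack only when it nests or unnests a colon definition; in the subroutine version every word, code word or colon definition, costs a call/return pair.

```python
# Toy model of threaded vs. subroutine-threaded dispatch.
# Every name below is illustrative, not taken from any real Forth system.

EXIT = object()  # marker that ends a colon definition


def lit(n):
    """Return a code word that pushes the constant n."""
    def push(stack):
        stack.append(n)
    return push


def dup(stack):
    stack.append(stack[-1])


def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def run_threaded(thread):
    """Inner interpreter: IP lives in a local variable (standing in for a
    register); the return stack is used only to nest and unnest."""
    stack, rstack = [], []
    ip = 0
    while True:
        word = thread[ip]              # NEXT: fetch the cell at IP ...
        ip += 1                        # ... and advance IP
        if word is EXIT:               # unnest, or stop at the top level
            if not rstack:
                return stack
            thread, ip = rstack.pop()
        elif isinstance(word, list):   # colon definition: nest
            rstack.append((thread, ip))
            thread, ip = word, 0
        else:                          # code word: run it, no nesting
            word(stack)


def subroutine(*words):
    """Model of subroutine threading: the definition body is a sequence of
    calls, so each word costs a call/return pair."""
    def colon(stack):
        for w in words:
            w(stack)
    return colon


def run_subroutine(colon):
    stack = []
    colon(stack)
    return stack


# : SQUARE  DUP * ;      : MAIN  3 SQUARE 4 SQUARE + ;
SQUARE_T = [dup, mul, EXIT]
MAIN_T = [lit(3), SQUARE_T, lit(4), SQUARE_T, add, EXIT]

SQUARE_S = subroutine(dup, mul)
MAIN_S = subroutine(lit(3), SQUARE_S, lit(4), SQUARE_S, add)

print(run_threaded(MAIN_T))     # [25]
print(run_subroutine(MAIN_S))   # [25]
```

In this model, in-line expansion would correspond to replacing SQUARE_S inside MAIN_S with its body (dup, mul), which removes the nested call at the cost of a longer definition.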