Path: utzoo!utgpu!attcan!uunet!husc6!rutgers!ucsd!ucbvax!SCFVM.BITNET!ZMLEB
From: ZMLEB@SCFVM.BITNET (Lee Brotzman)
Newsgroups: comp.lang.forth
Subject: Re: Forth "Pre-Compiler"
Message-ID: <8808032106.AA01441@jade.berkeley.edu>
Date: 3 Aug 88 20:06:49 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 95

In article <718@amc.UUCP> pilchuck!amc!jon@uunet.uu.net  (Jon Mandrell) writes:

>In article <5703@batcomputer.tn.cornell.edu> olson@tcgould.tn.cornell.edu
>        (olson) writes:
>>How would you compute if the process of calling proceedures had zero
>>overhead?
>
>  I have seen claims like this for Forth and the Novix chip for quite
>a while.
>1)  What is the definition of zero overhead?  The instruction must still
>    be fetched, and a return address must be stored somewhere.

  The overhead of a call on the NOVIX chip is one (1) clock cycle; a return
from subroutine entails zero (0) clock cycles.  Since the code memory, data
stack and return stack are all separately addressable memory spaces running
in parallel, the chip isn't wasting CPU cycles storing things away on the
return stack before it can perform the call; it just does the whole thing
in one fell swoop.

   Compare the cycle times with even the simple 6502 (6 cycles for a JSR; 6
cycles for an RTS) and you begin to see the difference.  I attended a lecture by
Charles Moore, co-inventor of the NOVIX, where he stated that a 68000
subroutine call required 20 clock cycles as did the return (I take this number
with a grain of salt, unless someone else is willing to confirm it).

   The NOVIX parallel memory architecture allows it to perform some
combinations of Forth operations in a single op-code.  I believe the longest
any op-code takes to complete is 3 cycles.  The large majority of op-codes
execute in one cycle.  Add this all up and you can see why a NOVIX running at
8MHz can easily outperform a 68000 even if the latter is running at a faster
clock speed.
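
   To put rough numbers on that, here is a small Python sketch of the
call-plus-return overhead using the cycle counts quoted above.  The 16 MHz
clock for the 68000 is purely an assumption for illustration, the 20-cycle
figures are the unconfirmed ones from the lecture, and the function name is
made up for this example.

```python
def call_return_overhead_us(call_cycles, return_cycles, clock_mhz):
    """Time in microseconds spent on one call plus its matching return."""
    return (call_cycles + return_cycles) / clock_mhz

# NOVIX figures from the post: 1 cycle per call, 0 per return, 8 MHz clock.
novix_us = call_return_overhead_us(1, 0, 8.0)

# 68000 figures as reported (unconfirmed): 20 cycles each way.
# The 16 MHz clock is an assumed value, chosen to be twice the NOVIX clock.
m68k_us = call_return_overhead_us(20, 20, 16.0)

print(f"NOVIX: {novix_us} us per call/return pair")
print(f"68000: {m68k_us} us per call/return pair")
print(f"ratio: {m68k_us / novix_us}")
```

   This only counts the call and return themselves; it says nothing about
the work done inside the word, so treat it as a bound on overhead and not as
a benchmark.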

>2)  Many people have said that C function calls are slower than forth.
>    No matter what processor you are on, a function call runs the same
>    speed, no matter what language compiled the code.  Direct calls cannot
>    be less efficient than threaded code, at worst the things will come
>    out even.

   I agree.  Perhaps the people to whom you refer were thinking about the
time necessary to set up the correct stack frame for a C call.  It is easy
for Forth programmers to forget that even though the parameters are already
on the data stack when the word is called, they had to be put there somehow.
The stack is a blessing and a curse.  A blessing because it makes the
internal structure of the language so simple; a curse because stack
manipulations eat away a significant proportion of the processing time
a Forth word takes to execute.  (One of my favorite stack stories is of
a programmer who redesigned a word and had to change the stack manipulation.
After he debugged the word and got it going, he took one last look at it.
He discovered that in one part of the code he had the sequence
"DUP SWAP DROP".
Now isn't that a nice little no-op feature?)  See my earlier message, which
mentions the use of "local variables" for reducing the coding headaches and
processing time involved in stack thrashing.
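
   For anyone who wants to convince themselves that the sequence really
does nothing, here is a minimal Python model of the three stack words.  It
is only a sketch: the WORDS table and the run() helper are invented for this
example and are not part of any Forth system.

```python
def dup(stack):
    # ( a -- a a )
    stack.append(stack[-1])

def swap(stack):
    # ( a b -- b a )
    stack[-1], stack[-2] = stack[-2], stack[-1]

def drop(stack):
    # ( a -- )
    stack.pop()

WORDS = {"DUP": dup, "SWAP": swap, "DROP": drop}

def run(source, stack):
    """Execute whitespace-separated words on a copy of the given stack."""
    stack = list(stack)
    for token in source.split():
        WORDS[token](stack)
    return stack

before = [1, 2, 3]
after = run("DUP SWAP DROP", before)
print(before, "->", after)
```

   DUP copies the top item, SWAP exchanges two identical items, and DROP
throws the copy away, so three words are executed to leave the stack exactly
as it was.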

>3)  The claim that Forth generates more efficient code than a compiler.
>    I have problems with this, since inefficient code can be written in
>    any language.  I have heard said that the programmer is pass 1 of
>    a Forth compiler, which does allow for more optimization at a very
>    low level.  But, I would say the same thing can be done for C (and
>    a few other languages), if you know the code the compiler will generate.
>    Judicious use of language constructs will produce more efficient code.

   There are two kinds of efficiency: space and time.  Traditional threaded
Forth certainly wins in the space category.  But I see no easy way that
threaded code can be faster on any given processor (save the Forth chips
cited above) than regular compiled code.  That is, unless the compiler
in question was written by complete bozos.  There are native-code compilers
for Forth, though.  Tom Almy wrote one for PC Forth which he mentioned here
just recently.  The code from these compilers rivals that produced by
compilers for other languages.
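
   The space/time tradeoff can be sketched in Python.  This is a loose model
only: real threaded Forth stores lists of addresses and steps through them
with an inner interpreter, whereas here Python lists and recursion stand in
for that.  All names (run_threaded, SQUARE, SUM_OF_SQUARES) are hypothetical.

```python
def dup(s):
    s.append(s[-1])

def swap(s):
    s[-1], s[-2] = s[-2], s[-1]

def mul(s):
    b = s.pop()
    a = s.pop()
    s.append(a * b)

def add(s):
    b = s.pop()
    a = s.pop()
    s.append(a + b)

# "Colon definitions" are just lists of references to other words, so each
# use of SQUARE costs one list slot instead of a copy of its body.
SQUARE = [dup, mul]
SUM_OF_SQUARES = [SQUARE, swap, SQUARE, add]

def run_threaded(thread, stack):
    """Walk a thread, dispatching on every entry (the per-word overhead)."""
    for word in thread:
        if callable(word):
            word(stack)                 # primitive
        else:
            run_threaded(word, stack)   # nested colon definition

def sum_of_squares_direct(a, b):
    """What a native-code compiler could emit: no dispatch, no stack traffic."""
    return a * a + b * b

stack = [3, 4]
run_threaded(SUM_OF_SQUARES, stack)
print(stack, sum_of_squares_direct(3, 4))
```

   Both versions compute the same result.  The threaded one is compact
because definitions are shared by reference, but it pays a dispatch step and
stack shuffling for every word, which is the time cost described above.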

   I also caught someone (sorry, I don't remember who right now) pointing out
the limited addressable memory of the NOVIX chip, and how this limits the
usefulness of the chip.  One has to remember that this is Forth we are
talking about.  Just because a Forth program compiles into 3K and an
equivalent Pascal program compiles into 30K doesn't mean the Forth program
is any less functional.

   I believe the NOVIX chip supports 64 K words for code (128 Kbytes).  One
word holds either a Forth op-code (primitive) or subroutine call (invocation
of a colon word).  Since the op-codes are wired directly into the silicon, no
memory space is wasted on a "kernel".  Given the compactness of Forth,
64 K words of memory will hold a LOT of Forth code.  I regularly pack
my Apple // with a kernel, development code (memory management, disk
management, advanced data structure defining words, etc.), software floating
point, a decompiler, source-level debugger, complete operating system
interface, and source file editor.  This totals about 35 K bytes of memory.
Much of the above is implemented in 6502 assembler for speed.  6502 assembler
takes up much more room than the equivalent NOVIX code would.  Forth is
probably the main reason I haven't felt the need to upgrade from an Apple //
to a larger machine.  I haven't outgrown the one I'm on yet!

-- Lee Brotzman  (FIGI-L Moderator)
-- My opinions are my own and not those of my employer, ST Systems Corp.,
-- or their employer, NASA Goddard Space Flight Center, or their employer,
-- Ronald Reagan, or his employer, Nancy Reagan, or her employer, the
-- planets and stars.