Path: utzoo!utgpu!attcan!uunet!husc6!rutgers!ucsd!ucbvax!SCFVM.BITNET!ZMLEB
From: ZMLEB@SCFVM.BITNET (Lee Brotzman)
Newsgroups: comp.lang.forth
Subject: Re: Forth "Pre-Compiler"
Message-ID: <8808032106.AA01441@jade.berkeley.edu>
Date: 3 Aug 88 20:06:49 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 95

In article <718@amc.UUCP> pilchuck!amc!jon@uunet.uu.net (Jon Mandrell) writes:
>In article <5703@batcomputer.tn.cornell.edu> olson@tcgould.tn.cornell.edu
> (olson) writes:
>>How would you compute if the process of calling procedures had zero
>>overhead?
>
>   I have seen claims like this for Forth and the Novix chip for quite
>a while.
>1) What is the definition of zero overhead?  The instruction must still
>   be fetched, and a return address must be stored somewhere.

The overhead of a call on the NOVIX chip is one (1) clock cycle; a
return from subroutine entails zero (0) clock cycles.  Since the code
memory, data stack and return stack are all separately addressable
memory spaces running in parallel, the chip isn't wasting CPU cycles
storing things away on the return stack before it can perform the call;
it just does the whole thing in one fell swoop.  Compare the cycle times
with even the simple 6502 (6 cycles call; 6 cycles return) and you begin
to see the difference.

I attended a lecture by Charles Moore, co-inventor of the NOVIX, where
he stated that a 68000 subroutine call required 20 clock cycles, as did
the return (I take this number with a grain of salt, unless someone else
is willing to confirm it).

The NOVIX parallel memory architecture allows it to perform some
combinations of Forth operations in a single op-code.  I believe the
longest any op-code takes to complete is 3 cycles.  The large majority
of op-codes execute in one cycle.  Add this all up and you can see why a
NOVIX running at 8 MHz can easily outperform a 68000 even if the latter
is running at a faster clock speed.

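To put rough numbers on the call/return comparison above, here is a
minimal sketch in Python.  The cycle counts are the ones quoted above
(the 68000 figures remain the unconfirmed claim from the lecture); the
12.5 MHz clock chosen for the 68000 and the function name
call_overhead_ns are assumptions made only for illustration.

```python
# Call-plus-return overhead for one subroutine invocation, in nanoseconds.
# Cycle counts come from the post; the 68000 counts are an unconfirmed
# claim, and the 12.5 MHz 68000 clock is an assumed value.

def call_overhead_ns(call_cycles, return_cycles, clock_mhz):
    """Time spent on one call/return pair at the given clock rate."""
    cycle_ns = 1000.0 / clock_mhz          # one clock period in ns
    return (call_cycles + return_cycles) * cycle_ns

novix = call_overhead_ns(1, 0, 8.0)        # 1-cycle call, 0-cycle return
m68k = call_overhead_ns(20, 20, 12.5)      # claimed 20 + 20 cycles

print("NOVIX @ 8 MHz:    %.0f ns per call/return" % novix)
print("68000 @ 12.5 MHz: %.0f ns per call/return" % m68k)
print("ratio:            %.1fx" % (m68k / novix))
```

With these inputs the call/return pair costs 125 ns on the NOVIX and
3200 ns on the 68000.  That covers subroutine linkage only; it says
nothing about the rest of the instruction mix, so it should not be read
as a whole-program benchmark.
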
>2) Many people have said that C function calls are slower than forth.
>   No matter what processor you are on, a function call runs the same
>   speed, no matter what language compiled the code.  Direct calls cannot
>   be less efficient than threaded code, at worst the things will come
>   out even.

I agree.  Perhaps the people to whom you refer were thinking about the
time necessary to set up the correct stack frame for a C call.  It is
easy for Forth programmers to forget that even though the parameters are
already on the data stack when the word is called, they had to be put
there somehow.  The stack is a blessing and a curse.  A blessing because
it makes the internal structure of the language so simple; a curse
because stack manipulations eat away a significant proportion of the
processing time a Forth word takes to execute.

(One of my favorite stack stories is of a programmer who redesigned a
word and had to change the stack manipulation.  After he debugged the
word and got it going he took one last look at it.  He discovered that
in one part of the code he had the sequence:  "DUP SWAP DROP"
Now isn't that a nice little no-op feature?)

See my earlier message, which mentions the use of "local variables" for
reducing the coding headaches and processing time involved in stack
thrashing.

>3) The claim that Forth generates more efficient code than a compiler.
>   I have problems with this, since inefficient code can be written in
>   any language.  I have heard said that the programmer is pass 1 of
>   a Forth compiler, which does allow for more optimization at a very
>   low level.  But, I would say the same thing can be done for C (and
>   a few other languages), if you know the code the compiler will generate.
>   Judicious use of language constructs will produce more efficient code.

There are two kinds of efficiency: space and time.  Traditional threaded
Forth certainly wins in the space category.
But I see no easy way that threaded code can be faster on any given
processor (save the Forth chips cited above) than regular compiled code.
That is, unless the compiler in question was written by complete bozos.

There are compilers for Forth, though.  Tom Almy wrote one for PC Forth
which he mentioned here just recently.  The code from these compilers
rivals that of other languages.

I also caught someone (sorry, I don't remember who right now) pointing
out the limited addressable memory of the NOVIX chip, and how this
limits the usefulness of the chip.  One has to remember that this is
Forth we are talking about: just because a Forth program compiles into
3K and an equivalent Pascal program compiles into 30K doesn't mean the
Forth program is any less functional.

I believe the NOVIX chip supports 64 K words for code (128 Kbytes).  One
word holds either a Forth op-code (primitive) or subroutine call
(invocation of a colon word).  Since the op-codes are wired directly
into the silicon, no memory space is wasted on a "kernel".  Given the
compactness of Forth, 64 K words of memory will hold a LOT of Forth
code.

I regularly pack my Apple // with a kernel, development code (memory
management, disk management, advanced data structure defining words,
etc.), software floating point, a decompiler, source-level debugger,
complete operating system interface, and source file editor.  This
totals about 35 K bytes of memory.  Much of the above is implemented in
6502 assembler for speed.  6502 assembler takes up much more room than
the equivalent NOVIX code would.

Forth is probably the main reason I haven't felt the need to upgrade
from an Apple // to a larger machine.  I haven't outgrown the one I'm on
yet!

-- Lee Brotzman (FIGI-L Moderator)
-- My opinions are my own and not those of my employer, ST Systems Corp.,
-- or their employer, NASA Goddard Space Flight Center, or their employer,
-- Ronald Reagan, or his employer, Nancy Reagan, or her employer, the
-- planets and stars.