Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!ucbcad!ucbvax!decvax!decwrl!pyramid!oliveb!intelca!mipos3!cpocd2!howard
From: howard@cpocd2.UUCP
Newsgroups: comp.arch
Subject: Re: subroutine frequency
Message-ID: <443@cpocd2.UUCP>
Date: Mon, 23-Feb-87 16:21:46 EST
Article-I.D.: cpocd2.443
Posted: Mon Feb 23 16:21:46 1987
Date-Received: Thu, 26-Feb-87 23:28:38 EST
References: <1881@homxc.UUCP> <898@moscom.UUCP> <476@mntgfx.MENTOR.COM> <430@cpocd2.UUCP> <1093@uwmacc.UUCP>
Reply-To: howard@cpocd2.UUCP (Howard A. Landman)
Organization: Intel Corp. ASIC Services Organization, Chandler AZ
Lines: 71

In article <1093@uwmacc.UUCP> ad@uwmacc.UUCP (Alan Lloyd Dickman) writes:
>In article <430@cpocd2.UUCP>, howard@cpocd2.UUCP (Howard A. Landman) writes:
>>Procedure calls are grotesquely expensive on the most common machines.
>>There are several ways to try to avoid this
>>overhead.  One is hardware support for call/return, as in some RISC machines.
>>Another way is to eliminate procedures from your language; this is what Forth
>>did with its "threaded code", and is the main reason Forth is so fast.  On a
>>machine with multiple register windows the speed advantage of Forth would
>>evaporate; of course you could also implement a hardware stack, which might
>>increase its advantage!
>
>Please explain what you mean by multiple register windows and how they
>impact Forth's speed.  Thanks.

Some RISC machines have multiple sets of registers, organized as overlapping
windows.  This approach to speeding up calls/returns was suggested to the
RISC I team at Berkeley by Forest Baskett; simulations convinced us it would
work for typical UNIX code, so we implemented it.  It worked.  The average
time for a procedure call was only a few instruction cycles (occasionally, you
overflow/underflow the set of windows you have and must actually save/restore,
but not every call & return).  Also, since the windows overlap (some registers
appear in two adjacent windows), we could pass parameters and return results
in registers with essentially no overhead.

The reason it's fast isn't hard to see.  If you can get a clean set of
registers by just incrementing a window-number register, then you don't
need to save all your old registers away in main memory.  If that data
can be reactivated by just decrementing a window-number register, then
you don't need to restore it.  This saves all the overhead of save/restore,
as long as everything fits into your registers.  The costs include chip
area and slowing down of the register access time as more registers are
added.

SO:	1) Procedure calls/returns are expensive (slow) on most machines.
	2) Forth doesn't do procedure calls/returns, so on those machines
	   it has a speed advantage (compared to C, Pascal, Fortran, Lisp, ...).
	3) On a machine where calls/returns are fast, the advantage disappears.
	   Avoiding something which is inexpensive buys you nothing.
	4) I believe that most of Forth's memory traffic is due to stack
	   manipulations.  If this is true, hardware stack support could
	   speed Forth up quite a bit; but it would have to be better than
	   what Burroughs did in their early "stack machines" where only
	   the top two elements were in registers.

REFERENCES:

Fitzpatrick et al, "A RISCy Approach to VLSI", VLSI Design, 4th quarter 1981
				- OR -
Fitzpatrick et al, "VLSI Implementations of a Reduced Instruction Set Computer",
CMU Conference on VLSI Systems and Computations, 1981
	These are essentially the same article; the VLSI Design version has
	nicer graphics.  They both describe the RISC I.

Tamir & Sequin, "Strategies for Managing the Register File in RISC",
IEEE Trans. Comput., v.C-32, November 1983

Halbert & Kessler, "Windows of Overlapping Registers", CS292R Final Report,
U.C. Berkeley, June 1980
	This will be hard to get, but it's the most detailed reference.

Sherburne et al, "A 32-Bit NMOS Microprocessor with a Large Register File",
IEEE JSSC, v.SC-19 #5, October 1984
	Describes the RISC II.

Sherburne, "Processor Design Tradeoffs in VLSI", PhD dissertation, Comp.
Sci. Div., U.C. Berkeley, April 1984
	Describes the RISC II.
-- 

	Howard A. Landman
	...!intelca!mipos3!cpocd2!howard