Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!ucbcad!ucbvax!decvax!decwrl!pyramid!oliveb!intelca!mipos3!cpocd2!howard From: howard@cpocd2.UUCP Newsgroups: comp.arch Subject: Re: subroutine frequency Message-ID: <443@cpocd2.UUCP> Date: Mon, 23-Feb-87 16:21:46 EST Article-I.D.: cpocd2.443 Posted: Mon Feb 23 16:21:46 1987 Date-Received: Thu, 26-Feb-87 23:28:38 EST References: <1881@homxc.UUCP> <898@moscom.UUCP> <476@mntgfx.MENTOR.COM> <430@cpocd2.UUCP> <1093@uwmacc.UUCP> Reply-To: howard@cpocd2.UUCP (Howard A. Landman) Organization: Intel Corp. ASIC Services Organization, Chandler AZ Lines: 71 In article <1093@uwmacc.UUCP> ad@uwmacc.UUCP (Alan Lloyd Dickman) writes: >In article <430@cpocd2.UUCP>, howard@cpocd2.UUCP (Howard A. Landman) writes: >>Procedure calls are grotesquely expensive on the most common machines. >>There are several ways to try to avoid this >>overhead. One is hardware support for call/return, as in some RISC machines. >>Another way is to eliminate procedures from your language; this is what Forth >>did with its "threaded code", and is the main reason Forth is so fast. On a >>machine with multiple register windows the speed advantage of Forth would >>evaporate; of course you could also implement a hardware stack, which might >>increase its advantage! > >Please explain what you mean by multiple register windows and how they >impact Forth's speed. Thanks. Some RISC machines have multiple sets of registers, organized as overlapping windows. This approach to speeding up calls/returns was suggested to the RISC I team at Berkeley by Forest Baskett; simulations convinced us it would work for typical UNIX code, so we implemented it. It worked. The average time for a procedure call was only a few instruction cycles (occasionally, you overflow/underflow the set of windows you have and must actually save/restore, but not every call & return). Also, since the windows overlap (some registers appear in two adjacent windows), we could pass parameters and return results in registers with essentially no overhead. The reason it's fast isn't hard to see. If you can get a clean set of registers by just incrementing a window-number register, then you don't need to save all your old registers away in main memory. If that data can be reactivated by just decrementing a window-number register, then you don't need to restore it. This saves all the overhead of save/restore, as long as everything fits into your registers. The costs include chip area and slowing down of the register access time as more registers are added. SO: 1) Procedure calls/returns are expensive (slow) on most machines. 2) Forth doesn't do procedure calls/returns, so on those machines it has a speed advantage (compared to C, Pascal, Fortran, Lisp, ...). 3) On a machine where calls/returns are fast, the advantage disappears. Avoiding something which is inexpensive buys you nothing. 4) I believe that most of Forth's memory traffic is due to stack manipulations. If this is true, hardware stack support could speed Forth up quite a bit; but it would have to be better than what Burroughs did in their early "stack machines" where only the top two elements were in registers. REFERENCES: Fitzpatrick et al, "A RISCy Approach to VLSI", VLSI Design, 4th quarter 1981 - OR - Fitzpatrick et al, "VLSI Implementations of a Reduced Instruction Set Computer", CMU Conference on VLSI Systems and Computations, 1981 These are essentially the same article; the VLSI Design version has nicer graphics. They both describe the RISC I. Tamir & Sequin, "Strategies for Managing the Register File in RISC", IEEE Trans. Comput., v.C-32, November 1983 Halbert & Kessler, "Windows of Overlapping Registers", CS292R Final Report, U.C. Berkeley, June 1980 This will be hard to get, but it's the most detailed reference. Sherburne et al, "A 32-Bit NMOS Microprocessor with a Large Register File", IEEE JSSC, v.SC-19 #5, October 1984 Describes the RISC II. Sherburne, "Processor Design Tradeoffs in VLSI", PhD dissertation, Comp. Sci. Div., U.C. Berkeley, April 1984 Describes the RISC II. -- Howard A. Landman ...!intelca!mipos3!cpocd2!howard