Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!mit-eddie!ll-xn!ames!ucbcad!ucbvax!hplabs!pyramid!prls!mips!hansen
From: hansen@mips.UUCP (Craig Hansen)
Newsgroups: comp.arch
Subject: Re: register windows
Message-ID: <821@mips.UUCP>
Date: Thu, 22-Oct-87 15:34:09 EST
Article-I.D.: mips.821
Posted: Thu Oct 22 15:34:09 1987
Date-Received: Sun, 25-Oct-87 10:09:39 EST
References: <201@PT.CS.CMU.EDU> <933@cpocd2.UUCP>
Lines: 77
Keywords: register windows, interrupt latency
Summary: register windows << fast loads & stores

In article <933@cpocd2.UUCP>, howard@cpocd2.UUCP (Howard A. Landman) writes:
> In article <201@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
> >There are those who argue that interrupts and task switches are enormously
> >rare compared to subroutine calls and returns. They are right. They then
> >argue that the rare events can be penalized, if this makes the common events
> >run faster. (The RISC argument, if you will.)
>
> The argument is simply that this approach leads to the best average
> performance. Which it does.

Sigh. As always, RISC means many different things to different people. "The RISC argument" leads to varying conclusions, depending on what you're focusing on, and what your design team can and can't do easily. I've worked on three RISC architectures in different environments, and they each ended up quite unique. The MIPS team has the strongest software and measurement skills of the three designs, and it let us delegate a lot more to software, and quickly see where design trade-offs were taking us.

Unfortunately, comparing interrupt and task switch rates to subroutine call and return rates is only part of the story. The register window approach trades hardware resources against optimizing software. The MIPS-R2000, which doesn't have hardware register windows, relies on an intelligent use of a flat, uniform, register set, implemented by optimizing compilers.
As has been described in this forum previously, procedure arguments and return values are passed in registers. In addition, the compiler system identifies leaf-level routines, and allocates variables into registers, starting at these leaf-level routines and working its way up the call tree. When registers must be spilled to memory, the optimizer selects appropriate locations for the register saving and restoring code (moving up the call tree and outside loops, etc.). The end result is a healthy reduction in the amount of register saving and restoring at procedure boundaries.

The register window approach opts for simpler software (a good idea if you don't have software), but spends more die area, and ultimately, cycle speed, on register file accesses and register file spilling hardware. Ultimately, the selection of an architecture is the result of several inter-related design trade-offs.

While it is not an absolute requirement of register windows, let me observe that every one of these machines built to date takes multiple cycles to perform canonical load and store operations. There's no question that register windows can, in some cases, reduce load and store frequencies, but to really pay for their cost, they have to reduce load and store frequencies sufficiently to offset the higher cost of load and store operations on these machines. I've not been on the design team of a register windowed machine, but it seems that the H/W designers might have assumed that the register windows were going to eliminate so many memory references that they didn't spend the time they should have on making loads and stores go fast.
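
To put a rough number on that break-even point, here is a back-of-the-envelope sketch. It is not a measurement of any real machine: it assumes a baseline whose loads and stores each cost one cycle, a windowed machine whose loads and stores cost a fixed larger number of cycles, and it ignores everything else (cycle time, call overhead, cache effects). The function names and the cycle counts are made up for illustration.

```python
def memory_cycles(refs, cost_per_ref):
    """Total cycles spent on loads and stores (illustrative model only)."""
    return refs * cost_per_ref


def break_even_reduction(base_cost, windowed_cost):
    """Fraction of loads/stores the windowed machine must eliminate so that
    its memory-reference cycles equal those of the baseline machine.

    Derived from: refs * (1 - r) * windowed_cost == refs * base_cost
    """
    return 1.0 - base_cost / windowed_cost


if __name__ == "__main__":
    # Assumed costs: 1 cycle on the baseline, 2 or 3 cycles on the windowed machine.
    for windowed_cost in (2, 3):
        r = break_even_reduction(1, windowed_cost)
        print(f"{windowed_cost}-cycle loads/stores: "
              f"must eliminate {r:.0%} of memory references to break even")
```

Under these assumptions, a machine with two-cycle loads and stores has to remove half of all memory references just to draw level, and one with three-cycle loads and stores has to remove about two thirds.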

If you are concerned about UNIX kernel performance, it would be worthwhile to note that the MIPS-R2000 instruction set and compiler system are designed to permit efficient references to data contained in structures (a reference to a structure element through a pointer is a single-cycle operation, but is two to three cycles at minimum for the SPARC and Am29k register window-based machines). Now what's the relative frequency of structure references in a UNIX kernel? (Hint: Dhrystone will give you the wrong answer. I'm not trying to be glib here; Dhrystone is a fine benchmark for comparing some things [uncached machines without optimizing compilers, x86 compilers for Personal Computers], but the characteristics of data references, particularly the use of "addressing modes" and data dependencies, aren't representative of anything.)

If you've got some dusty FORTRAN card decks that you've been using on some itty-bitty-mainframe computer, you should note that procedure calls aren't all that frequent, compared to Dhrystone. Having an optimizer that can put a couple of dozen variables (not necessarily local variables) into registers efficiently, and make fast references to global variables (FORTRAN is notorious for its use of variables stuffed into common blocks) is a big win here. The MIPS compiler system puts aside one register to point to a region of the global data space, so that one single-cycle instruction can get to a global variable's value. Again, here, the speed of basic load and store operations will matter more than whether the registers are windowed.

-- 
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...decwrl!mips!hansen