Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!ames!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: register windows
Message-ID: <18843@amdcad.AMD.COM>
Date: Fri, 23-Oct-87 21:48:15 EST
Article-I.D.: amdcad.18843
Posted: Fri Oct 23 21:48:15 1987
Date-Received: Sun, 25-Oct-87 19:01:32 EST
References: <201@PT.CS.CMU.EDU> <933@cpocd2.UUCP> <821@mips.UUCP>
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 47
Keywords: register windows, interrupt latency

In article <821@mips.UUCP>, hansen@mips.UUCP (Craig Hansen) writes:
+-----
| There's no question that register windows can, in some cases, reduce load
| and store frequencies, but to really pay their cost, they have to reduce
| load and store frequencies sufficiently to offset the higher cost of load
| and store operations on these machines. I've not been on the design team of
| a register windowed machine, but it seems that the H/W designers might have
| assumed that the register windows were going to eliminate so many memory
| references that they didn't spend the time they should have on making loads
| and stores go fast.
+-----

Yes, the register windows eliminate *many* memory references (although
fast loads/stores are still a high priority). For example, MIPS published
a paper at the recent ASPLOS II convention showing dynamic load/store
percentages, as well as the percent of load/store instructions that
required a non-zero offset calculation:

                          nroff     asl
    load/store %           28%      30%
    non-zero offset %      88%      80%

Contrast this to the statistics gathered on the Am29000 simulator
(register-windowed machine):

                          nroff     asm29k
    load/store %           16%      16%
    non-zero offset %      9.0%     9.2%

Now, since MIPS didn't publish the input they used on these programs to
derive their numbers, we obviously cannot perform a direct comparison.
We assume, however, that the input wasn't "specialized" in any way.
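The tables give only aggregate numbers, but the mechanism behind a gap like this can be sketched with a toy model. The Python below is a minimal illustration under stated assumptions, not a model of the Am29000 or of the MIPS machine: the function names, the `saved_regs` count, the number of windows, and the window size are all hypothetical. A flat register file pays one stack store per saved register at every call and one load at every return; a windowed file touches memory only when the call depth overflows or underflows the windows held in registers.

```python
# Toy model with HYPOTHETICAL parameters (not measured data): memory
# operations caused purely by procedure-call register save/restore.

def flat_traffic(events, saved_regs):
    """Flat register file: each call stores saved_regs registers to the
    stack (sp+offset) and each return loads them back."""
    calls = events.count("call")
    returns = events.count("return")
    return returns * saved_regs, calls * saved_regs  # (loads, stores)


def windowed_traffic(events, num_windows, window_size):
    """Register windows: memory is touched only on window overflow
    (spill the oldest window) or underflow (refill the caller's window)."""
    loads = stores = 0
    resident = 1  # windows held in registers; starts with the outermost caller
    for ev in events:
        if ev == "call":
            if resident == num_windows:
                stores += window_size  # overflow: spill the oldest window
            else:
                resident += 1
        elif ev == "return":
            if resident == 1:
                loads += window_size   # underflow: caller's window was spilled
            else:
                resident -= 1
        else:
            raise ValueError(f"unknown event: {ev!r}")
    return loads, stores  # (loads, stores)


if __name__ == "__main__":
    shallow = ["call", "call", "return", "return"] * 100  # depth never exceeds 2
    deep = ["call"] * 10 + ["return"] * 10                # one 10-deep chain
    for name, events in (("shallow", shallow), ("deep", deep)):
        print(name,
              "flat (loads, stores):", flat_traffic(events, saved_regs=4),
              "windowed (loads, stores):",
              windowed_traffic(events, num_windows=4, window_size=16))
```

With the shallow pattern the windowed model generates no save/restore traffic at all, while the flat model pays on every call and return. With the single deep chain the windowed model does worse, since each overflow spills a whole window; which case dominates depends on the programs actually run.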
One possible explanation for the difference is that non-register-windowed
machines perform many more loads and stores to the stack during procedure
entry and exit. Those saves and restores raise the overall load/store
percentage, and because they address stack slots relative to the stack
pointer, they also raise the use of the base+offset addressing mode.

The main point to realize here is that a series of trade-offs is made in
any processor design, and what might be a big win for one processor may
have minimal impact on another. These individual trade-offs cannot be
taken as "absolutes" without examining their relationship with the rest
of the architecture.

--
	Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)