Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!ames!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: register windows
Message-ID: <18843@amdcad.AMD.COM>
Date: Fri, 23-Oct-87 21:48:15 EST
Article-I.D.: amdcad.18843
Posted: Fri Oct 23 21:48:15 1987
Date-Received: Sun, 25-Oct-87 19:01:32 EST
References: <201@PT.CS.CMU.EDU> <933@cpocd2.UUCP> <821@mips.UUCP>
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 47
Keywords: register windows, interrupt latency

In article <821@mips.UUCP>, hansen@mips.UUCP (Craig Hansen) writes:
+-----
| There's no question that register windows can, in some cases, reduce load
| and store frequencies, but to really pay their cost, they have to reduce
| load and store frequencies sufficiently to offset the higher cost of load
| and store operations on these machines. I've not been on the design team of
| a register windowed machine, but it seems that the H/W designers might have
| assumed that the register windows were going to eliminate so many memory
| references that they didn't spend the time they should have on making loads
| and stores go fast.
+-----

Yes, register windows eliminate *many* memory references (although
fast loads/stores are still a high priority).  For example, MIPS
published a paper at the recent ASPLOS II conference showing dynamic
load/store percentages, as well as the percentage of load/store
instructions that required a non-zero offset calculation:

                          nroff         asl
     load/store %          28%          30%
     non-zero offset %     88%          80%

Contrast this to the statistics gathered on the Am29000 simulator
(register-windowed machine):

                          nroff         asm29k
     load/store %          16%          16%
     non-zero offset %     9.0%         9.2%

Now, since MIPS didn't publish the input they used on these programs to
derive their numbers, we obviously cannot perform a direct comparison.
We assume, however, that the input wasn't "specialized" in any way.
One possible explanation for the difference is that non-register-windowed
machines perform many more loads/stores to the stack at procedure
entry/exit, resulting in higher load/store percentages as well as higher
utilization of the base+offset addressing mode.
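
As a rough sketch of that mechanism, here is a toy model in Python.  The
function name and every workload number are invented for illustration;
they are not taken from the MIPS paper or from the Am29000 simulator.
The model also ignores window overflow/underflow spills, which a real
register-windowed machine would still pay for occasionally.

```python
def load_store_fraction(body_insns, body_mem_ops, calls, saved_regs_per_call):
    """Dynamic load/store fraction for a toy workload.

    body_insns          -- instructions executed, excluding save/restore code
    body_mem_ops        -- how many of those are loads or stores
    calls               -- number of procedure calls executed
    saved_regs_per_call -- registers stored on entry and reloaded on exit
                           (0 models an idealized register-window machine)
    """
    # Each call adds one store on entry and one load on exit per saved
    # register; all of these are stack-relative (non-zero offset).
    save_restore = 2 * saved_regs_per_call * calls
    return (body_mem_ops + save_restore) / (body_insns + save_restore)


# Hypothetical workload: 10,000 body instructions, 2,000 of them
# loads/stores, 200 calls, 5 registers saved per call.
windowed = load_store_fraction(10_000, 2_000, 200, 0)
flat = load_store_fraction(10_000, 2_000, 200, 5)
print(f"windowed: {windowed:.1%}   non-windowed: {flat:.1%}")
```

With these made-up inputs the save/restore traffic alone raises the
load/store fraction from 20% to about 33%, and every added reference
uses base+offset addressing.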

The main point to realize here is that a series of trade-offs is made
in any processor design, and what might be a big win for one processor
may have minimal impact on another.  These individual trade-offs cannot
be taken as "absolutes" without examining their relationship with the
rest of the architecture.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)