Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!boulder!sunybcs!bingvaxu!leah!itsgw!batcomputer!pyramid!prls!mips!hansen
From: hansen@mips.UUCP (Craig Hansen)
Newsgroups: comp.arch
Subject: Re: register windows
Message-ID: <833@mips.UUCP>
Date: Sat, 24-Oct-87 17:53:09 EST
Article-I.D.: mips.833
Posted: Sat Oct 24 17:53:09 1987
Date-Received: Mon, 26-Oct-87 05:38:46 EST
References: <201@PT.CS.CMU.EDU> <933@cpocd2.UUCP> <821@mips.UUCP> <18843@amdcad.AMD.COM>
Lines: 66
Keywords: register windows, interrupt latency
Summary: a possible explanation, but a wrong one

In article <18843@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
> [a comparison of load/store frequencies and non-zero offset frequencies]

> Now, since MIPS didn't publish the input they used on these programs to
> derive their numbers, we obviously cannot perform a direct comparison.

Even if the input were the same, the instruction counts for the two machines
aren't necessarily the same, so a comparison of load/store frequencies isn't
really relevant.  In addition, as1 and am29k aren't even the same program!
(There are several variants of nroff around, too.)

> We assume, however, that the input wasn't "specialized" in any way.
> One possible explanation for this is that non-register-windowed machines
> have many more loads/stores to the stack during procedure call
> entry/exit, resulting in higher load/store percentages as well as higher
> utilization of the base+offset addressing mode.

Yes, a possible explanation, but wrong. When compiling nroff (a version
released as part of UMIPS-BSD), using our inter-procedural register
allocator, and an input file of 700 lines, 4670 words, and 30530 characters,
we get the following dynamic statistics:

26.4% loads+stores
52.1 instructions per call
Average registers saved per call: 1.6
Register save+restore: 23.3% of loads+stores (7.0% of instructions)
    (This percentage includes variables which must reside in memory
     to handle call by address parameters.)

If you eliminated all these register save/restores, load+store frequency
would be 20.9% (remember that the total instruction count changes too).
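
The 20.9% figure can be reproduced from the numbers above. The sketch below
is only a reader's check, not part of the original measurements: it removes
the 7.0% of instructions from both the load/store count and the total, which
is why the result is not simply 26.4 - 7.0. It also shows one reading under
which the 23.3% figure lines up, assuming (hypothetically) one store and one
load per saved register, so that 23.3% covers pure register save/restore and
7.0% additionally covers the memory-resident variables. The function names
are made up for illustration.

```python
# Figures quoted in the post (dynamic counts for nroff).
LOADS_STORES_PCT = 26.4     # loads+stores, as % of all instructions
SAVE_RESTORE_PCT = 7.0      # save/restore traffic, as % of all instructions
INSTRS_PER_CALL = 52.1
REGS_SAVED_PER_CALL = 1.6


def frequency_without(category_pct, removed_pct):
    """Share of a category after deleting removed_pct of all instructions.

    The deleted instructions leave both the category and the total, so the
    denominator shrinks along with the numerator.
    """
    return 100.0 * (category_pct - removed_pct) / (100.0 - removed_pct)


def pure_save_restore_pct(regs_per_call, instrs_per_call):
    """Save/restore instructions as % of all instructions.

    Assumption (not stated in the post): each saved register costs exactly
    one store on entry and one load on exit.
    """
    return 100.0 * (2 * regs_per_call) / instrs_per_call


adjusted = frequency_without(LOADS_STORES_PCT, SAVE_RESTORE_PCT)
pure = pure_save_restore_pct(REGS_SAVED_PER_CALL, INSTRS_PER_CALL)
pure_share_of_ls = 100.0 * pure / LOADS_STORES_PCT

print(f"loads+stores without save/restore: {adjusted:.1f}%")
print(f"pure save/restore: {pure:.1f}% of instructions, "
      f"{pure_share_of_ls:.1f}% of loads+stores")
```

Under these assumptions the first line comes out at 20.9%, and the second
at roughly 6.1% of instructions, or 23.3% of loads+stores.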

Loads/stores to variables through global pointer register: 15.2% (Because of
use of the global pointer register, these references are single instructions,
all of which have non-zero offset values.) I would suspect that the Am29000
statistics do not count these as non-zero offset values, though they are
more than half of all the references.

Other load/stores: 4.2% (Zero-valued offsets are frequent within this
sub-class, but are only about 3% of total instruction references.)
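
As a quick consistency check, the three load/store categories quoted in this
post (7.0% save/restore, 15.2% global-pointer relative, 4.2% other) can be
summed and compared with the 26.4% total. The sketch below assumes all three
percentages are fractions of total instructions, which the post does not say
outright but which the sum suggests; the dictionary keys are labels chosen
here for illustration.

```python
# Load/store categories from the post, assumed to be % of all instructions.
breakdown = {
    "register save/restore": 7.0,
    "global-pointer relative": 15.2,
    "other": 4.2,
}

total = sum(breakdown.values())

# Each category's share of all loads+stores, rather than of all instructions.
share_of_loads_stores = {
    name: 100.0 * pct / total for name, pct in breakdown.items()
}

print(f"total loads+stores: {total:.1f}% of instructions")
for name, share in share_of_loads_stores.items():
    print(f"  {name}: {share:.1f}% of loads+stores")
```

The total matches the 26.4% quoted earlier, and the global-pointer share
works out to about 57.6% of loads+stores, consistent with "more than half
of all the references".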

> The main point to realize here is that a series of trade-offs are made
> in any processor design, and what might be a big win for one processor
> may have minimal impact on another.  These individual tradeoffs cannot
> be taken as "absolutes" without examining their relationship with the
> rest of the architecture.

Amen. Tim should also be careful, though, to realize that the tools and
compilers used to measure the architecture influence the results as well.
Unless you set out to design both the architecture and the compiler
concurrently, you won't be able to make good trade-offs between them. It
should also be noted, again, that the selection of programs used as
benchmarks will influence the results too. Making your architectural
trade-offs on the basis of Dhrystone, or even nroff, which is not very
representative of more modern C code, isn't very smart.

I dare say, though, that trading the MIPS architecture for the MIPS
architecture + windowed-registers - displacements - single-cycle
load/stores, would be a loss.

Regards,
-- 
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...decwrl!mips!hansen