Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!rutgers!nysernic!itsgw!batcomputer!pyramid!prls!mips!earl
From: earl@mips.UUCP (Earl Killian)
Newsgroups: comp.arch
Subject: Re: register windows
Message-ID: <837@gumby.UUCP>
Date: Mon, 26-Oct-87 02:37:41 EST
Article-I.D.: gumby.837
Posted: Mon Oct 26 02:37:41 1987
Date-Received: Wed, 28-Oct-87 01:31:54 EST
References: <201@PT.CS.CMU.EDU> <933@cpocd2.UUCP> <821@mips.UUCP> <18855@amdcad.AMD.COM>
Lines: 54
Keywords: register windows, interrupt latency
Summary: more data

In article <18855@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
> In article <833@mips.UUCP> hansen@mips.UUCP (Craig Hansen) writes:
> | It should also be noted, again, that the selection of programs used as
> | benchmarks will influence the results too. Making your architectural
> | trade-offs on the basis of Dhrystone, or even nroff, which is not very
> | representative of more modern C code, isn't very smart.
> Making architectural decisions based on *any* single program isn't very
> smart.  You should examine a large body of code, looking at older,
> heavily-used programs as well as more "modern" code (output of C++
> compilers, object-oriented programming).

It sounds like everyone's in agreement, and yet so far the discussion
has been about a single program!  What's interesting about programs is
that some statistics are fairly consistent and some vary all over the
place.  To show the variance of the statistics relevant to this
discussion, consider a wide range of programs (statistics for the
MIPSco architecture):
                                                     non-sp/gp     non-sp/gp
                          sp-based    reg  gp-based   0-offset  non-0 offset
           loads  stores     ld/st  ld/st     ld/st      ld/st         ld/st
           -----  ------     -----  -----     -----      -----         -----
espresso   19.6%    1.1%      0.1%   0.1%      1.3%      18.7%          0.4%
spice      26.9%   16.3%      7.2%   2.8%      4.2%       1.5%         27.5%
wolf       25.3%    8.2%      7.1%   1.9%      3.6%       8.0%         12.9%
yacc       15.7%    2.1%      0.9%   0.5%      2.5%      12.4%          1.5%
diff       16.2%    3.2%      0.5%   0.7%      4.6%       7.2%          6.4%
compress   18.3%   10.6%      0.1%   3.5%      8.1%       7.8%          9.4%
uopt       21.8%    8.4%      5.6%   5.2%      1.2%       6.8%         11.4%
as1        18.3%   11.2%      4.4%   6.8%      3.7%       3.8%         10.8%
nroff      18.8%    8.6%      0.4%   7.7%     14.5%       3.1%          1.7%
tex        21.9%   13.8%      3.6%   9.2%     10.8%       5.1%          7.0%
ccom       18.7%   12.2%      3.4%  11.9%      3.8%       5.0%          6.9%
doduc      29.4%   10.2%     10.1%   4.1%     12.3%       1.5%         11.6%

The sum of the last 5 columns should equal the sum of the first 2, to
within rounding.  In other words, the last 5 are a partition of the
loads and stores into sp-based, register saves and restores at
procedure entry/exit (the "reg" column), gp-based (that is, small
static variables addressed by a dedicated register, which always have
a nonzero offset on the MIPSco machine), other load/stores with zero
offset, and other load/stores with nonzero offset.  Everything is a
percentage of the total instructions executed.
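
The partition property is easy to check mechanically.  What follows is
a minimal sketch in Python, not part of the original measurements: the
numbers are copied from the table, while the names STATS and
partition_gap are made up for illustration.

```python
# Each row: (loads, stores, sp-based, reg, gp-based, 0-offset, non-0 offset),
# all as a percentage of total instructions executed (copied from the table).
STATS = {
    "espresso": (19.6, 1.1, 0.1, 0.1, 1.3, 18.7, 0.4),
    "spice":    (26.9, 16.3, 7.2, 2.8, 4.2, 1.5, 27.5),
    "wolf":     (25.3, 8.2, 7.1, 1.9, 3.6, 8.0, 12.9),
    "yacc":     (15.7, 2.1, 0.9, 0.5, 2.5, 12.4, 1.5),
    "diff":     (16.2, 3.2, 0.5, 0.7, 4.6, 7.2, 6.4),
    "compress": (18.3, 10.6, 0.1, 3.5, 8.1, 7.8, 9.4),
    "uopt":     (21.8, 8.4, 5.6, 5.2, 1.2, 6.8, 11.4),
    "as1":      (18.3, 11.2, 4.4, 6.8, 3.7, 3.8, 10.8),
    "nroff":    (18.8, 8.6, 0.4, 7.7, 14.5, 3.1, 1.7),
    "tex":      (21.9, 13.8, 3.6, 9.2, 10.8, 5.1, 7.0),
    "ccom":     (18.7, 12.2, 3.4, 11.9, 3.8, 5.0, 6.9),
    "doduc":    (29.4, 10.2, 10.1, 4.1, 12.3, 1.5, 11.6),
}


def partition_gap(row):
    """Return (loads + stores) minus the sum of the five ld/st categories."""
    loads, stores, *parts = row
    # Round to the table's precision; adding 0.0 turns -0.0 into 0.0.
    return round((loads + stores) - sum(parts), 1) + 0.0


gaps = {name: partition_gap(row) for name, row in STATS.items()}
for name, gap in gaps.items():
    print(f"{name:10s} {gap:+.1f}")
```

Any gap larger than about 0.1 would point to a transcription error
rather than rounding in the published percentages.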

My conclusion: be careful about drawing conclusions from a small number
of data points.  For example, when deciding whether to have an offset
for load/store, looking at just espresso (0.4% nonzero) or just spice
(27.5% nonzero) would lead to very different answers.

Caveat: these statistics are from one architecture / compiler.  Your
actual mileage may vary.  In particular, I would bet the 29k compilers
probably try hard to avoid needing offsets.  To the extent that these
techniques succeed, they would reduce the 27.5% slowdown you'd expect in
spice (assuming one extra address-computation instruction per
nonzero-offset load/store).  However, I think these statistics do give
some indication of
the desirability of having a load/store offset.
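
To make the spread concrete, here is a rough Python sketch over the
last column of the table.  The cost model is an assumption of the
sketch, not a measurement: it charges one extra instruction for every
non-sp/gp load/store with a nonzero offset and ignores anything a
compiler might do to avoid offsets.  The names NONZERO_OFFSET and
extra_instructions are hypothetical.

```python
from statistics import mean

# Non-sp/gp loads/stores with a nonzero offset, as a percentage of total
# instructions executed (last column of the table).
NONZERO_OFFSET = {
    "espresso": 0.4, "spice": 27.5, "wolf": 12.9, "yacc": 1.5,
    "diff": 6.4, "compress": 9.4, "uopt": 11.4, "as1": 10.8,
    "nroff": 1.7, "tex": 7.0, "ccom": 6.9, "doduc": 11.6,
}


def extra_instructions(pct, adds_per_access=1):
    """Assumed model: on a machine with no load/store offset, each such
    access needs `adds_per_access` extra instructions to form its address.
    Returns the added instructions as a percentage of the original count."""
    return pct * adds_per_access


lowest = min(NONZERO_OFFSET, key=NONZERO_OFFSET.get)
highest = max(NONZERO_OFFSET, key=NONZERO_OFFSET.get)
average = mean(NONZERO_OFFSET.values())

print(f"lowest : {lowest} ({extra_instructions(NONZERO_OFFSET[lowest]):.1f}%)")
print(f"highest: {highest} ({extra_instructions(NONZERO_OFFSET[highest]):.1f}%)")
print(f"mean over {len(NONZERO_OFFSET)} programs: {average:.1f}%")
```

Under that model the answer depends almost entirely on which program
you pick, which is the point of the table.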