Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: SPARC implementation or architecture
Message-ID: <2500@spim.mips.COM>
Date: 20 Apr 91 01:13:53 GMT
References: <1991Apr17.183822.7681@elroy.jpl.nasa.gov> <41377@cup.portal.com> <1991Apr18.142341.23097@rice.edu>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Inc.
Lines: 249
Nntp-Posting-Host: winchester.mips.com

In article <1991Apr18.142341.23097@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
>	An Analysis of MIPS and SPARC Instruction Set Utilization on
>	the SPEC Benchmarks
>	Cmelik, King, Ditzel, Kelly
>	ASPLOS-IV, 1991

>Basically, the MIPS executed more instructions, except on most of the integer
>benchmarks.  The authors note that a fairer comparison, taking into
>account register window overhead, interlocks, and annulled instructions
>still gives a 9% advantage to the SPARC.

>The biggest contributor to the difference seemed to be that the
>MIPS required two instructions to load or store a DP floating-point value.
>Hence the wide disparity in the FP benchmarks.

>Someone (Charlie Price?) from MIPS objected that the study had been carried
>out with an old generation of the MIPS compilers, and that newer numbers
>were significantly better for the MIPS.  Ditzel admitted this was possible,
>but noted that they had used what was available on the market when they did
>the study.

>The paper goes on to discuss the effect of libraries, load/store usage,
>branches, nops, integer ops, and fp ops.  Lots of good ideas for
>both architectures.  The appendix contains detailed numbers for each
>of the benchmarks.
>
>BTW, the numbers were collected with pixie (mips) and spixie (sparc).
>One of the consistently interesting parts of the conference is the methodology
>used to perform experiments.  Lots of good ideas here.

1) Definitely lots of good analysis here; it is heartening to see such
detailed analyses done on more meaningful programs, and when everybody
gets their ASPLOS proceedings, it is worth studying.  A subset of the
conclusions is fairly accurate, and in particular, the corrections
done for plausible extensions/changes are reasonably OK, as well as
most of the analysis about the effects of various features.

***********************************************************************
*Of course, most of the overall conclusions turn out to be wrong, if you
*use contemporary MIPS compilers released the same week as the Sun
*compilers used in the study, as opposed to 1-year old MIPS compilers.
* In general, using the same approach as Sun, I get something like
* a 10% edge for MIPS (but read on to understand the numbers).
***********************************************************************

2) Sun did a perfectly reasonable thing, which is to use the compilers
they could buy off-the-shelf from us for the analysis.  Unfortunately,
what fails to appear in the paper are the release dates of the compilers,
i.e., that:
	MIPS 2.10 release was 2Q90
	Sun compilers were announced for release last week

Why that might be relevant:

In the last year, I believe that Sun has devoted a large amount
of analysis and tuning to their compilers, using SPEC, among other things.
This is evidenced, of course, by the exhaustive analysis in this paper,
as well as the improvements from 10.0 SPECmarks to 11.8 SPECmarks for
25MHz SPARCs (SS1+ in 2Q90 to SS IPC in 2Q91, with new compilers).

A 25MHz MIPS Magnum went from 17.8 to 18.6 in the same time, and it actually
turns out that MIPS has done some (but not huge) analysis and tuning for
these benchmarks, and some of that shows up in the current compilers
(2.20), which happened to have been released within a week of when the
new Sun compilers were announced ..... although SPEC numbers for
both MIPS and Sun compilers in Beta form were published a while ago by
both.
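
To put the two SPECmark changes quoted above on the same footing, here
is a minimal sketch in Python. The helper name percent_gain is just
for illustration; the four SPECmark figures are the ones given in the
text, nothing else is assumed.

```python
def percent_gain(before, after):
    """Relative improvement from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0

# 25MHz SPARC: SS1+ in 2Q90 to SS IPC in 2Q91, with new compilers
sparc_gain = percent_gain(10.0, 11.8)
# 25MHz MIPS Magnum over the same period
mips_gain = percent_gain(17.8, 18.6)

print("SPARC +%.1f%%, MIPS +%.1f%%" % (sparc_gain, mips_gain))
# prints: SPARC +18.0%, MIPS +4.5%
```

On the same hardware clock, the SPARC figure moved roughly four times
as much as the MIPS one over the year.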

(Note, just for the record, that there is NOTHING wrong with anybody
using the SPEC benchmarks to tune their compilers; it's infinitely better
for the buyers of computers out there if people do it with SPEC than
with Dhrystone or Whetstone.  I.e., if Sun spends a bunch of effort
analyzing SPEC to death and tuning things up, more power to them,
because the tunings are much more likely to encourage optimizations that
will help real programs, and actually DO something for a customer.)

3) The analysis in the paper is well worth reading, and the micro-level
discussions are very useful. The conclusions mostly follow from the data;
it's just that a 1-year difference in compiler choice still makes
a difference, and the general conclusions end up getting wiped out
by MIPS' compilers' gain in that year....

4) Let's start with the raw instruction counts (remember that these
need to be corrected for nops, annuls, stalls, etc., etc.).

Instruction counts in Millions:
Most numbers from Sun paper, Tables A1 and A2
Total = raw instruction counts
Total+ = SPARC counts with adjustments for window-handling,
annulled instructions, load-use stalls (i.e., a little closer comparison)
MIPS220 = equivalent of Total, but with 2.20 compilers rather than 2.10.
Notation of form (n/m) means column n divided by column m.
Most important thing to look at is differences between
columns 1 and 4, and 7 and 8.

FIRST TABLE:
COL	1	2	3	4	5	6	7	8
Source	Sun	Sun	Sun	mash	mash	Sun	Sun	mash
Bench	MIPS	SPARC	M/S	MIPS220	M/S	SPARC	M/S	M/S
	Total	Total	(1/2)	Total	(4/2)	Total+	(1/6)	(4/6)
---------------------------------------------------------------------
spice	21,569	22,878	0.94	20,114	0.88	26,516	0.81	0.76
doduc	 1,613	 1,303	1.24	 1,392	1.07	 1,335	1.21	1.04
nasa7    9,257	 6,615	1.40	 9,186	1.39	 6,719	1.38	1.37
matrx300 2,776   1,694	1.64	 2,339	1.38	 1,695	1.64	1.38
fpppp	 2,316	 1,443	1.61	 2,111	1.46	 1,472	1.57	1.43
tomcat	 1,813	 1,626	1.11	 1,738	1.07	 1,640	1.11	1.06
---------------------------------------------------------------------
FP Geometric Mean	1.30		1.19		1.25	1.15
---------------------------------------------------------------------
gcc	 1,111	 1,155	0.96	 1,149	0.99	1,317	0.84	0.87
espresso 2,829   2,931	0.97	 2,723	0.93	3,397	0.83	0.80
li	 6,023   4,661	1.29	 5,938	1.27	6,131	0.98	0.97
eqntott  1,243 	 1,322	0.94	 1,244	0.93	1,458	0.85	0.85
---------------------------------------------------------------------
Integer Geometric Mean	1.03		1.02		0.88	0.87
---------------------------------------------------------------------
Overall Geometric Mean	1.18		1.12		1.09	1.03
---------------------------------------------------------------------
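
For anyone who wants to check the ratio columns, here is a minimal
sketch in Python that recomputes the geometric means of columns 7 and 8
from the counts in the table. The names COUNTS, geo_mean and ratio_mean
are just for illustration. Because the table's counts are already
rounded to millions, the results should land within rounding of the
printed means rather than match every digit.

```python
from math import exp, log

# Instruction counts in millions, copied from the table above.
# Tuple order: (MIPS Total, SPARC Total, MIPS220 Total, SPARC Total+)
# i.e. table columns 1, 2, 4 and 6.
COUNTS = {
    "spice":    (21569, 22878, 20114, 26516),
    "doduc":    (1613, 1303, 1392, 1335),
    "nasa7":    (9257, 6615, 9186, 6719),
    "matrx300": (2776, 1694, 2339, 1695),
    "fpppp":    (2316, 1443, 2111, 1472),
    "tomcat":   (1813, 1626, 1738, 1640),
    "gcc":      (1111, 1155, 1149, 1317),
    "espresso": (2829, 2931, 2723, 3397),
    "li":       (6023, 4661, 5938, 6131),
    "eqntott":  (1243, 1322, 1244, 1458),
}
FP = ["spice", "doduc", "nasa7", "matrx300", "fpppp", "tomcat"]
INT = ["gcc", "espresso", "li", "eqntott"]


def geo_mean(values):
    """Geometric mean, computed via the mean of the logarithms."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


def ratio_mean(benchmarks, num, den):
    """Geometric mean of tuple entry `num` over tuple entry `den`."""
    return geo_mean(COUNTS[b][num] / COUNTS[b][den] for b in benchmarks)


# Column 7 is (1/6): Sun's MIPS Total over SPARC Total+.
# Column 8 is (4/6): MIPS220 Total over SPARC Total+.
for label, num in (("col 7 (1/6)", 0), ("col 8 (4/6)", 2)):
    print(label,
          "FP %.2f" % ratio_mean(FP, num, 3),
          "Int %.2f" % ratio_mean(INT, num, 3),
          "All %.2f" % ratio_mean(FP + INT, num, 3))
```

Changing the tuple indexes passed to ratio_mean gives the other ratio
columns, e.g. (0, 1) for column 3 and (2, 1) for column 5.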

Now, with this data, I draw several conclusions, some of which are rather
different than those given in the paper.  Again, please note that the
above is not PERFORMANCE data, but instruction-count (Total) or
Sun-adjusted-instruction-count (Total+) data, or my data on MIPS-1
with current compilers.

I'm particularly looking at the rightmost column above:

1) MIPS uses more instructions on floating-point [mostly due to the
lack of 64-bit FP load/stores, which is especially crucial to the
linear-algebra and related benchmarks].  The FP code improved somewhat
from 2.10 to 2.20, i.e., some of the effects seen in the paper were
from compilers where the benchmarks had barely been looked at in any
serious fashion...

2) MIPS usually uses fewer instructions on integer programs,
and definitely uses fewer instruction-equivalents (i.e., from Total+).
The range is from .80 to .97, with a 95% confidence interval of
[0.76 to 0.99].
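
The text does not say how the interval was computed. One reading that
reproduces it, offered here only as an assumption, is an ordinary
Student's t interval around the arithmetic mean of the four integer
ratios in the rightmost column. A minimal sketch in Python (the
function name t_interval is illustrative, and the critical value is
hardcoded because the standard library has no t distribution):

```python
from math import sqrt
from statistics import mean, stdev

# Column 8 (MIPS220 / SPARC Total+) for gcc, espresso, li, eqntott.
int_ratios = [0.87, 0.80, 0.97, 0.85]

# Two-sided 95% Student's t critical value for n - 1 = 3 degrees
# of freedom (standard table value).
T_CRIT_DF3 = 3.182


def t_interval(samples, t_crit):
    """Confidence interval for the mean: mean +/- t * s / sqrt(n)."""
    m = mean(samples)
    half_width = t_crit * stdev(samples) / sqrt(len(samples))
    return m - half_width, m + half_width


low, high = t_interval(int_ratios, T_CRIT_DF3)
print("%.2f to %.2f" % (low, high))
# prints: 0.76 to 0.99
```

With only four samples the interval is wide, but its upper end still
sits below 1.0.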

4) Now, let's look at the analysis at the end of the paper, which is
quite interesting (and is, in fact, a good example of the kinds of
analyses architects do or should do when figuring out how to design
and/or evolve an architecture).
What they did was:
start with the MIPS Total/Total+ counts (always equal),
and the SPARC Total+ counts, and then estimate the effects of adding
instructions to each architecture, to make them architecture-neutral,
and then compare, mostly to compare compilers, I guess.
Columns 1-3 were from Sun, and the MIPS Total+~ was computed from
the earlier Total by giving MIPS a 64-bit load/store (probably the
dominant effect). SPARC got the int<->fp improvements.
Column 4 is computed from the actual numbers for MIPS-2 machines
{R6000, R4000}, which have the 64-bit load/stores hypothesized by
Sun, plus load-interlocks and annulled-branches, and sqrt,
and a few other things.  It would have been interesting to have seen
the Sun numbers, as modified not just by the int<->fp change,
but also by the integer mul/div and anything else coming in the
next-generation SPARCs; however, the paper makes the case that
the mul/div issue is not a large one for this set of benchmarks
(maybe on the order of a percent, at most, depending on the benchmarks,
especially as the multiply left in the inner loop of matrix300 on MIPS
has disappeared).

SECOND TABLE
COL	1	2	3	4	5	6
Source	Sun	Sun	Sun	mash	mash	mash
Bench	MIPS	SPARC	M/S	MIPS-2	M/S	M/S of adjusted
	Total+~	Total+~	(1/2)	Total	(4/2)	(col 4 of prev tab-adj)/2
---------------------------------------------------------------------
spice	20,211	25,095	0.81	18,429	0.73	0.75
doduc	 1,358	 1,301	1.04	 1,056	0.81	0.87
nasa7    6,927	 6,682	1.04	 6,454	0.97	1.03
matrx300 2,126   1,695	1.25	 2,339	0.99	1.00
fpppp	 1,616	 1,440	1.12	 1,319	0.92	0.98
tomcat	 1,377	 1,607	0.86	 1,283	0.80	0.81
---------------------------------------------------------------------
FP Geometric Mean	1.02		0.86	0.90
---------------------------------------------------------------------
gcc	 1,111	 1,262	0.88	 1,122	0.89
espresso 2,829   3,397	0.83	 2,648	0.81
li	 6,016   5,626	1.07	 5,504	0.98
eqntott  1,243 	 1,371	0.91	 1,247	0.91
---------------------------------------------------------------------
Integer Geometric Mean	0.93		0.90
---------------------------------------------------------------------
Overall Geometric Mean	0.97		0.88

Now, more notes:
1) It might be interesting to recalibrate the Sun-computed overall
Geo Mean above to allow for the MIPS 2.20 compilers.  I haven't
done this in detail, but note that going from 2.10 (on which columns
1 and 3 above are based) to 2.20 changed the Geo Means of Total+
in the earlier table from 1.09 to 1.03, with most of the effect being
in the FP area. Thus, I'd guess that the overall number as adjusted by Sun
would have come out around .94, rather than .97.

2) As noted above, it would have been really fascinating to have seen
the actual next-generation SPARC numbers as column Total+~ above
(but I realize that would have been a bit much to ask, since while
MIPS-2 changes are guessable, having been public for over a year,
the SPARC changes aren't yet public, I think (?)).

3) So, to summarize, in the paper's conclusions, it says:
	a) "MIPS typically executes 18% more user-level instructions
	than SPARC"
	This could be rewritten as "MIPS typically executes 12%
	more user-level instructions than SPARC", although in any case,
	one must be very careful about the word "typical" here,
	in that one can also say that MIPS usually executes fewer instructions
	on integer code, and more on FP code.  ("typical" is not
	a statistical term :-)
	b) "A fairer comparison which takes into account register-window
	overhead, load-use interlocks, and annulled instructions,
	still shows a 9% advantage for SPARC"  turns into:
	A fairer comparison shows a 3% advantage for SPARC.
	c) "most significant differences are SPARC's DP load/store,
	and MIPS compare-and-branch"
	Probably so.
	d) "When architectural factors were factored out, the differences
	due to combined compiler/library effects were so small (3%)
	that neither MIPS nor SPARC has any significant advantage."
	Wellll...  include MIPS 2.20 compilers:
		-the 4 integer benchmarks range from 3% less to 20% less,
		with Geo mean = 13% less.
		-the 6 FP benchmarks are a little harder to figure,
		but if you take the 2.20 compiler numbers (Col 4 of
		first table), and subtract Sun's adjustments for 64-bit,
		and divide result by Sun's Total+~ column, you get
		the ratios shown in Column 6 above, which, not surprisingly,
		are in between what Sun computed, and what we actually get
		in MIPS-2.
		In any case this yields 10% less, using Sun's rules.
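
The Column 6 arithmetic described above can be sketched in Python as
follows. One assumption is made explicit in the code: "Sun's
adjustment" is taken to be the number of instructions Sun removed from
each MIPS count, i.e. the first table's MIPS Total minus the second
table's MIPS Total+~. The names FP_ROWS and col6_ratio are illustrative.

```python
from math import exp, log

# Millions of instructions, FP benchmarks only, copied from the tables.
# Tuple order:
#   mips_210  = first table,  col 1 (Sun, MIPS Total, 2.10 compilers)
#   mips_220  = first table,  col 4 (mash, MIPS220 Total)
#   mips_adj  = second table, col 1 (Sun, MIPS Total+~)
#   sparc_adj = second table, col 2 (Sun, SPARC Total+~)
FP_ROWS = {
    "spice":    (21569, 20114, 20211, 25095),
    "doduc":    (1613, 1392, 1358, 1301),
    "nasa7":    (9257, 9186, 6927, 6682),
    "matrx300": (2776, 2339, 2126, 1695),
    "fpppp":    (2316, 2111, 1616, 1440),
    "tomcat":   (1813, 1738, 1377, 1607),
}


def col6_ratio(mips_210, mips_220, mips_adj, sparc_adj):
    # Assumption: Sun's 64-bit adjustment is what Sun subtracted
    # from the 2.10 count to get Total+~.
    sun_adjustment = mips_210 - mips_adj
    # Apply the same subtraction to the 2.20 count, then divide by
    # Sun's adjusted SPARC count.
    return (mips_220 - sun_adjustment) / sparc_adj


ratios = {name: col6_ratio(*row) for name, row in FP_ROWS.items()}
geo = exp(sum(log(r) for r in ratios.values()) / len(ratios))

for name, r in ratios.items():
    print("%-9s %.2f" % (name, r))
print("FP geometric mean %.2f" % geo)
```

Under that assumption the six ratios and their geometric mean come out
within rounding of Column 6 of the second table (0.75, 0.87, 1.03,
1.00, 0.98, 0.81, mean 0.90), which is where the "10% less" comes from.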

Anyway, one must be VERY careful to avoid over-generalization from a
small number of data points, and note, all of this was instruction
COUNTS, not cycles, and cycles are always more important.
Nevertheless, this kind of analysis in the Sun paper is very useful to
compiler writers (to answer the question: why are THEY beating us in
that benchmark on instruction counts?  are they doing something
special? is it architecture, or our compilers missing an optimization?)

However, the major conclusion: that the two are indistinguishable,
is wrong, if you think 10% is distinguishable... 
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650