Path: utzoo!attcan!utgpu!watmath!clyde!att!osu-cis!killer!elg
From: elg@killer.DALLAS.TX.US (Eric Green)
Newsgroups: comp.arch
Subject: Re: SPARC vs. MIPS on gcc
Message-ID: <6476@killer.DALLAS.TX.US>
Date: 18 Dec 88 07:53:30 GMT
References: <82150@sun.uucp>
Organization: The Unix(R) Connection, Dallas, Texas
Lines: 109

in article <82150@sun.uucp>, edkelly%aisling@Sun.COM (Ed Kelly) says:
>             A COMPARISON OF SPARC VS MIPS ON A LARGE C PROGRAM.
> 
> For the comparison we chose a large portable C program (the GNU C Compiler rev
> 1.24) and compiled the identical source on a Sun-4/280 with the SPARC compiler
> to produce a SPARC binary, and on a MIPS M/1000 with the MIPS compiler to 
> produce a MIPS binary.

Step 1: choose a program. Fine. You did that right. Even used the
right compiler -- the standard one.


>       Then using the same data (the file gcc.c) we ran the
> benchmark on both machines and gathered the dynamic trace statistics provided 
> by SPIXSTATS and PIXIE, 
> If you are interested in architecture and wish to avoid the 
> confusion of implementation details these are the numbers of most
> interest. 

OK, so you captured dynamic trace statistics. So what? A lower number of
instructions executed doesn't necessarily mean faster execution, or
else the Vax 780 would be the world's fastest machine ;-). I happen to
agree that some sort of register window setup is a Big Advantage
architecturally, but I don't think a dogmatic "Register windows are
better" is warranted.
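The instruction-count point is just the usual run-time product: instructions executed, times cycles per instruction, times cycle time. A minimal Python sketch follows. Both machines and all of their figures are hypothetical, picked only to show a machine that executes twice as many instructions and still finishes first; they are not measurements of a Vax, a SPARC, or a MIPS.

```python
def run_time_s(instructions: int, cpi: float, cycle_ns: float) -> float:
    """Run time in seconds = instructions x cycles per instruction x cycle time."""
    return instructions * cpi * cycle_ns / 1e9


# Hypothetical machines; the numbers are illustrative, not measured.
cisc_like = run_time_s(instructions=1_000_000, cpi=10.0, cycle_ns=200.0)  # few, slow instructions
risc_like = run_time_s(instructions=2_000_000, cpi=1.5, cycle_ns=60.0)    # twice as many, fast ones

print(f"CISC-like: {cisc_like:.2f} s")
print(f"RISC-like: {risc_like:.2f} s")
```

Halving the instruction count buys nothing if each instruction costs many more cycles.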

> 2) There are lots of NOPs in MIPS code. This is an ARCHITECTURAL feature. 
> NOPs are not benign. As well as the direct cycles lost, lots of NOPs is bad 
> for code density, and it increases instruction cache miss penalties(due to more
> memory accesses and greater probability of a miss).

The delay slots that get filled by NOPs also let you schedule useful
instructions after LOADs etc., in cycles where the pipeline would
otherwise be stalled, which seems to me to make the whole issue
somewhat of a tossup. You can do the same sort of instruction
rearrangement without that guaranteed delay, but it becomes more of an
iffy proposition.
     As I mentioned before, if code density were the sole determinant
of architectural quality, we should all use Vaxen.
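To make the delay-slot argument concrete, here is a toy Python model. It assumes a single load delay slot: the instruction right after a load may not use the loaded register, so a NOP goes there unless something independent can be moved in. The instruction format, register names, and the greedy `fill_load_delays` pass are all hypothetical; this is a sketch of the idea, not the MIPS compiler's actual scheduler, and stores and branches are left out of the model.

```python
from typing import List, NamedTuple, Tuple


class Insn(NamedTuple):
    op: str                # "lw" for a load; anything else is treated as an ALU op
    dest: str              # register written
    srcs: Tuple[str, ...]  # registers read


def needs_nop(a: Insn, b: Insn) -> bool:
    """True if b reads a's load result in the very next slot (one load delay slot)."""
    return a.op == "lw" and a.dest in b.srcs


def count_nops(code: List[Insn]) -> int:
    """Number of NOPs needed to pad every unfilled load delay slot."""
    return sum(needs_nop(a, b) for a, b in zip(code, code[1:]))


def can_hoist(code: List[Insn], i: int, j: int) -> bool:
    """Can code[j] move up to sit right after code[i] without changing results?"""
    cand, between = code[j], code[i + 1:j]
    if any(x.dest in cand.srcs for x in code[i:j]):  # RAW: reads a value produced earlier
        return False
    if any(cand.dest in x.srcs for x in between):    # WAR: would clobber a pending read
        return False
    if any(cand.dest == x.dest for x in between):    # WAW: would reorder two writes
        return False
    return True


def fill_load_delays(code: List[Insn]) -> List[Insn]:
    """Greedily move an independent later instruction into each load-use slot."""
    code = list(code)
    for i in range(len(code) - 1):
        if needs_nop(code[i], code[i + 1]):
            for j in range(i + 2, len(code)):
                if can_hoist(code, i, j):
                    code.insert(i + 1, code.pop(j))
                    break
    return code


prog = [
    Insn("lw", "r1", ("r4",)),        # r1 <- mem[r4]
    Insn("add", "r2", ("r1", "r1")),  # uses r1 at once: load-use stall
    Insn("lw", "r3", ("r5",)),        # independent load
    Insn("sub", "r6", ("r3", "r2")),  # uses r3 at once: another stall
    Insn("or", "r7", ("r8", "r9")),   # independent ALU op
]
print("NOPs before:", count_nops(prog))
print("NOPs after: ", count_nops(fill_load_delays(prog)))
```

Hoisting the independent second load into the first slot removes both NOPs. When nothing independent is available the NOP stays, and that leftover is the code-density cost being argued about.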

> In summary, for this benchmark, the ARCHITECTURAL benefits of register windows 
> and annulling more than balance the ARCHITECTURAL losses in
> computational.

Hmm... I wouldn't be quite so dogmatic about it if I were you. The
information presented looks fairly convincing, but there may be
alternate explanations. The only things certain in life are death and
taxes. 

> MIPS compiler is not significantly better than the current SPARC compiler. 
> Considering the bad press, I will admit I was surprised by this
> myself. 

Doesn't surprise me too greatly. The register windows compensate quite
well for outdated compiler technology, which is why the UCB guys used
them in the first place (so they could re-target PCC, instead of
having to dig up some compiler guys to do a moby optimizing hack).
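One way to see why windows let a simple compiler get away without a clever register allocator: with windows, a call costs memory traffic only when the call depth runs past the windows held on chip. The toy Python model below counts window overflow spills and underflow fills for a trace of calls and returns. `capacity` (how many windows can be resident at once) is a free parameter; the value 7 used here is only an example, not a claim about any particular Sun machine.

```python
from typing import Iterable, Tuple


def window_traffic(events: Iterable[str], capacity: int) -> Tuple[int, int]:
    """Count (spills, fills) for a trace of "call"/"ret" events.

    A call needs a fresh window; if all `capacity` windows are in use, the
    oldest one is spilled to memory (overflow). A return to a caller whose
    window was spilled must reload it (underflow).
    """
    resident = 1  # windows held in registers; starts with the current procedure's
    spills = fills = 0
    for ev in events:
        if ev == "call":
            if resident == capacity:
                spills += 1   # overflow: save the oldest window, reuse its registers
            else:
                resident += 1
        elif ev == "ret":
            if resident == 1:
                fills += 1    # underflow: restore the caller's window from memory
            else:
                resident -= 1
        else:
            raise ValueError(f"unknown event {ev!r}")
    return spills, fills


shallow = ["call", "ret"] * 1000     # repeated leaf calls from one level
deep = ["call"] * 10 + ["ret"] * 10  # a single 10-deep call chain
print("shallow:", window_traffic(shallow, capacity=7))
print("deep:   ", window_traffic(deep, capacity=7))
```

The model counts events only. It ignores what each spill or fill costs, which on real hardware means a trap plus a block of stores or loads.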

> -----------------------------------------------------------
> total raw cycles	25,522,476		18,999,172
> 
> cache miss cycles	4,427,524*		14,000,828*
> -----------------------------------------------------------
> total machine cycles	29,950,000		33,000,000

Looks like the MIPS M/1000 used needed a larger cache. As David
Patterson explains so ably in his various papers, a larger cache can
substitute for a lot of memory bandwidth (which is why a RISC can be
faster than a Vax 780). Statistics on how much cache was available on
each machine were not published with this so-called "performance
comparison". I would not be surprised if a MIPS processor needed a
larger cache than a SPARC, just as I would not be surprised if a SPARC
needed a larger cache than a Vax 780. Again, no clear performance
advantage here. Take away a little over 3 million of those cache-miss
cycles, and the MIPS looks better than the SPARC (cycle-wise).
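The cycle arithmetic behind that break-even point, as a quick Python check. The quoted table does not label its columns; the assignment below (left SPARC, right MIPS) is inferred from the totals under discussion, so treat it as an assumption.

```python
# Cycle counts from the quoted table. Column labels are assumed: left = SPARC, right = MIPS.
raw = {"SPARC": 25_522_476, "MIPS": 18_999_172}
miss = {"SPARC": 4_427_524, "MIPS": 14_000_828}

total = {m: raw[m] + miss[m] for m in raw}
gap = total["MIPS"] - total["SPARC"]  # miss cycles MIPS would have to shed just to tie

for m in ("SPARC", "MIPS"):
    print(f"{m:5}: {total[m]:,} cycles, {miss[m] / total[m]:.0%} of them cache misses")
print(f"break-even: MIPS must lose {gap:,} miss cycles "
      f"({gap / miss['MIPS']:.0%} of its misses)")
```

On raw cycles alone the MIPS binary is ahead by about 6.5 million; the whole SPARC advantage comes from the miss column.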

[specs on cycle times, other implementation features:]

> These numbers represent significant differences in the IMPLEMENTATION
> philosophies at Sun and at MIPS.

I suspect it's a matter of cash. The more cash you have, the faster
process technology you can buy. Sun isn't exactly cash-starved ;-).

>       The MIPS performance brief has concentrated on relatively small 
> integer programs that fit in the cache and so benefit well from the single cycle
> loads and stores. This overstates the integer performance for large programs,
> which are after all what people buy fast machines to run. 

This, I agree with. So, apparently, does MIPS, since they're part of a
group trying to design better benchmarks. 
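Why small benchmarks flatter a machine with single-cycle loads and stores: once the working set fits in the cache, almost every access hits. The Python sketch below sweeps an array repeatedly through a direct-mapped cache and reports the miss rate and an average cost per access. The cache geometry and the 10-cycle miss penalty are hypothetical values chosen for illustration, not the parameters of the Sun-4/280 or the M/1000.

```python
def miss_rate(working_set: int, cache_bytes: int = 1024, block_bytes: int = 16,
              word_bytes: int = 4, passes: int = 10) -> float:
    """Sequentially read `working_set` bytes `passes` times through a
    direct-mapped cache and return misses / accesses."""
    n_lines = cache_bytes // block_bytes
    tags = [None] * n_lines  # tag currently held by each line (None = empty)
    accesses = misses = 0
    for _ in range(passes):
        for addr in range(0, working_set, word_bytes):
            block = addr // block_bytes
            index, tag = block % n_lines, block // n_lines
            accesses += 1
            if tags[index] != tag:
                misses += 1
                tags[index] = tag
    return misses / accesses


MISS_PENALTY = 10  # hypothetical cycles per miss
for ws in (512, 2048):  # half the cache size, then twice it
    r = miss_rate(ws)
    print(f"working set {ws:5} bytes: miss rate {r:.3f}, "
          f"avg {1 + r * MISS_PENALTY:.2f} cycles per access")
```

At half the cache size only the first pass misses. At twice the cache size, with this direct-mapped layout, every pass misses once per block, so the average cost per access comes out nearly three times higher.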

> The opinions here are my own and do not necessarily represent those of
> Sun Microsystems.

Are you sure?

I mean, it sounded so much like a product of the Sun Microsystems PR
department! (except that they would not be so clumsy about it, of
course). 

I don't particularly like the MIPS architecture (my favorite of the
recent RISCs is the AMD29000), but the above statistics did not seem
to warrant the conclusions drawn. 

--
Eric Lee Green    ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg
          Snail Mail P.O. Box 92191 Lafayette, LA 70509              
Netter A: In Hell they run VMS.
Netter B: No.  In Hell, they run MS-DOS.  And you only get 256k.