Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!asuvax!ncar!gatech!hubcap!mark
From: mark@hubcap.clemson.edu (Mark Smotherman)
Newsgroups: comp.arch
Subject: Re: SPARC implementation or architecture
Message-ID: <1991Apr18.162205.20529@hubcap.clemson.edu>
Date: 18 Apr 91 16:22:05 GMT
References: <1991Apr17.183822.7681@elroy.jpl.nasa.gov>
Organization: Clemson University
Lines: 51

From article <1991Apr17.183822.7681@elroy.jpl.nasa.gov>, by david@elroy.jpl.nasa.gov (David Robinson):
> Has anyone compared why SPARC tends to run slower at the same clock
> speed as other RISC chips?

As Michael Slater points out in the most recent Microprocessor Report
(p. 12, vol. 5, no. 6, April 3, 1991), the current SPARC implementations
exhibit lower SPECx/MHz in part because they use a unified I/D cache.
The competing implementations from MIPS, HP, and IBM have split caches.
Also, the early SPARCstations used only a single 4-byte write buffer.
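
To make the unified-cache point concrete, here is a rough back-of-the-envelope
sketch. It is a toy model of my own, not anything from Microprocessor Report:
it assumes a single-ported unified cache in which every load or store steals
the port from instruction fetch for a fixed number of cycles, and it ignores
miss rates, write buffers, and everything else. The function names and the
sample numbers are made up for illustration.

```python
def cpi_with_unified_cache(base_cpi, mem_op_fraction, conflict_stall=1.0):
    """CPI when each load/store stalls instruction fetch on a shared cache port.

    base_cpi        -- CPI with split I/D caches (no port conflict); assumed
    mem_op_fraction -- fraction of instructions that are loads or stores
    conflict_stall  -- stall cycles charged per load/store; assumed
    """
    return base_cpi + mem_op_fraction * conflict_stall


def per_mhz_ratio(base_cpi, mem_op_fraction, conflict_stall=1.0):
    """Performance per MHz of the unified design relative to the split design.

    At a fixed clock rate, performance is proportional to 1/CPI.
    """
    unified = cpi_with_unified_cache(base_cpi, mem_op_fraction, conflict_stall)
    return base_cpi / unified


if __name__ == "__main__":
    # Hypothetical inputs: base CPI of 1.0 and 25% loads/stores.
    print(cpi_with_unified_cache(1.0, 0.25))
    print(per_mhz_ratio(1.0, 0.25))
```

Under those assumed inputs the port conflict alone costs a fifth of the
performance per MHz, which is the kind of gap the question was asking about.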

The SS2 design seems to address the write buffer problem but not the
unified cache.  (Maybe the SPEC configuration parameters should include
the number of write buffers and the presence or absence of a cache
refill buffer and store-back buffer.)

One possible explanation for the less aggressive memory system design
seen in SPARC implementations is a reliance on register windows for
performance.  John Hennessy in the Oct. 1989 IEEE video seminar on RISC
processor design noted that the register window approach was thought to
substantially lower the load/store traffic (for integers) and could
therefore tolerate simplified (i.e., slower) caches.  However, Hennessy
also noted that SPARC register windows do not help FP load/stores.
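
The register window argument can be sketched with a small simulation. This is
a simplified model under my own assumptions, not Hennessy's analysis: a call
trace is a list of +1 (call) and -1 (return), a window holds 16 registers,
7 windows are usable before an overflow trap, and the conventional flat
register file saves and restores 4 registers on every call. All of those
parameters and function names are illustrative.

```python
def window_traffic(call_trace, usable_windows=7, regs_per_window=16):
    """Return (stores, loads) caused by window overflow/underflow traps."""
    resident = 1   # windows currently in the register file
    spilled = 0    # windows currently saved in memory
    stores = loads = 0
    for step in call_trace:
        if step == 1:                      # procedure call
            if resident == usable_windows:
                stores += regs_per_window  # overflow: spill the oldest window
                spilled += 1
            else:
                resident += 1
        elif step == -1:                   # procedure return
            if resident > 1:
                resident -= 1
            elif spilled > 0:
                loads += regs_per_window   # underflow: refill caller's window
                spilled -= 1
            else:
                raise ValueError("return without a matching call")
        else:
            raise ValueError("trace entries must be +1 or -1")
    return stores, loads


def flat_file_traffic(call_trace, saved_per_call=4):
    """Return (stores, loads) when every call saves/restores a fixed set."""
    calls = sum(1 for step in call_trace if step == 1)
    returns = sum(1 for step in call_trace if step == -1)
    return calls * saved_per_call, returns * saved_per_call


if __name__ == "__main__":
    shallow = [1, 1, 1, -1, -1, -1]   # call depth stays within the windows
    deep = [1] * 10 + [-1] * 10       # call depth exceeds the windows
    for name, trace in (("shallow", shallow), ("deep", deep)):
        print(name, window_traffic(trace), flat_file_traffic(trace))
```

With shallow call chains the windows generate no memory traffic at all, while
deep chains pay for whole-window spills. Note that the model covers integer
registers only, which matches the point that windows do nothing for FP
loads and stores.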

An interesting architectural comparison between SPARC and MIPS was
given by Sun folks at ASPLOS-IV:  R.F. Cmelik, et al., "An analysis of
MIPS and SPARC instruction set utilization on the SPEC benchmarks,"
pp. 290-302.  (They concluded that SPARC had the advantage, but the
MIPS folks were quick to point out that they used current Sun compilers
and year-old MIPS compilers.  It will be interesting to see how we
chew over this paper in comp.arch!)  The data presented in this paper
showed the following MIPS/SPARC ratios for memory traffic:

	int loads (in int benchmarks)	1.07
	int stores			1.00
	FP loads (in FP benchmarks)	1.92	(i.e., MIPS did nearly twice as many)
	FP stores			2.49

They suggested that the integer ratios do not show the true value of
register windows since the dynamic procedure calling frequency of the
SPEC benchmarks is abnormally low (see p. 293).  They also noted that
MIPS-I lacks double-precision FP load/store instructions but claimed
that even disallowing double-precision FP loads/stores in SPARC would
_not_ significantly reduce the memory traffic ratios; they attributed
the large FP ratios to compiler technology.
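
For anyone who prefers to read the ratios from the SPARC side, the arithmetic
is just a reciprocal. The sketch below uses only the four MIPS/SPARC ratios
quoted above; the dictionary and function names are my own.

```python
# MIPS/SPARC dynamic memory-operation ratios as quoted from the paper.
MIPS_OVER_SPARC = {
    "int loads": 1.07,
    "int stores": 1.00,
    "FP loads": 1.92,
    "FP stores": 2.49,
}


def sparc_reduction_percent(ratio):
    """Percent fewer operations SPARC executed than MIPS, given MIPS/SPARC."""
    return (1.0 - 1.0 / ratio) * 100.0


if __name__ == "__main__":
    for name, ratio in MIPS_OVER_SPARC.items():
        print(f"{name:10s}  MIPS/SPARC = {ratio:.2f}  "
              f"SPARC did {sparc_reduction_percent(ratio):.1f}% fewer")
```

So a ratio of 1.92 means SPARC executed roughly 48% fewer FP loads, and 2.49
means roughly 60% fewer FP stores, while the integer differences are under 7%.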

MIPS has repeated the experiment with current compilers.  Let's ask
John Mashey to post the new numbers and new ratios or to publish a
follow-up article in ACM SIGARCH Computer Architecture News.

-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark