From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Error in Posting of SPEC numbers on IBM systems
Message-ID: <36426@mips.mips.COM>
Date: 24 Feb 90 02:41:25 GMT
References: <36189@mips.mips.COM> <14900004@hpdmd48.HP.COM>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 92

In article <14900004@hpdmd48.HP.COM> sritacco@hpdmd48.HP.COM (Steve Ritacco) writes:
>Ok, let's talk some architecture stuff.

>Why is it that the RIOS has a bigger data cache than instruction cache?
>This defies conventional wisdom.  Data caches are less effective than
>instruction caches and are usually made small because their hit ratio
>doesn't increase with size as rapidly as an instruction cache's does.  If I
>had to guess what is going on, I would guess that access to the I-cache is
>very wide, to support super-scalar issue, so they crammed all they could on
>the CPU chip.  This seems to be pretty effective.  IBM has shown the
>super-scalar architecture works, which up to this point I wasn't convinced
>of.  The benefits are tangible.  A 20MHz CPU with 8K I-cache and 32K
>D-cache SPECmarked at 22.something.  That is quite impressive.  The R3000,
>which to date seemed the most efficient CPU/system implementation, has
>been displaced for the moment.

1) Super-scalar works, at least for getting at more of the low-level parallelism
of FP code.  This is clearly shown by the IBM systems.
2) It's not clear that it works as well for integer code, or at least that
their specific implementation does.  The possibilities are:
	a) Compilers will get somewhat better for integer code (likely).
	b) Compilers will get a lot better for integer code (unlikely).
It is always possible that there's a whole lot of mileage to be gained,
but past experience says to doubt it; this compiler technology is hardly
new: IBM has been doing excellent optimization for a long time.
Certainly, our experience has been that most of the micro-level
scheduling improvements over the last few years have come in the FP
area rather than the integer area.  Anyway, I'd counsel keeping an open
mind, but I'd also advise against simply believing some IBM marketing
guy who says "Oh, we haven't really taken advantage of that," and
concluding that it's going to get magically better.  On the other hand,
if one of their good technical folks like Marty Hopkins says there's a
big jump coming, then one should pay serious attention.

3) In general, this does raise an interesting issue, which is comparing
cache sizes.  Some people build smaller, special-purpose, N-way-set-associative
caches; some people build various-sized, direct-mapped caches from standard
SRAM.  Both ways are legitimate, and there are various tradeoffs in terms
of power, space, and cost.  One thing to be careful of is claiming
that something did it with a small cache, because one would also want
to know how much the special-purpose cache chip costs.
I don't claim to be unbiased: I like using standard SRAMs, because they
always get cheap, and so far, I think chip costs bear me out,
but there are legitimate reasons for doing it the other way, too.
Of course, we're not likely to know the cost of the IBM chips for sure,
so it's not easy to compare.
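To make the associativity side of that tradeoff concrete, here is a toy LRU cache simulator in Python. It is a minimal sketch under stated assumptions: the 8 KB size, 16-byte lines, and the address trace are hypothetical and are not the RIOS or R3000 parameters. It shows the classic case where two arrays that collide in a direct-mapped cache thrash, while a 2-way set-associative cache of the same total size holds both. It says nothing about the cost side, which is the point above.

```python
from collections import OrderedDict

def simulate(trace, cache_bytes, line_bytes, ways):
    """Count hits for an LRU set-associative cache; ways=1 is direct-mapped."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]  # each set: tag -> present
    hits = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_sets   # which set the block maps to
        tag = block // num_sets    # distinguishes blocks sharing a set
        s = sets[index]
        if tag in s:
            hits += 1
            s.move_to_end(tag)          # mark most recently used
        else:
            if len(s) >= ways:
                s.popitem(last=False)   # evict least recently used
            s[tag] = True
    return hits

# Hypothetical trace: word accesses to two arrays 8 KB apart, interleaved,
# so corresponding elements map to the same line of an 8 KB direct-mapped cache.
trace = []
for i in range(0, 1024, 4):
    trace.append(i)
    trace.append(i + 8192)

print(simulate(trace, 8192, 16, 1), simulate(trace, 8192, 16, 2))
```

On this trace of 512 accesses, the direct-mapped version gets no hits and the 2-way version gets 384. Real traces are usually much less pathological, so a larger direct-mapped cache built from standard SRAMs can often recover much of the difference.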

Note, just in case anyone is misled by the following, there is no
announced product from MIPS called an R4000....

>I don't doubt that the R4000 will out-perform RIOS, but there is one thing
>that I wonder about.  Will the R4000 out-perform RIOS by brute force,
>that is, high integration and very high clock speeds, or will it beat
>it by providing greater architectural efficiency?  The only SPARC
>implementations that beat mips have a major clock speed difference (not to
>mention higher implementation cost).  One other concern is the hype
>associated with super-scalar.  In the EE-Times article someone from
>mips stated that the R4000 was going to be super-scalar.  That's the
As far as I know, no one from MIPS who actually knows has ever publicly said
that it would be super-scalar (or that it wouldn't).  What we generally
say is that there are various flavors of multiple-issue machines
(superscalar, superpipelined, and VLIW, or maybe "short VLIW," which is
roughly what I'd call the i860 and DN10000); that anybody who wants to be
competitive in the current round of chips needs to do one of these; that
of course we've been working on this for several years and are familiar
with the variations; and that we are doing one, or some combination, but
explicitly refuse to disclose which flavor we're using in the
R?000.  At a panel session within the last year or so, I commented
that architectural simulation is a necessity, because this area is far
beyond human intuition.  I said, for instance, that we'd burned huge
numbers of cycles simulating the effects of being able to issue various
pairs of instructions simultaneously and comparing the results, and that
we'd been thinking about "supersonic" pipelines for years.
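The kind of pairing experiment described above can be sketched with a toy in-order issue model in Python. This is a hypothetical illustration, not MIPS's simulator or any announced machine: it assumes just two units ("int" and "fp"), lets two adjacent instructions issue together only if they use different units and the second neither reads nor overwrites the first's result, and ignores latencies, caches, and branches. Under those assumptions, an FP loop with interleaved integer bookkeeping pairs well, while a chain of integer ops gains nothing, which roughly mirrors the FP-versus-integer asymmetry in points 1 and 2.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    unit: str               # "int" or "fp" (hypothetical two-unit machine)
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read

def can_pair(first: Instr, second: Instr) -> bool:
    """Hypothetical pairing rule: different units, no RAW or WAW hazard."""
    return (first.unit != second.unit
            and first.dest not in second.srcs   # second must not read first's result
            and first.dest != second.dest)      # nor overwrite it

def issue_cycles(program: List[Instr], dual_issue: bool) -> int:
    """Count issue cycles for in-order, greedy pairing of adjacent instructions."""
    cycles, i = 0, 0
    while i < len(program):
        if dual_issue and i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # issue a pair this cycle
        else:
            i += 1  # issue a single instruction
        cycles += 1
    return cycles

# Hypothetical FP loop body: integer pointer updates interleaved with FP math.
fp_loop = [
    Instr("int", "r1", ("r1",)),
    Instr("fp",  "f2", ("f0", "f1")),
    Instr("int", "r2", ("r2",)),
    Instr("fp",  "f4", ("f2", "f3")),
]
# Hypothetical integer code: a chain of dependent ops, all on the integer unit.
int_code = [
    Instr("int", "r1", ("r2",)),
    Instr("int", "r3", ("r1",)),
    Instr("int", "r4", ("r3",)),
    Instr("int", "r5", ("r4",)),
]

for name, prog in (("fp_loop", fp_loop), ("int_code", int_code)):
    print(name, issue_cycles(prog, False), issue_cycles(prog, True))
```

The FP loop drops from 4 cycles to 2 with dual issue; the integer chain stays at 4. Changing the pairing rule (for example, allowing two integer ops per cycle) changes the answer, which is why this kind of question gets settled by simulation rather than intuition.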

	Actually, what I think has happened is that super-scalar has become
	like RISC.  I.e., for a while, lots of people got convinced that
	if something had register windows, that was RISC.  Right now,
	almost anything with an aggressive pipeline gets called super-scalar,
	because for most people, the implementation nuances are irrelevant.

>first time I had heard that.  Made me wonder if it is true, or just an
>attempt to ride the hype wave.  If better performance can be had with
>a simpler design (non-super-scalar) because of lower complexity, why not tell

As usual, I recommend Hennessy's article in the September 89 UNIX Review;
it also contains some references to other articles with good studies of
super-scalar issues.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086