Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!apple!oliveb!mipos3!omepd!mipon2.intel.com!mcg
From: mcg@mipon2.intel.com (Steven McGeady)
Newsgroups: comp.arch
Subject: Re: 55 MIPS & 66 MIPS
Message-ID: <5277@omepd.UUCP>
Date: 28 Nov 89 03:13:20 GMT
References: <28107@amdcad.AMD.COM> <1358@bnr-rsc.UUCP> <31329@winchester.mips.COM> <22303@gryphon.COM>
Sender: news@omepd.UUCP
Reply-To: mcg@mipon2.intel.com (Steven McGeady)
Lines: 103

In article <28107@amdcad.AMD.COM>, tim@electron.amd.com (Tim Olson) writes:
> 
> In article <22303@gryphon.COM> scarter@gryphon.COM (Scott Carter) writes:
> | The 960 CA can issue three instructions per
> | cycle to the chosen three of four execute units.  I believe Intel has figures
> | showing that on the average they could in fact issue two instructions per clock
> | _average_ [over what program set?], hence the 960CA can legitimately be called
> | 66 Native MIPS average with 99 Native MIPS peak.
> 
> The i960CA decoder can dispatch up to 3 instructions per cycle.
> However, the decoder looks at 4 instructions at a time, and it appears
> that the decoder cannot be loaded with the next set of 4 instructions
> until the current set of instructions have all been dispatched.

This is not correct.  The instruction decoder contains a rolling quad-word
window into which instructions are (potentially) loaded every cycle.
We do not claim 99 MIPS (to the best of my knowledge, none of our
advertising claims this number; those who have heard me speak have heard
me joke that we run at 99 MIPS for "one whole cycle") because, for three
instructions to be dispatched in a single cycle, one of them must be a
branch.  A branch requires that a non-next line of instructions be loaded
from the i-cache, and this is not accomplished at the full rate.
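The difference between the two descriptions of the decoder (a window that
is reloaded only after all four instructions have been dispatched, versus a
rolling window) can be illustrated with a toy cycle model.  This is a sketch
under stated assumptions, not Intel's actual dispatch logic: the three unit
classes ("reg", "mem", "ctrl"), the one-instruction-per-class-per-cycle rule,
the treatment of every branch as taken, and the function names are all
hypothetical.  The 33MHz default is only inferred from 99 MIPS at three
instructions per clock.

```python
from collections import deque
from typing import Deque, List

WINDOW = 4      # instructions visible to the decoder (one quad-word line)
MAX_ISSUE = 3   # at most three dispatches per cycle


def simulate(trace: List[str], rolling: bool, branch_penalty: int = 0) -> int:
    """Count cycles to dispatch a dynamic trace of unit-class names.

    Hypothetical rule: each cycle, dispatch in program order from the head of
    the window until a unit class ("reg", "mem", "ctrl") repeats, the window
    empties, or MAX_ISSUE is reached.  With three classes, a three-wide cycle
    therefore always contains a "ctrl" (branch) instruction.
    """
    window: Deque[str] = deque()
    pos = 0       # index of the next trace entry not yet in the window
    cycles = 0
    stall = 0     # remaining bubble cycles while a branch target is fetched
    while pos < len(trace) or window:
        # Rolling window tops up every cycle; block window only once empty.
        if stall == 0 and (rolling or not window):
            while len(window) < WINDOW and pos < len(trace):
                window.append(trace[pos])
                pos += 1
        cycles += 1
        if stall:
            stall -= 1
            continue
        used = set()
        while window and len(used) < MAX_ISSUE and window[0] not in used:
            unit = window.popleft()
            used.add(unit)
            if unit == "ctrl" and branch_penalty:
                # Treat every branch as taken: later entries come from a
                # non-next i-cache line, so un-load them and insert bubbles.
                pos -= len(window)
                window.clear()
                stall = branch_penalty
                break
    return cycles


def native_mips(trace: List[str], rolling: bool, clock_mhz: float = 33.0,
                branch_penalty: int = 0) -> float:
    """Instructions per cycle multiplied by the clock rate in MHz."""
    return clock_mhz * len(trace) / simulate(trace, rolling, branch_penalty)


# Idealized trace: every group of three fills all three units.
trace = ["reg", "mem", "ctrl"] * 4
print(simulate(trace, rolling=True))                    # 4
print(simulate(trace, rolling=False))                   # 6
print(native_mips(trace, rolling=True))                 # 99.0
print(simulate(trace, rolling=True, branch_penalty=1))  # 7
```

On this idealized trace the rolling window sustains three instructions per
cycle while the block-reload window falls to two; charging even one bubble
cycle per branch pulls the rolling figure well below the peak, which is the
point made above about branches.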

> Intel compared its i960CA board running this benchmark suite with a
> 68030 (20MHz), an i960KA(20MHz), and an Am29000(16MHz) board.
> However, the board they used to benchmark the Am29000 was not designed
> for performance; rather, it was designed to test the functionality of
> ADAPT (Advanced Development and Prototyping Tool) hardware debuggers.

This is an interesting piece of re-invented history.  Step Engineering,
the current manufacturer of the STEB board, received the design of the
board from AMD (the board has an AMD copyright on it).  Apparently, the
board was designed this way because it is impossible to build a 29K
system using normal DRAMs and achieve better performance.  We attempted
to put faster RAMs in the STEB board and to increase the clock speed to
20MHz; neither worked.  We chose the STEB board not because it was
slow (even we didn't expect it to be so slow) but because it is the only
available board with a prototyping area on which we could add an SBX
connector to interface the graphics cards on which we displayed the
benchmark results.

> To provide a more fair comparison, I requested the benchmark sources
> from Intel, to run on a 30MHz Am29000 board (manufactured by YARC
> Systems).  This board uses 2-way interleaved, 100ns DRAM memory for
> instructions and 35ns SRAM for data.

This board contains separate instruction and data memories (using the
29k's Harvard bus), each of which is interleaved, according to the published
data I've been able to find on the board.  The 30MHz 29k parts are apparently
hand-sorted; we know of no volume shipments of these parts.
This board is in no way comparable in cost, parts count, interface
complexity, or usability to the 960CA board that was used.
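Interleaving spreads consecutive words across banks, so sequential
instruction fetches can overlap the access times of different banks.  The
toy model below is an assumption-laden sketch, not a description of the
YARC board's timing: a bank is assumed busy for its access time rounded up
to whole clocks, precharge, refresh, and page modes are ignored, fetches are
purely sequential, and the function names are hypothetical.

```python
import math
from typing import List


def access_cycles(access_ns: float, clock_mhz: float) -> int:
    """Whole clock cycles a bank is busy per access (ignores precharge)."""
    return math.ceil(access_ns * clock_mhz / 1000)


def sequential_fetch_cycles(words: int, banks: int, busy: int) -> int:
    """Cycles to fetch `words` consecutive words from `banks`-way interleaved memory.

    Word i lives in bank i % banks.  A bank serves one access at a time and is
    busy for `busy` cycles; at most one new access starts per cycle.
    """
    bank_free: List[int] = [0] * banks   # cycle at which each bank is idle again
    last_start = -1
    finish = 0
    for i in range(words):
        b = i % banks
        start = max(bank_free[b], last_start + 1)
        bank_free[b] = start + busy
        last_start = start
        finish = max(finish, start + busy)
    return finish


busy = access_cycles(100, 30)                 # 100 ns parts at 30 MHz -> 3
print(busy)                                   # 3
print(sequential_fetch_cycles(8, 1, busy))    # 24
print(sequential_fetch_cycles(8, 2, busy))    # 13
```

With 100ns parts at 30MHz, this model charges 3 clocks per word from a single
bank and roughly 1.5 clocks per word with 2-way interleaving on straight-line
code.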

> I received sources for the non-proprietary benchmarks, compiled them
> with the current version of the MetaWare HighC29k compiler, and ran
> them on the YARC card.  Here are the final results:
> 
> [tables showing the 29k approximately on par with the 960CA]

We supplied Mr. Olson with the sources to these benchmarks, as an effort
to bring an end to the warring that has been going on over benchmarking.
In exchange for freely supplying these, Mr. Olson agreed that we would
be given the resulting source code back, along with a copy of the compiler
that produced it, prior to publication of the results.  Mr. Olson has
chosen to ignore those commitments and publish numbers without noting
what compiler was used, and without providing us (or anyone else - we also
supplied the benchmarks to Michael Sleator of Microprocessor Report)
with the ability to check their validity.

It should be noted that the 960CA benchmarks were compiled with the
current GNU GCC compiler, which does *no* instruction scheduling, and thus
fails to take advantage of the multiple-instruction issue capability of
the 960CA.  We have been working on an instruction-scheduling compiler,
but it is not available for release at this time.
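Why instruction scheduling matters on a multiple-issue machine can be shown
with a small, hypothetical example: an in-order issue model plus a greedy
list scheduler.  The unit classes, the single-cycle result latency, the
requirement that each register be written only once, and the instruction
labels are all assumptions for illustration; this is neither GCC's nor
Intel's scheduler.

```python
from typing import List, NamedTuple, Tuple

MAX_ISSUE = 3


class Insn(NamedTuple):
    text: str                # label only, not real i960 syntax
    unit: str                # "reg", "mem" or "ctrl" (hypothetical classes)
    dst: str
    srcs: Tuple[str, ...]


def in_order_cycles(code: List[Insn]) -> int:
    """Cycles for an in-order machine that issues from the head of the stream
    until a unit is already taken this cycle, a source was produced in this
    same cycle, or MAX_ISSUE instructions have issued."""
    i, cycles = 0, 0
    while i < len(code):
        cycles += 1
        units, written = set(), set()
        while (i < len(code) and len(units) < MAX_ISSUE
               and code[i].unit not in units
               and not written.intersection(code[i].srcs)):
            units.add(code[i].unit)
            written.add(code[i].dst)
            i += 1
    return cycles


def list_schedule(code: List[Insn]) -> List[Insn]:
    """Greedy list scheduling: each cycle, take ready instructions (all sources
    produced in an earlier cycle) in original order, at most one per unit."""
    producers = {insn.dst for insn in code}
    done = set()             # registers available at the start of this cycle
    pending = list(code)
    order: List[Insn] = []
    while pending:
        units, issued = set(), []
        for insn in pending:
            ready = all(s in done or s not in producers for s in insn.srcs)
            if ready and insn.unit not in units and len(units) < MAX_ISSUE:
                units.add(insn.unit)
                issued.append(insn)
        for insn in issued:  # results become visible only next cycle
            pending.remove(insn)
            done.add(insn.dst)
        order.extend(issued)
    return order


naive = [
    Insn("ld r1", "mem", "r1", ()),
    Insn("ld r3", "mem", "r3", ()),
    Insn("ld r5", "mem", "r5", ()),
    Insn("add r2", "reg", "r2", ("r1",)),
    Insn("add r4", "reg", "r4", ("r3",)),
    Insn("add r6", "reg", "r6", ("r5",)),
]
scheduled = list_schedule(naive)
print(in_order_cycles(naive))                # 5
print([insn.text for insn in scheduled])
print(in_order_cycles(scheduled))            # 4
```

Here the source-order sequence takes 5 cycles and the reordered one 4,
because the scheduler pairs each load with an add whose operand was produced
in an earlier cycle.  Code emitted in source order leaves such pairings to
chance.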

I argued with our marketing department that we should release these
benchmarks to AMD under the restrictions noted above.  The lesson this has
taught me is that we were foolish to trust AMD's word regarding feedback of
the results.  Thus, I place no trust in these numbers as representing any
kind of objective reality.  Furthermore, I have learned my lesson with
regard to cooperating.

The benchmark wars will now most certainly be taken out of the hands of
technologists and placed back in the hands of marketing departments.

I will reiterate here my advice to customers attempting to determine the
relative speed of the two processors:  run your own benchmarks on a board
with a memory system relevant to the design you plan to build.  The Yarc
board's memory design is an example of the most expensive memory system
design that one can attach to the 29k; it bears no resemblance to what
can be expected with a combined I&D DRAM memory system, which is where
the only true comparison lies.  In short, don't believe AMD's benchmark
numbers, and don't believe ours.  Don't believe simulators, because AMD's
is well known for overstating performance.  Believe your own benchmarks.
And note that the STEB board is much closer to most embedded designs
than the Yarc board, and that the 960 is much more usable in the average
design than the 29k.
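The effect of a split instruction/data bus versus a combined I&D memory can
be approximated with simple arithmetic.  The sketch below is a deliberately
crude, hypothetical model: the function name, the assumption that fetch and
data accesses overlap completely on a split bus, and the example cycle counts
are illustrative only, and caches, burst transfers, and pipelining are
ignored.  It shows only why results from a Harvard-bus board do not carry
over directly to a combined DRAM system.

```python
def total_cycles(n_insns: int, n_data: int, fetch: int, data: int,
                 split_bus: bool) -> int:
    """Memory-bound cycle count for a straight-line program.

    Every instruction needs one fetch costing `fetch` cycles; `n_data` of them
    also make one data access costing `data` cycles.  On a split (Harvard) bus
    the two are assumed to overlap completely; on a combined I&D bus they are
    serialized.
    """
    if not 0 <= n_data <= n_insns:
        raise ValueError("n_data must be between 0 and n_insns")
    if split_bus:
        return (n_insns - n_data) * fetch + n_data * max(fetch, data)
    return n_insns * fetch + n_data * data


# 100 instructions, 30 of which touch data, 3 clocks per memory access.
print(total_cycles(100, 30, 3, 3, split_bus=True))    # 300
print(total_cycles(100, 30, 3, 3, split_bus=False))   # 390
```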

S. McGeady
Intel Corp.

