Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!ames!hc!lll-winken!maddog.llnl.gov!brooks
From: brooks@maddog.llnl.gov
Newsgroups: comp.arch
Subject: superscalar
Message-ID: <26356@lll-winken.LLNL.GOV>
Date: 3 Jun 89 18:57:40 GMT
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov ()
Organization: Lawrence Livermore National Laboratory
Lines: 52


There has been quite a lot of discussion on what a computer architecture must
have in it to be called a "superscalar."  I thought I would contribute some
real data to this discussion.  Last week, I had the chance to benchmark Intel's
i860 running at 33 MHz with an "alpha" compiler.  The compiler did not
take advantage of delayed branches yet, and did not use any of the dual or
pipelined mode instructions.  On a radiation transport Monte Carlo code,
which is something we routinely crunch on supercomputers like the Cray
machines, that wimpy little i860 with an alpha compiler outran the
Cray 1S by 10% or so.  I don't think that anyone, including myself, took the
marketing hype that showed a little Cray machine on top of the i860 chip
seriously.  I don't think that even Intel took it seriously.  At this point,
for applications which mesh well with a cache, it's not marketing hype.  Of
course, all the other microprocessor vendors are within 6 months or less of
obtaining the same performance goal.  The MIPS R3000 is probably within epsilon
of this performance level, and the rumored ECL RISC implementations from various
vendors coming down the pike must be truly impressive.

For those that might say one should have compared to the XMP or YMP, the XMP is
30% faster than the Cray 1S on this application, and the YMP is 50% faster yet.
With good compilers the i860, particularly the announced 40 MHz part, or the
rumored 50 MHz models, will be knocking on the door of the YMP pretty loudly.
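
As a rough sketch of the arithmetic behind these comparisons, the ratios quoted
above can be put on one scale with the Cray 1S as 1.0.  Two things here are
assumptions and not measurements from the benchmark: that "50% faster yet"
compounds on top of the XMP figure, and that cache-resident i860 performance
scales linearly with clock rate.  Compiler improvements are not modeled at all.

```python
# Relative speeds on the Monte Carlo code, with the Cray 1S taken as 1.0.
CRAY_1S = 1.0
I860_33 = 1.10 * CRAY_1S   # "outran the Cray 1S by 10% or so"
XMP = 1.30 * CRAY_1S       # "30% faster than the Cray 1S"
YMP = 1.50 * XMP           # ASSUMPTION: "50% faster yet" compounds on the XMP


def scale_with_clock(speed, old_mhz, new_mhz):
    """ASSUMPTION: cache-resident performance scales linearly with clock."""
    return speed * new_mhz / old_mhz


def relative_speeds():
    """Return each machine's speed relative to the Cray 1S."""
    return {
        "Cray 1S": CRAY_1S,
        "i860 33 MHz": I860_33,
        "i860 40 MHz": scale_with_clock(I860_33, 33, 40),
        "i860 50 MHz": scale_with_clock(I860_33, 33, 50),
        "XMP": XMP,
        "YMP": YMP,
    }


if __name__ == "__main__":
    # Print slowest to fastest.
    for name, speed in sorted(relative_speeds().items(), key=lambda kv: kv[1]):
        print(f"{name:12s} {speed:.2f}")
```

Under these assumptions the 40 MHz part lands at about the XMP level, and a
50 MHz part would still need something like a 17% gain from the compiler to
match the YMP.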

Needless to say, when the application starts missing cache (for any of the
microprocessors) the performance rapidly drops into a hole when compared to the
classic supercomputer.  The microprocessor vendors now need to learn the last
lesson in supercomputer architecture, which is getting adequate main memory
bandwidth.  Since interleaving memory chips with glue logic would raise cost
too much, the micro vendors need to get in close collaboration with the memory
chip vendors to get the interleaving done on the memory chips themselves.  This
may be a good way for the U.S. manufacturers to get back into the memory chip
biz.  Design your micro with interleave control on the chip and then design
memory chips that have a compatible arrangement, then don't tell the
foreign memory chip vendors about the micro/memory chip interface until you
get to market.  Interleaving on the memory chip is not a difficult thing to do;
one only has to decide that it is time to do it.
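
For readers who have not looked at interleaving closely, here is a minimal toy
model of why it buys bandwidth.  It is a sketch under stated assumptions, not a
description of any real memory part: the low-order bank mapping, the one issue
attempt per cycle, and the bank count and busy time used in the example are all
made up for illustration.

```python
def bank_of(word_addr, n_banks):
    """Low-order interleaving: consecutive words land in consecutive banks."""
    return word_addr % n_banks


def cycles_for_stream(addresses, n_banks, bank_busy):
    """Cycles needed to issue every access, at most one issue per cycle.

    ASSUMPTION: once accessed, a bank is unavailable for `bank_busy` cycles,
    and an access to a busy bank stalls until that bank is free again.
    """
    free_at = [0] * n_banks              # first cycle each bank is idle
    cycle = 0
    for addr in addresses:
        bank = bank_of(addr, n_banks)
        cycle = max(cycle, free_at[bank])  # stall while the bank is busy
        free_at[bank] = cycle + bank_busy
        cycle += 1                         # move on to the next issue slot
    return cycle


if __name__ == "__main__":
    stride1 = range(64)                  # 64 consecutive words
    print("1 bank :", cycles_for_stream(stride1, n_banks=1, bank_busy=8))
    print("8 banks:", cycles_for_stream(stride1, n_banks=8, bank_busy=8))
```

In this model, eight banks with an eight cycle busy time keep up with one word
per cycle on a stride-1 stream, while a single bank delivers one word every
eight cycles.  A stride equal to the bank count hits the same bank every time
and falls back to the single-bank rate.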

Just in case the Intel marketing pukes might be tempted to use this posting
for their own purposes, please read the disclaimer below:

(C) Copyright 1989, by Eugene Brooks III, all rights reserved.
This posting is the personal opinion solely of the author, and does not
reflect the opinions of the U.S. Govt or the University of CA in any official
capacity.  This posting may be transmitted only on the USENET Newsgroup
comp.arch, for the purposes of stimulating technical discussion, and may be
excerpted for the purposes of further discussion on the USENET if the copyright
is left in place.  This posting may NOT be printed on paper, and may NOT be
used for product endorsement purposes.


brooks@maddog.llnl.gov, brooks@maddog.uucp