Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!ames!hc!lll-winken!maddog.llnl.gov!brooks
From: brooks@maddog.llnl.gov
Newsgroups: comp.arch
Subject: superscalar
Message-ID: <26356@lll-winken.LLNL.GOV>
Date: 3 Jun 89 18:57:40 GMT
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov ()
Organization: Lawrence Livermore National Laboratory
Lines: 52

There has been quite a lot of discussion on what a computer architecture must have in it to be called "superscalar." I thought I would contribute some real data to this discussion.

Last week, I had the chance to benchmark Intel's i860 running at 33 MHz with an "alpha" compiler. The compiler did not yet take advantage of delayed branches, and did not use any of the dual-mode or pipelined-mode instructions. On a radiation transport Monte Carlo code, which is something we routinely crunch on supercomputers like the Cray machines, that wimpy little i860 with an alpha compiler outran the Cray 1S by 10% or so.

I don't think anyone, myself included, took seriously the marketing hype that showed a little Cray machine on top of the i860 chip. I don't think even Intel took it seriously. At this point, though, for applications that mesh well with a cache, it's not marketing hype.

Of course, all the other microprocessor vendors are within six months or less of reaching the same performance level. The MIPS R3000 is probably within epsilon of it already, and the rumored ECL RISC implementations coming down the pike from various vendors must be truly impressive.

For those who might say I should have compared against the XMP or YMP: on this application the XMP is 30% faster than the Cray 1S, and the YMP is 50% faster still. With good compilers, the i860, particularly the announced 40 MHz part or the rumored 50 MHz models, will be knocking pretty loudly on the door of the YMP.
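A rough check of the relative speeds quoted above, normalized to the Cray 1S. This assumes the YMP's extra 50% is measured relative to the XMP, and, purely as a hypothetical, that i860 performance on this code scales linearly with clock rate (which ignores memory effects and any compiler improvements):

```latex
\begin{aligned}
\text{XMP} &\approx 1.30 \times \text{Cray 1S} \\
\text{YMP} &\approx 1.50 \times 1.30 = 1.95 \times \text{Cray 1S} \\
\text{i860 @ 33 MHz} &\approx 1.10 \times \text{Cray 1S} \\
\text{i860 @ 40 MHz} &\approx 1.10 \times \tfrac{40}{33} \approx 1.33 \times \text{Cray 1S} \\
\text{i860 @ 50 MHz} &\approx 1.10 \times \tfrac{50}{33} \approx 1.67 \times \text{Cray 1S}
\end{aligned}
```

Under those assumptions a 50 MHz part with the same alpha compiler would sit at roughly 85% of the YMP (1.67 / 1.95), before any gain from dual-mode or pipelined instructions.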
Needless to say, when the application starts missing the cache (on any of these microprocessors), performance quickly drops into a hole compared with the classic supercomputer. The microprocessor vendors now need to learn the last lesson of supercomputer architecture: adequate main memory bandwidth.

Since interleaving memory chips with glue logic would raise the cost too much, the micro vendors need to collaborate closely with the memory chip vendors to get the interleaving done on the memory chips themselves. This may be a good way for U.S. manufacturers to get back into the memory chip biz: design your micro with interleave control on the chip, design memory chips with a compatible arrangement, and then don't tell the foreign memory chip vendors about the micro/memory chip interface until you get to market. Interleaving on the memory chip is not a difficult thing to do; one only has to decide that it is time to do it.

Just in case the Intel marketing pukes are tempted to use this posting for their own purposes, please read the disclaimer below:

(C) Copyright 1989, by Eugene Brooks III, all rights reserved. This posting is the personal opinion solely of the author, and does not reflect the opinions of the U.S. Govt or the University of CA in any official capacity. This posting may be transmitted only on the USENET Newsgroup comp.arch, for the purposes of stimulating technical discussion, and may be excerpted for the purposes of further discussion on the USENET if the copyright is left in place. This posting may NOT be printed on paper, and may NOT be used for product endorsement purposes.

brooks@maddog.llnl.gov, brooks@maddog.uucp