Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!ucsd!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: KM's vs. Supers (medium)
Message-ID: <40049@ames.arc.nasa.gov>
Date: 8 Jan 90 20:07:12 GMT
References: <34030@mips.mips.COM> <4322@nttmhs.ntt.JP> <39807@ames.arc.nasa.gov> <4328@scolex.sco.COM>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 64

In article <4328@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>In article <39807@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:

>Ok.  I've just sat down with a list, and tried to figure something out;
>hopefully, *somebody* can help me on this.

>According to Eugene Brooks, a 66MHz R6000 will outperform a Cray-2, which
>runs at, what, 250MHz?

>So, *how* does it do this?


Once upon a time, I wrote four benchmarks, each of which showed a different
one of the following machines to be the fastest: IBM 3033, Cray-1/S,
CDC Cyber 203, and CDC 7600.  I could do it because I knew something about
the weaknesses of each machine.

*It all depends on your applications.*  Eugene Brooks's application is
unusually hard on the Cray, it appears.  On the other hand, even for codes
that aren't so hard on the Cray, the KMs now often have a cost advantage,
even where the Cray is still much faster.

>processors, and this might be best; but, then again, maybe not).  But which
>ones?  Seymour has been using 8 registers per set for the last 25 or more
>years (I don't know about the CDC 3x00 series);

This is not really correct for the Crays.  You can't forget about the
second-level scalar registers ("programmable cache") or the vector registers.

> would more registers allow

He already has many more scalar registers than MIPSCo does.  The better
question is why the extra registers don't seem to produce a large advantage
in cycles per instruction.

>for faster code to be generated, up to a certain point?  How about register
>windows?

Do register windows produce fewer loads and stores?  The results seem to
indicate that they don't make much difference, though they don't seem to
hurt, either.  They are, it seems, no big deal: just another design choice.

>I guess part of what I'm saying, and asking, is this:  there is little
>reason why a Cray *must* be slower than a MIPS chip, and, if nothing else,

The Cray is generally faster.  The question is rather whether it is enough
faster to justify the cost.  Also, don't forget that the Cray is still the
fastest data engine around, with more throughput than anybody else.  You might
even see a Cray used as a fileserver for a farm of Killer Micros someday :-)

>there is more room on the Cray to put stuff directly in hardware (such as,
>oh, a 1 cycle multiply, or, better yet, a 1 cycle divide 8-)).  What needs
>to be done, and why hasn't it been done?

****************************************

Another speculation: would superscalar issue of Cray scalar instructions be
possible?  What conditions are necessary to issue multiple instructions per
cycle?

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117