Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!sco!seanf
From: seanf@sco.COM (Sean Fagan)
Newsgroups: comp.arch
Subject: KM's vs. Supers (medium)
Message-ID: <4328@scolex.sco.COM>
Date: 7 Jan 90 08:09:34 GMT
References: <34030@mips.mips.COM> <4322@nttmhs.ntt.JP> <39807@ames.arc.nasa.gov>
Reply-To: seanf@sco.COM (Sean Fagan)
Organization: The Santa Cruz Operation, Inc.
Lines: 52

In article <39807@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>And looking just a little further ahead, an "R9000"
>(just making this up out of whole cloth) with a starting clock speed of
>100MHz, scaled up to 200 MHz by 1992 or 1993, could put Cray out of business.
>SGI will build scalar graphics workstations with half the power of the then
>current Crays at 1/100 the cost, and Ardent will do the vector version, with
>a similar price advantage.  An amusing idle speculation (?)

Ok.  I've just sat down with a list, and tried to figure something out;
hopefully, *somebody* can help me on this.  My "list" was a list of timings
for a CDC Cyber 170/760, running at 40MHz (thanks Brian!).  From experience,
I'd say that the list is correct (i.e., not a lie 8-)).  Excluding loads,
divides, and other memory references, average cycle count is between 2 and 3
clocks / instruction.
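
For anyone who wants to check the arithmetic, here is a rough sketch (the
function name is mine, and it ignores the loads, divides, and memory
references excluded above): instruction rate is just clock rate divided by
average clocks per instruction.

```python
def native_mips(clock_mhz, clocks_per_instruction):
    """Millions of native instructions per second for a given clock and CPI."""
    return clock_mhz / clocks_per_instruction

# Cyber 170/760 figures from the list: 40 MHz, 2 to 3 clocks / instruction.
best = native_mips(40.0, 2.0)
worst = native_mips(40.0, 3.0)
print(f"{worst:.1f} to {best:.1f} million instructions/sec")
```

These are native instructions on each machine, so the number says nothing
by itself about how much work one instruction does.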

Actually, all that was for background (and to plug Cybers and Seymour 8-)).
On with the point:  the Cyber is 25 years old; I seriously doubt that any of
the Crays have *slower* performance (despite being 64-bit two's complement
instead of 60-bit one's complement, I have a high opinion of Seymour), yet,
according to Eugene Brooks, a 66MHz R6000 will outperform a Cray-2, which
runs at, what, 250MHz?

So, *how* does it do this?  The things I could come up with were:  the
Cray-2 takes more clocks per instruction than the Cyber, which is
frightening; the larger register count on the MIPS chip helps that much
(possible, but I don't know; that's why I'm asking); and / or the R6000 has
more functional units than a Cray-2.
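
To make the size of the gap concrete, a back-of-the-envelope sketch, under
the (probably wrong) assumption that both machines execute the same number
of instructions for a given program, and taking my 250MHz guess for the
Cray-2 at face value.  The function name is made up for illustration.

```python
def cpi_ratio_to_break_even(slow_clock_mhz, fast_clock_mhz):
    """How many times more clocks per instruction the faster-clocked machine
    must spend before the slower-clocked one merely ties it.
    Assumes equal instruction counts on both machines (an assumption)."""
    return fast_clock_mhz / slow_clock_mhz

# 66 MHz R6000 vs. a Cray-2 assumed to run at 250 MHz.
ratio = cpi_ratio_to_break_even(66.0, 250.0)
print(f"Cray-2 would need about {ratio:.1f}x the clocks per instruction")
```

In other words, on scalar code the Cray-2 would have to average nearly four
times as many clocks per instruction as the R6000 just to tie, unless the
instruction counts differ a lot.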

So, now for speculation of my own.  Eugene has said that he thinks the
supercomputers of the future will be merely bunches of KM's (killer micros)
running in parallel; I'm not sure.  I think that Seymour (and his ilk) is
definitely
going to have to adopt some of the advantages of the KM's; I see no doubt of
that (otherwise, we *will* end up with thousands of KM's with vector
processors, and this might be best; but, then again, maybe not).  But which
ones?  Seymour has been using 8 registers per set for the last 25 or more
years (I don't know about the CDC 3x00 series); would more registers allow
for faster code to be generated, up to a certain point?  How about register
windows?

I guess part of what I'm saying, and asking, is this:  there is little
reason why a Cray *must* be slower than a MIPS chip, and, if nothing else,
there is more room on the Cray to put stuff directly in hardware (such as,
oh, a 1 cycle multiply, or, better yet, a 1 cycle divide 8-)).  What needs
to be done, and why hasn't it been done?

Sorry for the length, but this seems like a good discussion for this group
to get started on.

-- 
Sean Eric Fagan  | "Time has little to do with infinity and jelly donuts."
seanf@sco.COM    |    -- Thomas Magnum (Tom Selleck), _Magnum, P.I._
(408) 458-1422   | Any opinions expressed are my own, not my employers'.