Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cwjcc!gatech!gitpyr!loligo!mccalpin
From: mccalpin@loligo.uucp (John McCalpin)
Newsgroups: comp.arch
Subject: Re: Don't look back
Message-ID: <7330@pyr.gatech.EDU>
Date: 21 Feb 89 14:58:10 GMT
References: <13582@winchester.mips.COM> <20667@lll-winken.LLNL.GOV>
Sender: news@pyr.gatech.EDU
Reply-To: mccalpin@loligo.cc.fsu.edu (John McCalpin)
Organization: Supercomputer Computations Research Institute
Lines: 38

In article <20667@lll-winken.LLNL.GOV>
                 brooks@maddog.llnl.gov (Eugene Brooks) writes:
>In article <13582@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>A long, and well founded, analysis of why superminis are being squeezed out
>of their performance niche from the rear by VLSI based machines.
>
>This article is conservative at best; there are a whole lot of users of Cray
>time buying the latest VLSI-based machines as a more cost-effective alternative.
>With the latest microprocessors these machines are within 1/5th of the
>performance of a Cray supercomputer for all but the most highly vectorized
>codes.  For scalar codes the performance of these microprocessors can be
>as high as 1/2 of a Cray-1S. 

I have had a great deal of trouble believing the poor performance of
"supercomputers" on scalar code lately.  I just ran the LINPACK 100x100
test on the FSU ETA-10 (10.5 ns cycle = 95 MHz) and got a result of 3.8 64-bit
MFLOPS for fully optimized (but not vectorized) code.  I used the
version of the code with unrolled loops. This performance is EXACTLY
the same as the MIPS R-3000/3010 pair running at 25 MHz.  I understand
that there must be tradeoffs, but considering the difference in cost,
this is a bit surprising....
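For readers who have not looked at the "unrolled" LINPACK code: its inner kernel is DAXPY (y = y + a*x), with a short cleanup loop for the n mod 4 leftover elements followed by a main loop unrolled four deep. Below is a rough Python sketch of that loop structure, in the style of the LINPACK routine rather than a copy of it. It handles unit stride only, and the name daxpy_unrolled is hypothetical. Python gains nothing from unrolling, so this shows the shape of the code, not the speedup.

```python
def daxpy_unrolled(a, x, y):
    """In-place y[i] += a * x[i], unrolled by 4 (unit stride only; assumes len(y) >= len(x))."""
    n = len(x)
    if a == 0.0:
        # Nothing to add; LINPACK's DAXPY also returns early here.
        return y
    m = n % 4
    # Cleanup loop: handle the n mod 4 leftover elements first.
    for i in range(m):
        y[i] += a * x[i]
    # Main loop: four independent updates per trip, which gives a
    # compiler and pipelined FP hardware more work per loop iteration.
    for i in range(m, n, 4):
        y[i] += a * x[i]
        y[i + 1] += a * x[i + 1]
        y[i + 2] += a * x[i + 2]
        y[i + 3] += a * x[i + 3]
    return y
```

For example, daxpy_unrolled(2.0, [1, 2, 3, 4, 5, 6, 7], [0.0] * 7) runs three cleanup updates and one unrolled trip, leaving y = [2, 4, 6, 8, 10, 12, 14].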

Of course, the vectorized version runs at 60 MFLOPS on the ETA-10 now
(90 MFLOPS with the 7 ns CPUs), and gets rapidly faster for larger
systems of equations.
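To put these rates in concrete terms: the LINPACK benchmark credits 2n^3/3 + 2n^2 floating-point operations for an n x n solve, so a reported MFLOPS figure implies a run time. The Python sketch below converts the rates and cycle times quoted in this post. The helper names are hypothetical, and the times are only what those rates imply, not measurements.

```python
def linpack_flops(n):
    """Nominal operation count the LINPACK benchmark credits for an n x n solve."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def implied_seconds(n, mflops):
    """Wall time implied by a reported MFLOPS rate for problem size n."""
    return linpack_flops(n) / (mflops * 1e6)


def clock_mhz(cycle_ns):
    """Clock frequency in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_ns


# Scalar (3.8 MFLOPS) versus vectorized (60 MFLOPS) rates for n = 100.
for rate in (3.8, 60.0):
    print(f"n=100 at {rate:5.1f} MFLOPS -> {implied_seconds(100, rate):.4f} s")

# The two ETA-10 cycle times mentioned above.
for ns in (10.5, 7.0):
    print(f"{ns:4.1f} ns cycle -> {clock_mhz(ns):6.1f} MHz")
```

At n = 100 that is about 0.18 s for the scalar run and about 0.011 s vectorized. Because the count grows as n^3, larger systems give the vector unit longer loops over which to amortize its startup cost.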

I don't mean to pick on CDC/ETA --- even the fastest Crays are going
to get caught by the highest performance RISC chips pretty soon.

I haven't seen any MC88000 results yet, but it looks likely to deliver
results in the same performance range.  Does anyone know if the
memory bandwidth of the 88000 is going to be able to keep the floating-
point pipeline filled? This could push the performance of the 88000
up closer to 10 MFLOPS....
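A back-of-the-envelope way to frame the bandwidth question: each DAXPY update y[i] = y[i] + a*x[i] does 2 flops and moves 3 words (load x[i], load y[i], store y[i]). The Python sketch below turns a target MFLOPS rate into the memory traffic needed, assuming 64-bit operands and no cache or register reuse. The function name is hypothetical, and the 88000's actual memory bandwidth is not given here, so this covers only the demand side.

```python
def daxpy_bandwidth_mb_s(mflops, word_bytes=8):
    """Memory traffic (MB/s, 10**6 bytes) needed to sustain `mflops` on DAXPY.

    Assumes every operand comes from memory: per element, 2 flops
    (multiply and add) and 3 words moved (load x[i], load y[i], store y[i]).
    """
    words_per_flop = 3 / 2
    return mflops * words_per_flop * word_bytes


# The rate measured above versus the hoped-for 10 MFLOPS.
for target in (3.8, 10.0):
    print(f"{target:4.1f} MFLOPS DAXPY needs about {daxpy_bandwidth_mb_s(target):.1f} MB/s")
```

Under these assumptions, 10 MFLOPS on DAXPY needs roughly 120 MB/s of sustained memory bandwidth, against about 46 MB/s at 3.8 MFLOPS.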
----------------------   John D. McCalpin   ------------------------
Dept of Oceanography & Supercomputer Computations Research Institute
mccalpin@masig1.ocean.fsu.edu		mccalpin@nu.cs.fsu.edu
--------------------------------------------------------------------