Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!uakari.primate.wisc.edu!uflorida!stat!mccalpin
From: mccalpin@stat.fsu.edu (John Mccalpin)
Newsgroups: comp.arch
Subject: Re: The Killer Micro From Hell
Summary: Cray's are still fast!
Keywords: cpu starvation, memory bandwidth
Message-ID: <788@stat.fsu.edu>
Date: 30 Dec 89 17:26:01 GMT
References: <158@csinc.UUCP> <787@stat.fsu.edu> <42701@lll-winken.LLNL.GOV>
Reply-To: mccalpin@stat.fsu.edu (John Mccalpin)
Organization: Supercomputer Computations Research Institute
Lines: 44

In article <787@stat.fsu.edu> I wrote:
>By the way, I estimate the (soon-to-be-installed) FSU Cray
>Y/MP-4/432 will only be about 125 times as fast as the new MIPS "KILLER
>MICRO from HELL" on my code. Yep, they are closing the gap all right....

In article <42701@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov
(Eugene Brooks) asked:
>Would you care to enlighten the masses with regard to the basis for
>this estimate?
>brooks@maddog.llnl.gov, brooks@maddog.uucp

The estimate is based on the _observed_ performance of an 8-processor
Cray Y/MP vs a 25 MHz R-3000 (SGI 4D/2x0). The speed ratio in that
case is 536:1, and this Cray is an internal machine with a 6.5 ns
clock, rather than the 6 ns clock that will be installed at FSU.
Scaling by the processor count (4/8) and the clock period (6.5/6)
suggests that a 4-cpu Cray Y/MP at 6 ns will be about 290 times as
fast as the R-3000 box. Then scale the MIPS cpu speed by the ratio of
the clocks of the R-6000 to the R-3000 (60/25) to reduce this ratio to
about 120:1. (I am assuming a 60 MHz clock on the R-6000; I don't know
what the exact value will be....)

Since the code is highly parallelizable, a multi-processor R-6000
based machine will show good speedups up to about 16 processors. The
experience on the Cray and Ardent machines suggests that a speedup of
12x should be possible on a 16-cpu system. However, multi-processor
Cray Y/MP's exist today, and multi-processor R-6000 machines do
not....
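
For anyone who wants to check the arithmetic, here is a minimal sketch of
the scaling chain in Python. It assumes performance scales linearly with
cpu count and with clock rate, which is only a rough approximation; the
function and variable names are made up for illustration, and the 60 MHz
R-6000 clock is the same guess as in the text.

```python
# Inputs are the figures quoted in the post.
OBSERVED_RATIO = 536.0   # 8-cpu Cray Y/MP (6.5 ns) vs 25 MHz R-3000
R6000_MHZ = 60.0         # ASSUMED clock; the exact value is not known


def scale_cray(ratio, cpus_from, cpus_to, clock_from_ns, clock_to_ns):
    """Rescale a speed ratio to a different Cray cpu count and clock period.

    Assumes linear scaling in both cpu count and clock rate.
    """
    return ratio * (cpus_to / cpus_from) * (clock_from_ns / clock_to_ns)


def scale_micro(ratio, mhz_from, mhz_to):
    """Rescale a speed ratio when the micro's clock changes (linear assumption)."""
    return ratio * (mhz_from / mhz_to)


# 8 cpus at 6.5 ns -> 4 cpus at 6 ns
ymp4_vs_r3000 = scale_cray(OBSERVED_RATIO, 8, 4, 6.5, 6.0)

# 25 MHz R-3000 -> assumed 60 MHz R-6000
ymp4_vs_r6000 = scale_micro(ymp4_vs_r3000, 25.0, R6000_MHZ)

# Hypothetical 16-cpu R-6000 box with the 12x speedup suggested above
ymp4_vs_r6000_x16 = ymp4_vs_r6000 / 12.0

print(f"Y/MP-4 vs R-3000:        {ymp4_vs_r3000:6.1f}")   # about 290
print(f"Y/MP-4 vs R-6000:        {ymp4_vs_r6000:6.1f}")   # about 121
print(f"Y/MP-4 vs 16-cpu R-6000: {ymp4_vs_r6000_x16:6.1f}")
```

Changing R6000_MHZ shows how sensitive the final ratio is to the clock
guess.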

The code is a hybrid finite-element/finite-difference ocean
circulation model written in portable FORTRAN-77. The calculations
are all done in 64-bit precision, and require 64 bits for reasonable
accuracy.

This is all just an excuse to remind Eugene :-) that some users will
still be able to make effective use of vector supercomputers. In
price/performance, the scalar KILLER MICRO's are not even
significantly ahead of the traditional supercomputers on optimal
codes. They are certainly not _yet_ competitive with regard to
turnaround time on large vector jobs, though I agree that this will
change soon, as 8-16 cpu machines in the R-6000 class become
available.

My next project is porting this code to a Connection Machine CM-2. I
anticipate about the same performance as the 8-processor Y/MP, but in
a much more scalable architecture, and at about 1/4 of the price.