Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: linpack
Message-ID: <34061@ames.arc.nasa.gov>
Date: 20 Oct 89 17:37:40 GMT
References: <35825@lll-winken.LLNL.GOV> <127@csinc.UUCP> <9079@batcomputer.tn.cornell.edu> <2203@brazos.Rice.edu> <9089@batcomputer.tn.cornell.edu> <MCCALPIN.89Oct19090641@masig3.masig3.ocean.fsu.edu>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 80

In article <MCCALPIN.89Oct19090641@masig3.masig3.ocean.fsu.edu> mccalpin@masig3.masig3.ocean.fsu.edu (John D. McCalpin) writes:
>In article <9079@batcomputer.tn.cornell.edu> kahn@tcgould.tn.cornell.edu 
>writes:
>>Throw away ALL your copies of the LINPACK 100x100 benchmark if you
>>are interested in supercomputers.  The 300x300 is barely big enough 
>
>In article <2203@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) 
>writes:
>>Danny Sorenson mentioned recently that linpack is sort of intended
>>to show how *bad* a computer can be.  The sizes are kept
>>deliberately small so that the vector machines barely have a chance
>>to get rolling.
>
>In article <9089@batcomputer.tn.cornell.edu> kahn@batcomputer.tn.cornell.edu 
>(Shahin Kahn) writes:
>>It certainly is biased towards micros with limited memory and is
>>absolutely irrelevant as a *supercomputer* application.  Yes, it
>>can show how bad a supercomputer can be.

I found this particularly amusing.  As a longtime defender of Linpack, I have
often been accused of being biased towards big vector machines, because of
the sensitivity of Linpack to memory and FPU bandwidth, and, particularly,
the ability to stream from memory to FPU and back to memory.  Now, this
happens to be a very important property of a CPU to effectively run many codes
which I have seen over the years.  I never rate machines on the basis of Linpack
in absolute terms, but you can tell a lot about a machine with low Linpack
numbers.  I never could understand why people bought 11/780's, for example :-)


>Well, I'll throw in my $0.02 of disagreement with this thread.  It
>has been my experience that the poor performance of the LINPACK
>100x100 test on supercomputers is *entirely typical* of what users
>actually run on the things.

I agree that vector startup time is extremely important, and Linpack is a 
fairly "nice" program with respect to average vector length, so if vector
startup time is so long as to slow it down significantly, this is significant
to users.  On the other hand, the performance is not so poor as it once was.
See below.
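
To illustrate why startup time matters at moderate vector lengths, here is a
minimal Python sketch using Hockney's well-known (r_inf, n_half) model of
vector performance.  This model is not taken from the post; it is brought in
as an illustration.  The function name and both n_half values are
hypothetical, and 66 is the average vector length cited in this thread.

```python
def vector_rate(n, r_inf, n_half):
    """Hockney-style estimate of the achieved rate for a vector of length n.

    r_inf  : asymptotic rate for very long vectors (any unit, e.g. MFLOPS)
    n_half : vector length at which half of r_inf is reached; a larger
             n_half means a larger startup cost
    """
    return r_inf * n / (n + n_half)


# Two hypothetical machines with the same asymptotic rate of 100 units.
# The n_half values are made up for illustration, not measured.
short_startup = vector_rate(66, r_inf=100.0, n_half=20)
long_startup = vector_rate(66, r_inf=100.0, n_half=100)

print(f"n_half=20:  {short_startup:.1f} of 100")
print(f"n_half=100: {long_startup:.1f} of 100")
```

Under these assumed numbers, the machine with the longer startup delivers
roughly half the rate of the other at length 66, even though both have the
same peak.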

> There are plenty of applications burning up
>Cray, Cyber 205, and ETA-10 cycles which have average vector lengths
>*shorter* than the average of 66 elements for the LINPACK test, and
>which are furthermore loaded down with scalar code.

I note, at this point, that the ~7 ns (~142 MHz) ETA10G achieved the fastest
single processor Linpack score of 93 MFLOPS, or, .65 FLOPs/cycle.  The
Cyber 205, using earlier compilers, achieved only 17 MFLOPS, on a 20 ns
clock, or, .34 FLOPs/cycle.  The Cray Y-MP gets .50 FLOPs/cycle, while
the Cray 1/S (in 1983) got only .15 FLOPs/cycle.   The same Cray 1/S today
gets .34 FLOPs/cycle.  (It has less memory bandwidth than the Cray X-MP
and Y-MP, so you can see this effect clearly.)  The Cray X-MP/Y-MP and ETA
machines are capable of achieving around 2 FLOPs/cycle in hardware.  My point is
that there has been considerable improvement in both hardware and software
and startup time penalties have been correspondingly reduced.
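
The FLOPs/cycle figures above follow from dividing the MFLOPS rate by the
clock rate in MHz.  A minimal Python sketch of that arithmetic, using only
the numbers quoted in this post (the function and variable names are
hypothetical, and the ~7 ns clock is approximate):

```python
def flops_per_cycle(mflops, clock_ns):
    # Clock rate in MHz is 1000 / (period in ns), so
    # MFLOPS / MHz = MFLOPS * period_ns / 1000 = FLOPs per cycle.
    return mflops * clock_ns / 1000.0


PEAK_FLOPS_PER_CYCLE = 2.0  # "around 2 FLOPs/cycle", as stated in the post

eta10g = flops_per_cycle(93.0, 7.0)     # 93 MFLOPS on a ~7 ns clock
cyber205 = flops_per_cycle(17.0, 20.0)  # 17 MFLOPS on a 20 ns clock

print(f"ETA10G:    {eta10g:.2f} FLOPs/cycle")
print(f"Cyber 205: {cyber205:.2f} FLOPs/cycle")
print(f"ETA10G fraction of peak: {eta10g / PEAK_FLOPS_PER_CYCLE:.2f}")
```

This reproduces the .65 and .34 figures, and shows that even the best
score quoted here is only about a third of the hardware peak.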

What is the relevance of Linpack today?  Well, it still has *some* of the
same significance that it always had, but tells less than it used to.  When
caches were small, you could extrapolate the 100x100 results to bigger jobs
without worrying.  On the big iron, your performance went *up* with larger
problem sizes, so even if 300x300x300 was typical of your problem, you knew
what to expect.  Now, with 100x100 fitting in some small caches, you need to
run a bigger job to make sure performance doesn't go *down* dramatically.
(Which it does on some micro based systems, of course.)  On the other hand, if
you switch to 300x300, you lose the information contained in the 100x100 case
wrt startup time.  So, good numbers tell you even less than they did before,
but bad numbers, in a sense, tell you even more, for the same reason.  
I wouldn't buy a machine with a bad Linpack result to do these kinds of
problems, but I would look hard at the set of machines with good results,
and would look further, to see which one was the best for the job at hand.
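
To make the cache argument concrete, here is a rough Python sketch of the
matrix storage for the two problem sizes.  It assumes 8-byte (double
precision) elements and a made-up 128 KB cache; both are assumptions for
illustration, not properties of any particular machine discussed here.

```python
def matrix_bytes(n, elem_bytes=8):
    """Storage for a dense n x n matrix; 8-byte elements are an assumption."""
    return n * n * elem_bytes


HYPOTHETICAL_CACHE_BYTES = 128 * 1024  # illustrative size, not a real machine

for n in (100, 300):
    size = matrix_bytes(n)
    fits = size <= HYPOTHETICAL_CACHE_BYTES
    print(f"{n}x{n}: {size} bytes, fits in cache: {fits}")
```

With these assumptions the 100x100 matrix (80,000 bytes) fits and the
300x300 matrix (720,000 bytes) does not, which is why the small case alone
can hide a drop in performance on cache-based systems.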

Sometimes I use a "grep" benchmark just for fun.  The Cray Y-MP still greps
faster than any other machine I have tested, but, I agree, it isn't the
world's most cost effective grepper out there :-)  As with all benchmarks,
you have to be careful not to fool yourself...  I would guess that an amd29000
based system might be the fastest on that particular test.

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117