Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: linpack
Message-ID: <34061@ames.arc.nasa.gov>
Date: 20 Oct 89 17:37:40 GMT
References: <35825@lll-winken.LLNL.GOV> <127@csinc.UUCP> <9079@batcomputer.tn.cornell.edu> <2203@brazos.Rice.edu> <9089@batcomputer.tn.cornell.edu>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 80

In article mccalpin@masig3.masig3.ocean.fsu.edu (John D. McCalpin) writes:
>In article <9079@batcomputer.tn.cornell.edu> kahn@tcgould.tn.cornell.edu
>writes:
>>Throw away ALL your copies of the LINPACK 100x100 benchmark if you
>>are interested in supercomputers. The 300x300 is barely big enough
>
>In article <2203@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs)
>writes:
>>Danny Sorenson mentioned recently that linpack is sort of intended
>>to show how *bad* a computer can be. The sizes are kept
>>deliberately small so that the vector machines barely have a chance
>>to get rolling.
>
>In article <9089@batcomputer.tn.cornell.edu> kahn@batcomputer.tn.cornell.edu
>(Shahin Kahn) writes:
>>It certainly is biased towards micros with limited memory and is
>>absolutely irrelevant as a *supercomputer* application. Yes, it
>>can show how bad a supercomputer can be.

I found this particularly amusing. As a longtime defender of Linpack,
I have often been accused of being biased towards big vector machines,
because of the sensitivity of Linpack to memory and FPU bandwidth, and,
particularly, the ability to stream from memory to FPU and back to
memory. Now, this happens to be a very important property if a CPU is
to run effectively many of the codes which I have seen over the years.
I never rate machines on the basis of Linpack in absolute terms, but
you can tell a lot about a machine with low Linpack numbers.
I never could understand why people bought 11/780's, for example :-)

>Well, I'll throw in my $0.02 of disagreement with this thread. It
>has been my experience that the poor performance of the LINPACK
>100x100 test on supercomputers is *entirely typical* of what users
>actually run on the things.

I agree that vector startup time is extremely important, and Linpack is
a fairly "nice" program with respect to average vector length, so if
vector startup time is long enough to slow it down significantly, that
matters to users. On the other hand, the performance is not so poor as
it once was. See below.

> There are plenty of applications burning up
>Cray, Cyber 205, and ETA-10 cycles which have average vector lengths
>*shorter* than the average of 66 elements for the LINPACK test, and
>which are furthermore loaded down with scalar code.

I note, at this point, that the ~7 ns (~142 MHz) ETA10G achieved the
fastest single processor Linpack score of 93 MFLOPS, or 0.65
FLOPs/cycle. The Cyber 205, using earlier compilers, achieved only
17 MFLOPS on a 20 ns clock, or 0.34 FLOPs/cycle. The Cray Y-MP gets
0.50 FLOPs/cycle, while the Cray 1/S (in 1983) got only 0.15
FLOPs/cycle. The same Cray 1/S today gets 0.34 FLOPs/cycle. (It has
less memory bandwidth than the Cray X-MP and Y-MP, so you can see this
effect clearly.) The Cray XYs and ETA machines are capable of achieving
around 2 FLOPs/cycle in hardware. My point is that there has been
considerable improvement in both hardware and software, and startup
time penalties have been correspondingly reduced.

What is the relevance of Linpack today? Well, it still has *some* of
the same significance that it always had, but tells less than it used
to. When caches were small, you could extrapolate the 100x100 results
to bigger jobs without worrying. On the big iron, your performance
went *up* with larger problem sizes, so even if 300x300x300 was typical
of your problem, you knew what to expect.
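
The per-cycle figures quoted above are just the Linpack MFLOPS rating
divided by the clock rate in MHz. A minimal sketch of that arithmetic
follows, using only the ETA10G and Cyber 205 numbers given in the post.
The helper names clock_mhz and flops_per_cycle are made up for
illustration, and the ETA10G cycle time is taken as exactly 7 ns, which
the post gives only approximately.

```python
# Sketch of the FLOPs/cycle arithmetic: MFLOPS divided by clock rate in MHz.
# Helper names are illustrative; the input figures are those quoted in the post.

def clock_mhz(cycle_ns: float) -> float:
    """Clock rate in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_ns

def flops_per_cycle(mflops: float, cycle_ns: float) -> float:
    """Floating-point operations completed per clock cycle."""
    return mflops / clock_mhz(cycle_ns)

# (machine, Linpack 100x100 MFLOPS, cycle time in ns)
# The 7 ns ETA10G cycle time is approximate in the post; assumed exact here.
machines = [
    ("ETA10G", 93.0, 7.0),
    ("Cyber 205", 17.0, 20.0),
]

for name, mflops, cycle_ns in machines:
    ratio = flops_per_cycle(mflops, cycle_ns)
    print(f"{name}: {ratio:.2f} FLOPs/cycle")
```

Set against the roughly 2 FLOPs/cycle the post says the hardware can
reach, the same ratio shows how much of the peak is lost to startup and
memory effects on the 100x100 case.
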
Now, with 100x100 fitting in some small caches, you need to run a
bigger job to make sure performance doesn't go *down* dramatically.
(Which it does on some micro based systems, of course.) On the other
hand, if you switch to 300x300, you lose the information contained in
the 100x100 case wrt startup time.

So, good numbers tell you even less than they did before, but bad
numbers, in a sense, tell you even more, for the same reason. I
wouldn't buy a machine with a bad Linpack result to do these kinds of
problems, but I would look hard at the set of machines with good
results, and would look further, to see which one was the best for the
job at hand.

Sometimes I use a "grep" benchmark just for fun. The Cray Y-MP still
greps faster than any other machine I have tested, but, I agree, it
isn't the world's most cost effective grepper out there :-) As with
all benchmarks, you have to be careful not to fool yourself... I would
guess that an amd29000 based system might be the fastest on that
particular test.

  Hugh LaMaster, m/s 233-9,   UUCP ames!lamaster
  NASA Ames Research Center   ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone:  (415)694-6117