Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!sdd.hp.com!ucsd!ogicse!borasky
From: borasky@ogicse.ogi.edu (M. Edward Borasky)
Newsgroups: comp.benchmarks
Subject: Re: SPEC vs. Dhrystone
Message-ID: <15565@ogicse.ogi.edu>
Date: 3 Jan 91 15:46:45 GMT
References: <44342@mips.mips.COM> <15379@ogicse.ogi.edu> <44353@mips.mips.COM> <1685@marlin.NOSC.MIL> <15546@ogicse.ogi.edu> <44465@mips.mips.COM>
Distribution: comp.benchmarks
Organization: Oregon Graduate Institute (formerly OGC), Beaverton, OR
Lines: 61

In article mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
>>>>>> On 3 Jan 91 06:18:59 GMT, mash@mips.COM (John Mashey) said:
>
>mash> 1C: Compiler gimmickry
>mash> For any important benchmark that is small, compilers will get tuned
>mash> in ways that are absolutely useless in real life. This has happened
>mash> at least with Whetstone, Dhrystone, and LINPACK.
>
>So what optimizations have been performed on the LINPACK 100x100 code
>that are "absolutely useless" in real life?

1. There was a compiler once that actually CHEATED on the LINPACK
benchmark.  Any time it encountered a routine called "SGEFA" or
"DGEFA", it checked the number of parameters and their types, and if
they agreed with the standard usage in the LINPACK benchmark, it
branched off to a machine-coded SGEFA or DGEFA, ILLEGALLY UNDER THE
FORTRAN RULES IGNORING THE SGEFA/DGEFA CODE THAT IS PRESENT IN THE
LINPACK BENCHMARK.  There were also pre-processors that stripped out
SGEFA/DGEFA or SAXPY/DAXPY so that they would be linked in from
libraries rather than compiled.  That is a fine optimization technique
IF CONTROLLED BY THE USER -- it is the whole REASON LINPACK was written
using two levels of libraries (see the sketch of that two-level
structure at the end of this post).  But the compiler CANNOT know,
without looking at the submitted code, whether these cheats will
produce correct results.

2. Another cheat was once pulled on the 1000-equation LINPACK
benchmark.  This vendor used a Gauss-Jordan reduction, in which all the
vectors are of length 1000, rather than the decreasing-length algorithm
normally used.  That in itself is legal; you can use any algorithm you
want to.  Unfortunately, the G-J reduction uses twice as many
operations as the standard one, and when they computed their MEGAFLOPS
they divided the number of FLOPS they had actually done by the time it
took.  The Dongarra rules allow you to claim only the FLOPS required by
the STANDARD LINPACK algorithm (see the rated-MFLOPS sketch at the end
of this post), so they claimed twice the speed they were allowed to.
In fact, they also ran the benchmark in 32 bits instead of 64, to get
another 2X speed boost.  It turns out that the matrix in the LINPACK
benchmark is ill-conditioned enough to blow up in 32-bit arithmetic,
and Dongarra got suspicious.

Even discounting blatant cheats like the ones I describe above, there
is a wider issue here: how much COMPILE time is a user willing to
expend for "perfect" compiles?  It is possible, using whole-program
compilation techniques, to optimize out all the unnecessary checks in
the LINPACK benchmark -- it is simple enough and well-structured enough
that a compiler can "realize" that DAXPY is always called with
increment 1, that DAXPY can be in-lined, that N is always greater than
zero, that one-trip DO loops will work, and so on (see the last sketch
at the end of this post for what that buys).  But these optimizations
take time -- so much time that on real codes larger than LINPACK, one
of two things will happen:

1. The user will think the compiler has gone into an endless loop and
   zap the compile, or

2. The compiler writer, probably interfacing with Marketing, will
   build lots of escapes into the optimizer so that compile times
   appear to be finite.

There ARE users who claim they will tolerate long optimization times;
my experience in Marketing was that such users were few and far
between.  The main uses for such a compiler are large third-party
application codes that are seldom compiled -- once or twice a year for
the ones I worked with.  But the size of these codes works AGAINST the
whole-program compiler!
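
For the curious, here is a stripped-down sketch of the two-level
structure I mentioned in item 1.  It is paraphrased, NOT the actual
LINPACK source: DGEFA does the factorization bookkeeping and leaves
the arithmetic to the Level-1 BLAS kernel DAXPY.

      SUBROUTINE DAXPY(N, DA, DX, INCX, DY, INCY)
*     Reference-style kernel: DY := DA*DX + DY.  Unit-stride case
*     only, for brevity; the real kernel also handles strides .NE. 1.
      INTEGER N, INCX, INCY, I
      DOUBLE PRECISION DA, DX(*), DY(*)
      IF (N .LE. 0 .OR. DA .EQ. 0.0D0) RETURN
      DO 10 I = 1, N
         DY(I) = DY(I) + DA*DX(I)
   10 CONTINUE
      RETURN
      END

Inside DGEFA the column updates are just a string of calls of the form
CALL DAXPY(N-K, T, A(K+1,K), 1, A(K+1,J), 1), so a site can
legitimately satisfy that CALL at link time with a vendor-tuned DAXPY
instead of compiling the reference one -- entirely under the user's
control, no pattern-matching on routine names required.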
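
And here is a toy illustration of the rated-MFLOPS arithmetic from
item 2.  The program and its variable names are mine, and the timing
is made up; the point is only that the Dongarra rules credit you with
the operation count of the STANDARD algorithm, 2/3*N**3 + 2*N**2,
divided by your time -- not with whatever count your own algorithm
happened to perform.

      PROGRAM RATED
*     Nominal LINPACK operation count for N equations, per the
*     Dongarra rules, divided by a (made-up) elapsed time in seconds.
      INTEGER N
      DOUBLE PRECISION OPS, SECS, RMFLOP
      N = 1000
      OPS = (2.0D0*DBLE(N)**3)/3.0D0 + 2.0D0*DBLE(N)**2
      SECS = 10.0D0
      RMFLOP = OPS / (SECS*1.0D6)
      PRINT *, 'Rated MFLOPS for N = ', N, ': ', RMFLOP
      END

Run an algorithm that does twice the arithmetic in the same wall-clock
time and your honest MFLOPS number does not change; only the time in
the denominator matters.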
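
Finally, a schematic of what the whole-program optimizations I listed
buy you.  The subroutine and the name COLUPD are mine -- this is the
shape of the result, not any particular compiler's output and not the
real DGEFA (which also does the pivot swap): once the compiler knows
the increments are 1 and the trip count is positive, the DAXPY call
collapses into a bare in-line loop.

      SUBROUTINE COLUPD(A, LDA, N, K, T)
*     Column updates after whole-program optimization: DAXPY in-lined,
*     increments hard-wired to 1, and the zero-trip test in DAXPY
*     dropped because the compiler knows N-K is positive here.
      INTEGER LDA, N, K, I, J
      DOUBLE PRECISION A(LDA,*), T(*)
      DO 30 J = K+1, N
         DO 20 I = K+1, N
            A(I,J) = A(I,J) + T(J)*A(I,K)
   20    CONTINUE
   30 CONTINUE
      RETURN
      END

No call overhead, no argument passing, no stride bookkeeping -- exactly
the kind of result that looks wonderful on a benchmark the size of
LINPACK and costs enormous compile time to reproduce on a real
application.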