Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uwm.edu!bionet!agate!eos!eugene
From: eugene@eos.UUCP (Eugene Miya)
Newsgroups: comp.arch
Subject: Re: benchmarking
Message-ID: <6336@eos.UUCP>
Date: 28 Feb 90 07:43:08 GMT
References: <7393@pdn.paradyne.com> <3300102@m.cs.uiuc.edu> <36438@mips.mips.COM> <132232@sun.Eng.Sun.COM>
Reply-To: eugene@eos.UUCP (Eugene Miya)
Organization: NASA Ames Research Center, Calif.
Lines: 65

In article <132232@sun.Eng.Sun.COM> lm@sun.UUCP (Larry McVoy) writes:
>>In article <3300102@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:
>> [doesn't like SPEC]
>
>In article <36438@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>>I'm sad to hear that what we've done so far is "no better than Dhrystone",
>>because if that's true, a whole bunch of us have wasted, in toto, at
>>least several million $ to try to do something better....
>
>I, for one, think SPEC is great.
Oh well.  Too bad.
>On the other hand, SPEC is not the end all to beat all.  No benchmark
>is.  If I could design the ideal benchmark, I'd design something that
>had a bunch of knobs that I could turn, like an I/O knob, a CPU knob, a
>memory knob, etc.  I don't have this, so I run several different
>benchmarks that measure these sorts of things.  SPEC is one, Musbus is
>another, and we have several internal/proprietary benchmarks as well.
>Some people don't like you to quote one figure from one benchmark - I
>like to see all the figures from all the benchmarks.  The more data you
>have the easier it is to weed out the spikes.

Sorry, John, I tend to suspect SPEC spent a lot of money.

Larry is not talking about a single program.  This is something I
am working on in parts, when I get tiny bits of time.  And like most
research, 90% of it is failure.  I do not believe the future lies
in simply having more numbers.  More numbers can just be more confusing.
You want a number?  Try 42.  Douglas Adams published that.

The fundamental idea which separates people is whether or not you believe
the whole of a benchmark equals or exceeds the sum of its parts.
If you believe in "magic" (i.e. known optimizations, features, etc.),
that wholes are greater than their parts, then you aren't being
scientific about the problem.
You won't get anywhere, and you can posit little green men who
only come on Tuesdays as to why your code runs fast.  I am not saying
timings of parts should sum to a whole code, but as you work on
higher and higher conceptual ideas of programs, you can factor these
optimizations, etc. into performance.

Users simply concerned with pure speed will inevitably be disappointed.
I can point to analogies of performance in other areas.  The idea
of placing a VAX under a bell jar, gold-plating a code, etc.
That's all covered in an article I read after visiting the NBS entitled
Foundations of Metrology in an NBS journal.  There are ways of doing this,
but just like the platinum bar, there are limits of usefulness: hence
why we use other measuring tools, why we refine atomic clocks, etc.
Until we are willing to do that with computers, benchmarking won't get far.

I don't get any warm fuzzy feeling from the Nelson, the Loops, Dongarra, etc.
Sure, there's a bit of truth, but you have to be willing to consider surrogates.
We want to run (with benchmarks), but we have to crawl before walking
and playing.  We are going to need a progression of research.
But most of you don't have the time or inclination to listen, so I
will go back to my hacking.

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:

  "You trust the `reply' command with all those different mailers out there?"
  "If my mail does not reach you, please accept my apology."
  {ncar,decwrl,hplabs,uunet}!ames!eugene
  Do you expect anything BUT generalizations on the net?
  [If it ain't source, it ain't software -- D. Tweten]