Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.benchmarks
Subject: Re: benchmarks (SPECmarks)
Message-ID: <MCCALPIN.90Nov15090025@pereland.cms.udel.edu>
Date: 15 Nov 90 14:00:25 GMT
References: <7581@eos.arc.nasa.gov> <1146@dg.dg.com> <7589@eos.arc.nasa.gov>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 41
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: eugene@eos.arc.nasa.gov's message of 15 Nov 90 06:54:35 GMT

>>>> On 15 Nov 90 06:54:35 GMT, eugene@eos.arc.nasa.gov (Eugene Miya) said:

Eugene> In article <1146@dg.dg.com> uunet!dg!lewine writes:
>	But SPEC took a particular VAX-11/780.  The 11/780 time for
>	gcc is 1482 seconds.  It is not what you get on your particular
>	VAX.  This is more like taking a gold bar in Paris and saying
>	that is the standard meter.  As long as there is only one gold
>	bar, that is not a problem.

Eugene> Particular: that's right.  Two things to add: 1) DEC knew that
Eugene> the performance of 780 models varied by as much as 10%.  Is
Eugene> 10% acceptable?  In some cases yes, others no.  

I am surprised that such a sensible person as Eugene would imply that
*any* benchmark number had a precision better than 10%.  I don't believe that
it is possible to take any combination of "general-purpose" benchmarks
and use that data to predict your application (or workload)
performance to within 10%.  In fact, it is all too easy to have 10%
changes in the performance of your application itself if (as is
inevitable) it is run under conditions that differ from the formal
benchmark test.  Minor changes like operating system or compiler
upgrades, changes in the system background load, or even disk
fragmentation can produce 10% changes in wall-clock time quite
easily....
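
To put the 10% figure in terms of a reported number, here is a minimal
sketch.  It assumes the ratio is simply the reference time divided by
the measured time, and it uses the 1482-second gcc reference quoted
above.  The function names and the 100-second measurement are
hypothetical, chosen only for illustration.

```python
# Reference time quoted in this thread for gcc on SPEC's particular VAX-11/780.
REFERENCE_SECONDS = 1482.0


def spec_ratio(measured_seconds, reference_seconds=REFERENCE_SECONDS):
    """Reference time divided by measured time (higher means faster)."""
    return reference_seconds / measured_seconds


def ratio_spread(measured_seconds, variation=0.10):
    """Return (low, nominal, high) ratios if the reference time itself
    were off by -variation or +variation (10% by default)."""
    nominal = spec_ratio(measured_seconds)
    low = spec_ratio(measured_seconds, REFERENCE_SECONDS * (1.0 - variation))
    high = spec_ratio(measured_seconds, REFERENCE_SECONDS * (1.0 + variation))
    return low, nominal, high


if __name__ == "__main__":
    # Hypothetical measurement: a machine that runs gcc in 100 seconds.
    low, nominal, high = ratio_spread(100.0)
    print(f"ratio: {nominal:.2f} (range {low:.2f} to {high:.2f})")
```

A 10% shift in the reference time moves the ratio by the same 10%, which
is no larger than the run-to-run effects listed above.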

So how precise do I think the numbers are?  Well, with 6 or so years
of experience in performance evaluation of supercomputers and
high-performance workstations, I can generally [i.e., not always]
estimate the performance of my codes to within 20-25% based on a broad
suite of benchmark results (LINPACK 100x100, LINPACK 1000x1000,
Livermore Loops, hardware description with cycle counts, and maybe a
bit more).  (I deliberately ignore PERFECT since the one code that I
know in some detail [the ocean model from GFDL/Princeton] is a mess,
and I would not blame a compiler at all for having trouble vectorizing
or optimizing it [or even understanding what it is supposed to be
doing!]).

--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET