Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.benchmarks
Subject: Re: benchmarks (SPECmarks)
Message-ID:
Date: 15 Nov 90 14:00:25 GMT
References: <7581@eos.arc.nasa.gov> <1146@dg.dg.com> <7589@eos.arc.nasa.gov>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 41
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: eugene@eos.arc.nasa.gov's message of 15 Nov 90 06:54:35 GMT

>>>> On 15 Nov 90 06:54:35 GMT, eugene@eos.arc.nasa.gov (Eugene Miya) said:

Eugene> In article <1146@dg.dg.com> uunet!dg!lewine writes:
> But SPEC took a particular VAX-11/780.  The 11/780 time for
> gcc is 1482 seconds.  It is not what you get on your particular
> VAX.  This is more like taking a gold bar in Paris and saying
> that is the standard meter.  As long as there is only one gold
> bar, that is not a problem.

Eugene> Particular: that's right.  Two things to add: 1) DEC knew that
Eugene> the performance of 780 models varied by as much as 10%.  Is
Eugene> 10% acceptable?  In some cases yes, others no.

I am surprised that such a sensible person as Eugene would imply that
*any* benchmark number had a precision better than 10%.  I don't
believe that it is possible to take any combination of
"general-purpose" benchmarks and use that data to predict your
application (or workload) performance to within 10%.

In fact, it is all too easy to have 10% changes in the performance of
your application itself if (as is inevitable) it is run under
conditions that differ from the formal benchmark test.  Minor changes
like operating system or compiler upgrades, changes in the system
background load, or even disk fragmentation can produce 10% changes in
wall-clock time quite easily....

So how precise do I think the numbers are?
Well, with 6 or so years of experience in performance evaluation of
supercomputers and high-performance workstations, I can generally
[i.e., not always] estimate the performance of my codes to within
20-25% based on a broad suite of benchmark results (LINPACK 100x100,
LINPACK 1000x1000, Livermore Loops, hardware description with cycle
counts, and maybe a bit more).

(I deliberately ignore PERFECT since the one code that I know in some
detail [the ocean model from GFDL/Princeton] is a mess, and I would
not blame a compiler at all for having trouble vectorizing or
optimizing it [or even understanding what it is supposed to be
doing!]).
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET