Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!julius.cs.uiuc.edu!apple!agate!shelby!unix!garth!vogons!walter
From: walter@vogons.UUCP (Walter Bays)
Newsgroups: comp.arch
Subject: Re: SPECmarks for RS/6000 systems - lies???
Message-ID: <64@garth.UUCP>
Date: 12 Oct 90 17:41:01 GMT
References: <37935@ut-emx.uucp>
Sender: daemon@garth.UUCP
Reply-To: walter@apd.ingr.com (Walter Bays)
Organization: INTERGRAPH (APD) -- Palo Alto, CA
Lines: 43

In article <37935@ut-emx.uucp> ddt@walt.cc.utexas.edu (David Taylor) writes:
>The RS/6000 is a poor candidate for the SPECmark anyway, because its
>strengths lie in just a couple of instructions exploitable in some
>programs.  The figures from those benchmarks seriously skew the SPECmark.
>Remember, it's based on the geometric mean which doesn't reflect performance
>well for poorly distributed benchmark performances.

It's probably more accurate to say that the RS/6000, like all machines,
has strengths and weaknesses, and some of the SPEC release 1 benchmarks
hit some of those strengths particularly well.  Presumably IBM designed
the machine to be strong in the application areas they thought
particularly important, so it's not too surprising that the machine
does especially well on some of the benchmarks.

I agree with you that a single geometric mean cannot characterize the
performance of such a machine, due to very large differences between
minimum and maximum performance, most obvious now for the RS/6000,
Stardent, and Intel 860, but you will see this effect for more machines
in the future.  As CPUs get faster by exploiting more fine-grained
parallelism in different ways, the differences increase, and the
"little" machines are becoming as difficult to classify as the
supercomputers have always been.  John Mashey's "Your Mileage May Vary"
paper is a very good treatment of the issue.
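To make the point concrete, here is a small Python sketch with invented
numbers (not real RS/6000, Stardent, or i860 results).  It assumes, as
for SPEC release 1, that the SPECmark is the unweighted geometric mean
of ten SPECratios; the machine names `balanced` and `spiky` are
hypothetical.

```python
import math

def geometric_mean(ratios):
    """Unweighted geometric mean of SPECratios (the SPECmark summary form)."""
    # Average the logs, then exponentiate: equivalent to the nth root of
    # the product, but avoids overflow for long lists of large ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical SPECratios for two imaginary machines over ten benchmarks.
# These numbers are invented for illustration, not measured results.
balanced = [20.0] * 10
spiky = [10.0] * 8 + [320.0] * 2

for name, ratios in (("balanced", balanced), ("spiky", spiky)):
    print(f"{name:8s} geomean={geometric_mean(ratios):6.2f} "
          f"min={min(ratios):6.1f} max={max(ratios):6.1f}")
```

Both machines come out at a geometric mean of 20, yet one runs every
benchmark at 20 while the other runs eight of them at half that speed
and two at 16 times it.  The single number hides the spread.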

Difficulty interpreting SPECmarks for these machines does not mean that
either the machines or the SPEC benchmarks are flawed, just that you
have to look beyond a single number.  If you know that your workload is adequately
represented by 6 of the benchmarks with a fixed amount of work to do,
you could use a weighted harmonic mean of those 6.  If you have latent
demand that will consume all available resources, then a weighted
geometric mean may be more appropriate.  But in no case will speed on
tomcatv get your C compilations done more quickly, nor slowness on the
Lisp interpreter degrade your Spice simulations.
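A minimal Python sketch of the two summaries, assuming each weight is
the share of reference-machine run time that a benchmark represents in
your job mix.  Under that assumption the weighted harmonic mean equals
total reference time divided by total time on the measured machine,
i.e. the overall speedup for a fixed amount of work.  The benchmark
names are from SPEC release 1, but the ratios and both mixes are
invented for illustration.

```python
import math

def weighted_harmonic_mean(ratios, weights):
    # weights maps benchmark -> share of reference-machine time in the mix;
    # benchmarks missing from weights are simply not part of the workload.
    total = sum(weights.values())
    return total / sum(w / ratios[name] for name, w in weights.items())

def weighted_geometric_mean(ratios, weights):
    # exp of the weighted average of the log-ratios.
    total = sum(weights.values())
    return math.exp(
        sum(w * math.log(ratios[name]) for name, w in weights.items()) / total)

# Invented SPECratios for one hypothetical machine (not published results).
ratios = {"gcc": 10.0, "espresso": 10.0, "li": 10.0,
          "eqntott": 10.0, "spice2g6": 40.0, "tomcatv": 160.0}

# Two hypothetical job mixes: all six equally, and an integer-only mix
# that never runs anything like spice2g6 or tomcatv.
mixes = {
    "equal": {name: 1 for name in ratios},
    "integer-only": {"gcc": 5, "espresso": 2, "li": 1, "eqntott": 2},
}

for label, weights in mixes.items():
    print(f"{label:12s} "
          f"harmonic={weighted_harmonic_mean(ratios, weights):6.2f} "
          f"geometric={weighted_geometric_mean(ratios, weights):6.2f}")
```

With every benchmark weighted equally, the geometric mean says 20 while
the harmonic mean (the speedup on that fixed job mix) is about 13.9.
For the integer-only mix both come out at 10: the machine's ratio of
160 on tomcatv does nothing for that workload.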

I think there's a big need for some commercial benchmark companies and
trade journals to get into the business of helping ordinary users
interpret the benchmark results for their own situations.

---
Double Disclaimer: speaking for myself, not for Intergraph nor for SPEC
Walter Bays		Phone (415) 852-2384	FAX (415) 856-9224
EMAIL uunet!ingr.com!bays   or   uunet!{apple.com,pyramid.com}!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303