Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!julius.cs.uiuc.edu!apple!agate!shelby!unix!garth!vogons!walter
From: walter@vogons.UUCP (Walter Bays)
Newsgroups: comp.arch
Subject: Re: SPECmarks for RS/6000 systems - lies???
Message-ID: <64@garth.UUCP>
Date: 12 Oct 90 17:41:01 GMT
References: <37935@ut-emx.uucp>
Sender: daemon@garth.UUCP
Reply-To: walter@apd.ingr.com (Walter Bays)
Organization: INTERGRAPH (APD) -- Palo Alto, CA
Lines: 43

In article <37935@ut-emx.uucp> ddt@walt.cc.utexas.edu (David Taylor) writes:
>The RS/6000 is a poor candidate for the SPECmark anyway, because it's
>strengths lie in just a couple of instructions exploitable in some
>programs. The figures from those benchmarks seriously skew the SPECmark.
>Remember, it's based on the geometric mean which doesn't reflect performance
>well for poorly distributed benchmark performances.

It's probably more accurate to say that the RS/6000, like all machines,
has strengths and weaknesses, and that some of the SPEC release 1
benchmarks hit some of the strengths particularly well. Presumably IBM
designed the machine to be strong in application areas they thought
particularly important, so it's not too surprising that they succeeded
well for some of the benchmarks.

I agree with you that a single geometric mean cannot characterize the
performance of such a machine, due to very large differences between
minimum and maximum performance. This is most obvious now for the
RS/6000, Stardent, and Intel 860, but you will see this effect for more
machines in the future. As CPUs get faster by exploiting more
fine-grained parallelism in different ways, the differences increase,
and the "little" machines are becoming as difficult to classify as the
supercomputers have always been. John Mashey's "Your Mileage May Vary"
paper is a very good treatment of the issue.
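
To illustrate the point about a single geometric mean with made-up
numbers: two machines can post the same geometric mean while one of them
has a far wider gap between its best and worst benchmark. The sketch
below is only an illustration. The ratios are hypothetical (they are not
measured SPEC results), and the function and variable names are invented
for this example.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive speed ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def spread(ratios):
    """Ratio of the best benchmark result to the worst one."""
    return max(ratios) / min(ratios)

# Hypothetical speed ratios for illustration only; NOT real SPEC results.
even_machine = [10.0, 10.0, 10.0, 10.0]   # uniform across benchmarks
peaky_machine = [5.0, 5.0, 20.0, 20.0]    # strong on some, weak on others

for name, ratios in (("even", even_machine), ("peaky", peaky_machine)):
    print(f"{name}: geometric mean {geometric_mean(ratios):.2f}, "
          f"max/min spread {spread(ratios):.1f}")
```

Both hypothetical machines come out at a geometric mean of 10, yet the
max/min spread is 1 for the first and 4 for the second, which is the
information a single number throws away.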

Difficulty interpreting SPECmarks for these machines does not mean that
either the machines or the SPEC benchmarks are flawed, just that you
have to look beyond a single number. If you know that your workload is
adequately represented by 6 of the benchmarks with a fixed amount of
work to do, you could use a weighted harmonic mean of those 6. If you
have latent demand that will consume all available resources, then a
weighted geometric mean may be more appropriate. But in no case will
speed on tomcatv get your C compilations done more quickly, nor will
slowness on the Lisp interpreter degrade your Spice simulations.

I think there's a big need for some commercial benchmark companies and
trade journals to get into the business of helping ordinary users
interpret the benchmark results for their own situations.
---
Double Disclaimer: speaking for myself, not for Intergraph nor for SPEC
Walter Bays   Phone (415) 852-2384   FAX (415) 856-9224
EMAIL uunet!ingr.com!bays or uunet!{apple.com,pyramid.com}!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
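
A minimal sketch of the two weighted means mentioned in the post, for
readers who want to try them on their own numbers. Everything specific
here is an assumption: the six ratios and their weights are invented
(they are not SPEC results), the function names are hypothetical, and the
weights simply stand for how much of your workload each benchmark is
taken to represent.

```python
import math

def weighted_harmonic_mean(ratios, weights):
    """Weighted harmonic mean: total weight over the weighted sum of 1/ratio."""
    return sum(weights) / sum(w / r for r, w in zip(ratios, weights))

def weighted_geometric_mean(ratios, weights):
    """Weighted geometric mean: exp of the weighted average of log(ratio)."""
    log_sum = sum(w * math.log(r) for r, w in zip(ratios, weights))
    return math.exp(log_sum / sum(weights))

# Hypothetical speed ratios for six benchmarks; NOT real SPEC results.
ratios = [5.0, 5.0, 20.0, 20.0, 10.0, 10.0]
# Hypothetical weights: the two slow benchmarks dominate this workload.
weights = [3, 3, 1, 1, 1, 1]

print(f"weighted harmonic mean:  {weighted_harmonic_mean(ratios, weights):.2f}")
print(f"weighted geometric mean: {weighted_geometric_mean(ratios, weights):.2f}")
```

For the same ratios and weights the harmonic mean is never higher than
the geometric mean, so it penalizes the slow benchmarks more heavily,
which is what you want when the amount of work is fixed and total time
is what matters.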