Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: SPECmarks for RS/6000 systems - lies???
Message-ID: <41935@mips.mips.COM>
Date: 4 Oct 90 22:35:10 GMT
References: <4734@lure.latrobe.edu.au>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 93

In article <4734@lure.latrobe.edu.au> CCHD@lure.latrobe.edu.au (Huw Davies - La Trobe University Computer Centre) writes:
>I have just got a copy of the September 1990 SPECwatch and I am
>a bit concerned about the following paragraph:
>
>"There also seems to be a problem with replicating IBM's RS6000
>SPECmark results, and with achieving the expected levels of
>performance with other code. It's known that IBM extensively
>modified the compilers used to compile the benchmarks. If these
>"knobs and dials" turn out to be not readily accessible to users
>of the production compilers shipped with the systems, SPEC
>will be faced with its first serious cheating problem. The usual
>prize (a trial subscription or four month extension of an
>existing subscription) for the first person to provide
>independent RS/6000 SPECmark results using the compilers shipped
>with the products."

It would be interesting to see other people's ability to replicate
the results, but let's be real careful before branding things lies.
The intent of SPEC is that users understand what they have, and
what versions of things are being used to get results, and in general,
the existing forms of disclosure do that fairly well.  Unfortunately,
what the published form does NOT always do is describe all of the
compiler options.  (Sometimes it does, sometimes it doesn't, for
lack of space.)  Now, when a SPECtape comes out, the makefiles are
there for the people who've reported results, but if someone
reports results AFTER a release tape, you can't easily find out
which options they used.

So, it is quite possible that:
	a) Somebody at IBM ran these things, turned knobs and dials
	on the compilers appropriately, (and there are plenty of
	knobs and dials on most compilers) and got these results.
	And in fact, as I have high respect for the folks at IBM
	doing the SPEC stuff, I personally believe that they got
	the answers they say they got, although I have not personally
	run them.
	b) In any such case, it is possible that:
		1) There are magic options that only the vendor knows
		about.  This is considered a no-no.
		2) There are magic tools, that only the vendor has, for
		analyzing the programs, to figure out the options that
		should be used.  One would expect that a user who reruns
		the benchmarks with those options gets the same answers,
		even if the user has no obvious way to derive the right
		options.
		3) There are tools and explanations available to the
		user, which provide the same performance, with work,
		which would be expected to happen in the normal way that
		people would work.
		4) You just say -Ox, and if you need to do something
		else, the compiler tells you.
	Now, in this hierarchy, the ideal is 4), 3) is OK for some people, 2)
	is getting kind of marginal, and 1) is really a no-no, unless
	the magic options just aren't released yet, but will be.
	c) Of course, users must assess for themselves what to think,
	when faced with big performance differences between levels
	2, 3, and 4.
	d) SPEC is continually working to tighten this up, because the
	goal is that a user can replicate results easily, and we're
	not quite there yet, sometimes.
	e) Certainly, a good calibration is to run the SPEC stuff the
	way you normally would, by starting with the vanilla makefiles,
	and supplying the options you'd pick from a quick reading of the
	system's cc & f77 manual pages.
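
As a rough sketch of the calibration in e): the SPECmark is the geometric
mean of the per-benchmark SPECratios, so one way to see what the knobs and
dials are worth is to compute it twice, once from a vanilla-makefile run and
once from a tuned run, and look at the ratio.  The function names, benchmark
labels, and numbers below are all made up for illustration; they are not
measurements from any system.

```python
from math import exp, log


def geometric_mean(ratios):
    """Geometric mean of a non-empty sequence of positive ratios."""
    ratios = list(ratios)
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return exp(sum(log(r) for r in ratios) / len(ratios))


def compare_runs(vanilla, tuned):
    """Return (vanilla_mark, tuned_mark, gain) for two runs.

    Each argument maps a benchmark label to its ratio
    (reference time divided by measured time).
    """
    if vanilla.keys() != tuned.keys():
        raise ValueError("both runs must cover the same benchmarks")
    names = sorted(vanilla)
    vanilla_mark = geometric_mean(vanilla[n] for n in names)
    tuned_mark = geometric_mean(tuned[n] for n in names)
    return vanilla_mark, tuned_mark, tuned_mark / vanilla_mark


if __name__ == "__main__":
    # Hypothetical ratios, NOT real results: plain -O versus hand-picked options.
    vanilla = {"bench_a": 10.0, "bench_b": 20.0, "bench_c": 15.0}
    tuned = {"bench_a": 12.0, "bench_b": 30.0, "bench_c": 15.5}
    v, t, gain = compare_runs(vanilla, tuned)
    print(f"vanilla {v:.1f}  tuned {t:.1f}  gain {gain:.2f}x")
```

If the gain is small, the published number is probably close to what an
ordinary user gets; if it is large, the question becomes which of levels
1) through 4) the tuned options fall under.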

I often use an extensive computer==car analogy, whose performance measurement
part goes like this:

unreal:	drag-strip, short distance in a straight-line as fast as possible,
	don't care if vehicle useful on the road.
exaggerated: (Dhrystone mips): on the road, but only downhill
real exaggerated: (peak mips & mflops, guaranteed not to exceed): drive it
	off a cliff, and measure as it falls.... hard on the drivers, but
	that's the way it goes
reality: up-hill, down-hill, around curves: Monte Carlo, etc, driven
	by real people

Now, SPEC is a fairly good approximation to one slice of reality, with
the obvious niggle that the numbers reported by vendors are from real
machines, running real programs .... but usually with skilled racing
drivers who can extract the most performance from their machines.
Sometimes, an average person can get the same performance, sometimes
not.

But, anyway, let us be careful not to characterize something as lies
when there are perfectly legitimate reasons, well within the rules,
that might explain this.  A far more interesting question to ask,
in general, of all of us is:
	What effort does it take to achieve given levels of performance?
	What's the difference between -O and -O5 -x -y -z5000 -q -k300....?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086