Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: SPECmarks for RS/6000 systems - lies???
Message-ID: <41935@mips.mips.COM>
Date: 4 Oct 90 22:35:10 GMT
References: <4734@lure.latrobe.edu.au>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 93

In article <4734@lure.latrobe.edu.au> CCHD@lure.latrobe.edu.au (Huw Davies - La Trobe University Computer Centre) writes:
>I have just got a copy of the September 1990 SPECwatch and I am
>a bit concerned about the following paragraph:
>
>"There also seems to be a problem with replicating IBM's RS6000
>SPECmark results, and with achieving the expected levels of
>performance with other code. It's known that IBM extensively
>modified the compilers used to compile the benchmarks. If these
>"knobs and dials" turn out to be not readily accessible to users
>of the production compilers shipped with the systems, SPEC
>will be faced with its first serious cheating problem. The usual
>prize (a trial subscription or four month extension of an
>existing subscription) for the first person to provide
>independent RS/6000 SPECmark results using the compilers shipped
>with the products."

It would be interesting to see whether other people can replicate the results, but let's be real careful before branding things as lies.

The intent of SPEC is that users understand what they have, and which versions of things were used to get the results; in general, the existing forms of disclosure do that fairly well. Unfortunately, what they do NOT do, in the published form, is describe all of the compiler options. (Sometimes they do, sometimes they don't, for lack of space.) Now, when a SPEC tape comes out, the makefiles are there for the people who've reported results, but if someone reports results AFTER a release tape, you can't easily find out which options they used.
So, it is quite possible that:

a) Somebody at IBM ran these things, turned knobs and dials on the compilers appropriately (and there are plenty of knobs and dials on most compilers), and got these results. In fact, as I have high respect for the folks at IBM doing the SPEC stuff, I personally believe that they got the answers they say they got, although I have not run them myself.

b) In any such case, it is possible that:
   1) There are magic options that only the vendor knows about. This is considered a no-no.
   2) There are magic tools, which only the vendor has, for analyzing the programs to figure out which options should be used. One would expect that a user who reruns the benchmarks with those options would get the same answers, even if the user has no obvious way to derive the right options.
   3) There are tools and explanations available to the user that yield the same performance, with the kind of work people would be expected to do in the normal course of things.
   4) You just say -Ox, and if you need to do something else, the compiler tells you.
   Now, in this hierarchy, the ideal is 4), 3) is OK for some people, 2) is getting kind of marginal, and 1) is really a no-no, unless the magic options just aren't released yet, but will be.

c) Of course, users must assess for themselves what to think when faced with big performance differences between levels 2, 3, and 4.

d) SPEC is continually working to tighten this up, because the goal is that a user can replicate results easily, and sometimes we're not quite there yet.

e) Certainly, a good calibration is to run the SPEC stuff the way you normally would: start with the vanilla makefiles, and supply the options you'd pick from a quick reading of the system's cc & f77 manual pages.

I often use an extensive computer==car analogy, whose performance-measurement part goes like this:

   unreal: drag strip; a short distance in a straight line as fast as possible; don't care whether the vehicle is useful on the road.
   exaggerated (Dhrystone mips): on the road, but only downhill.
   really exaggerated (peak mips & mflops, guaranteed not to exceed): drive it off a cliff, and measure as it falls... hard on the drivers, but that's the way it goes.
   reality: uphill, downhill, around curves: Monte Carlo, etc., driven by real people.

Now, SPEC is a fairly good approximation to one slice of reality, with the obvious niggle that the numbers reported by vendors come from real machines running real programs... but usually with skilled racing drivers who can extract the most performance from their machines. Sometimes an average person can get the same performance, sometimes not.

But, anyway, let us be careful not to characterize something as 'lies' when there are perfectly legitimate reasons, well within the rules, that might explain it. A far more interesting question to ask, in general, of all of us is: What effort does it take to achieve given levels of performance? What's the difference between -O and -O5 -x -y -z5000 -q -k300...?
-- 
-john mashey	DISCLAIMER:
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086