Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!utkcs2!de5
From: de5@ornl.gov (Dave Sill)
Newsgroups: comp.benchmarks
Subject: Re: benchmarks (SPECmarks)
Keywords: validity meaning
Message-ID: <1990Nov20.210922.4090@cs.utk.edu>
Date: 20 Nov 90 21:09:22 GMT
References: <7581@eos.arc.nasa.gov> <1146@dg.dg.com> <7589@eos.arc.nasa.gov> <1148@dg.dg.com> <108988@convex.convex.com>
Sender: news@cs.utk.edu (USENET News System)
Reply-To: Dave Sill
Organization: Oak Ridge National Laboratory
Lines: 136

In article <108988@convex.convex.com>, rosenkra@convex.com (William Rosencranz) writes:
>
>first off: what are SPEC ratings (or any standard bm ratings for that
>matter) meant to do? answer this question in your mind first before
>proceeding...

They're meant to make it possible to get some idea of the performance
one can expect a system to provide, without requiring that one observe
the performance directly.

>i really see no point whatsoever in relating an execution time on one
>machine to that of another "standard" machine, no matter how standard,
>(except possibly the old "that's the way we've ALWAYS done it before",
>e.g. "MIPS"), just to come up with some single "standard" unit of
>performance.

The reason for relating performance on an unknown system to that of a
known one is to give the numbers some relevance. If you tell me your
system does 15 gigafloogles/second, that tells me nothing unless I
know what a floogle is. But if you tell me your system scored 11.2
SPECfloogles, I can get a handle on whether 15 GF/s is fast or not, at
least if I have any experience with the VAX--or with another machine
whose SPECfloogle score I know.

>if I were buying (instead of selling :-), i'd want to see wallclock and
>cpu times, because i, as a human being, can relate to time far easier
>the "SPECs" or whatever.

Sure, but the absolute wall clock isn't going to tell you anything.
It's when you compare the values for different systems that you gain
information from the results. So what if the floogle benchmark runs
in 1:26? That means nothing. Give me a list of floogle times, and
I'll probably normalize them to some machine I'm familiar with (or
maybe the slowest machine in the list). It's the relative performance
that's important.

>if something runs in 10 seconds, compared to
>100 seconds, i know i can sit and wait, call it "interactive". if
>something runs in 10 min vs 1 hour, i know i can go out to lunch in the
>latter case. a SPEC of 1.345 vs a SPEC of 4.345 means nothing, until
>i translate to time anyway. time is easier to "heft", as it were.

Only if you are intimately familiar with what's being done. What's
the difference between 10 minutes versus 100 minutes and 10
SPECfloogles versus 100 SPECfloogles? Both indicate the same relative
performance, and both measure the same absolute performance. The
difference is that with the former you need two numbers to compare,
but with the latter you have the built-in VAX value: 10 SPECfloogles
is 10 times faster than SPEC's VAX 11/780. Not perfect, but better
than nothing.

>further, i'd want to see how the "standard" bm results scale with
>problem size, especially on cache-based memory systems. because a
>buy decision based on a single number could come back to haunt me.

This is a valid point, but it has nothing to do with whether wall
clock or relative-to-known values are reported.

>i'd also want to know what sort of performance enhancements i could
>expect if i wanted to put 1 hour, 1 day, and 1 week's effort into
>the optimization of any particular code, if possible.

Lotsa luck. I don't know of any benchmarks that attempt to anticipate
what gains could be made by optimization, by you or anyone else.

>i'd also want to compare a vendor's peak performance with how well
>it did on standard bm's or on my own.

Just ask the vendors, they'll be glad to give you peak performance
figures.
:-)

>finally, i'd want to see what sort of support i can expect from the
>vendor. granted, pre-sales and post-sales activities can vary
>greatly, but i think i can shake out a vendor during the sales
>cycle, as most saavy buyers can.

This isn't a benchmarking issue at all. Benchmarking can't and
shouldn't attempt to prevent the foolish buyer from buying foolishly.
Raw performance is just one criterion that should be part of a
procurement effort.

>and
>believe me, if i see 2 or 3 systems with uni-number ratings within
>say 5% of each other, i sure as heck would not say "these machines
>are identical, so let's buy the cheaper one becasue it has better
>price/SPECperformance".

I couldn't agree more.

>i'd want to look at the raw data anyway, and
>probably run my sort of workload on them to really get an idea of
>what i can expect.

SPEC provides the real data. The SPECmark is just a handy single
figure of merit. Better that than dhrystone-mips. As for testing
them yourself: have at it. Sometimes that's not feasible, and that's
what benchmarks are for.

>similarly, if i see two machines that differ by
>alot in some particular individual tests, i' want to know why.

Again, I agree. But identifying the reason is not a benchmarking
issue. Identifying the difference *is*.

>the basic problem i see with these uni-number ratings is that
>people can make up their minds, even subconsciously, based on a
>first impression.

So what do you propose? Outlawing single figures of merit? Better to
have one that is well understood and subject to much scrutiny than to
have something ad hoc, unreliable, informal, etc.

>this is human nature. you always have that in
>the back of your mind. and it is easy to just say "2 > 1.5"
>rather than "based on some real workload, and on problem size,
>and on vendor support, and on application availability, and on
>whatever, 2 is not necessarily > 1.5".

No, 2 *is* greater than 1.5. Always.
The problem is that there may be more important issues that aren't so
easily quantifiable.

>surely we can give more credit to the intellect of people making
>buy decisions than that?

You're the one who seems to think people are going to base their
decisions solely on an SFM (single figure of merit). The detailed
data is available to those who want it.

I don't mean to come across as some kind of SPEC apologist; I just
think what they're doing is better than what was done before they
existed.

-- 
Dave Sill (de5@ornl.gov)
Martin Marietta Energy Systems
Workstation Support
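
For anyone who wants to see the normalization argument above as
arithmetic, here is a rough sketch. The machine names and run times
are made up for illustration (the floogle benchmark is hypothetical to
begin with), and the function names are mine. Combining per-benchmark
ratios with a geometric mean is my understanding of how the SPECmark
is formed; treat that as an assumption rather than a statement of
SPEC's rules.

```python
import math


def relative_scores(times, reference):
    """Divide the reference machine's time by each machine's time.

    A score of 10 means "10 times faster than the reference".
    """
    ref_time = times[reference]
    return {name: ref_time / t for name, t in times.items()}


def single_figure(ratios):
    """Geometric mean of per-benchmark ratios (assumed combining rule)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical floogle run times in seconds; not real measurements.
floogle_times = {"ref_machine": 860.0, "machine_a": 86.0, "machine_b": 43.0}
scores = relative_scores(floogle_times, "ref_machine")

# Hypothetical per-benchmark ratios for one machine.
figure = single_figure([2.0, 8.0])
```

The raw times alone (86 s versus 43 s) and the scores derived from
them carry the same relative information; the scores just have the
reference machine built in.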