Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.benchmarks
Subject: Re: Which benchmarks are useless?
Keywords: benchmarks date statistical correlation
Message-ID: <2717@spim.mips.COM>
Date: 26 Apr 91 04:22:34 GMT
References: <18049@sunquest.UUCP> <15159@helios.TAMU.EDU> <1749@marlin.NOSC.MIL>
Sender: news@mips.COM
Distribution: comp.benchmarks
Organization: MIPS Computer Systems, Inc.
Lines: 43
Nntp-Posting-Host: winchester.mips.com

In article <1749@marlin.NOSC.MIL> aburto@marlin.NOSC.MIL (Alfred A. Aburto) writes:
>What is the correlation between Dhrystone 2.1 results and Integer
>SPECmarks? How 'bad' is Dhrystone really compared to Integer SPECmarks?
>Don't really need to compute a correlation, but just show a table of
>comparable results (Integer SPECmark results vs Dhrystone results relative
>to VAX-11/780).

I don't have the numbers handy, and am about to go out of town again.
However, there are a number of combinations where Dhrystone would predict
that machine A is 25% faster than machine B, but on SPEC integer, machine B
is 25% faster than machine A, or equivalent combinations where the
prediction is 50% off. Combinations like this include RS/6000 vs MIPS,
or Intel i860 vs MIPS, at appropriate clock rates.

A particular case is the RS/6000 Model 320, which SPECints around 16, but
whose Dhrystone (1.1) rating is around 27.5, versus the MIPS Magnum
(25MHz, not the newer 33s), which has SPECint at 19.5, but has a lower
Dhrystone than the RS/6000.

If I find time, I'll dig out the numbers, but I've seen enough data
over the years to have stopped collecting it. What it said was:

a) Dhrystone ALWAYS gives a higher VAX-mips rating than SPECint
   (except maybe the VAX-11/780 :-). 1.1 is worse (higher) than 2.1,
   but 2.1 is high also. The ratio ranges from about 1.1 up to at
   least 1.6, maybe even as high as 2X.
b) The Dhrystone:SPECint ratios grossly track with a single product
   line, except that small-cache machines of a family look better on
   Dhrystone than on SPECint.

>
>What if I took 4 (or N) integer programs (different than used by SPEC)
>and ran them on various systems and computed performance relative to
>the VAX-11/780. Would these integer results agree with the integer
>SPECmark results for the same systems? Would they even be close?

Depends on the benchmarks. If you look at the data, you find that MIPS's
mips-ratings are rather close to SPECint, and the reason is that the set
of benchmarks we used internally for the integer side (which actually
includes much worse cache-busters than SPECint, and accounts for a few
billion cycles of execution) correlates with SPECint to within 10% or
closer ... and they existed BEFORE SPEC. Of course, one of the
benchmarks (espresso) was included in both.

Anyway, the answer is: if you run substantive integer benchmarks,
single-user, I think SPECint is a pretty good predictor.
--
-john mashey    DISCLAIMER:
UUCP:   mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:    408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:   MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650