Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.benchmarks
Subject: Re: Which benchmarks are useless?
Keywords: benchmarks  date  statistical correlation
Message-ID: <2717@spim.mips.COM>
Date: 26 Apr 91 04:22:34 GMT
References: <18049@sunquest.UUCP> <15159@helios.TAMU.EDU> <1749@marlin.NOSC.MIL>
Sender: news@mips.COM
Distribution: comp.benchmarks
Organization: MIPS Computer Systems, Inc.
Lines: 43
Nntp-Posting-Host: winchester.mips.com

In article <1749@marlin.NOSC.MIL> aburto@marlin.NOSC.MIL (Alfred A. Aburto) writes:
>What is the correlation between Dhrystone 2.1 results and Integer 
>SPECmarks?  How 'bad' is Dhrystone really compared to Integer SPECmarks?
>Don't really need to compute a correlation, but just show a table of
>comparable results (Integer SPECmark results vs Dhrystone results relative
>to VAX-11/780).
I don't have the numbers handy, and am about to go out of town again.
However, there are a number of combinations where Dhrystone would predict
that machine A is 25% faster than machine B, but on SPEC integer,
machine B is 25% faster than machine A, or equivalent combinations where
the prediction is 50% off.  Combinations like this include RS/6000 vs
MIPS, or Intel i860 vs MIPS, at appropriate clock rates.  A particular
case is the RS/6000 Model 320, which has SPECint around 16 but Dhrystone
(1.1) around 27.5, versus the MIPS Magnum (25MHz, not the newer 33s),
which has SPECint at 19.5 but a lower Dhrystone than the RS/6000.
If I find time, I'll dig out the numbers, but I've seen enough data over
the years to have stopped collecting it.  What it said was:
	a) Dhrystone ALWAYS gives a higher VAX-mips rating than SPECint
	(except maybe on the VAX-11/780 :-).  1.1 is worse (higher) than 2.1,
	but 2.1 is high also.  The ratio ranges from about 1.1 up to at
	least 1.6, maybe even as high as 2X.
	b) The Dhrystone:SPECint ratios roughly track within a single
	product line, except that the small-cache machines of a family look
	even better on Dhrystone than on SPECint.
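
The arithmetic behind the RS/6000 vs Magnum case above can be sketched in
a few lines of Python.  This uses only the approximate figures quoted in
this post; the function and constant names are made up for illustration,
and the Magnum's Dhrystone rating is deliberately left out because the
post only says it is lower than the RS/6000's.

```python
# Approximate VAX-relative figures quoted in the post (VAX-11/780 = 1.0).
RS6000_320_SPECINT = 16.0
RS6000_320_DHRYSTONE_1_1 = 27.5
MAGNUM_25_SPECINT = 19.5
# The 25MHz Magnum's Dhrystone figure is not given in the post,
# only that it is lower than 27.5, so it is not modeled here.


def dhrystone_to_specint_ratio(dhrystone_mips, specint):
    """Factor by which Dhrystone overstates the rating relative to SPECint."""
    return dhrystone_mips / specint


def percent_faster(a, b):
    """Percent by which rating a exceeds rating b."""
    return (a / b - 1.0) * 100.0


ratio = dhrystone_to_specint_ratio(RS6000_320_DHRYSTONE_1_1, RS6000_320_SPECINT)
magnum_lead = percent_faster(MAGNUM_25_SPECINT, RS6000_320_SPECINT)

print(f"RS/6000 320 Dhrystone:SPECint ratio = {ratio:.2f}")
print(f"Magnum lead over RS/6000 320 on SPECint = {magnum_lead:.0f}%")
```

With these inputs the RS/6000 ratio comes out a bit above 1.7, and the
Magnum leads by roughly 22% on SPECint even though Dhrystone ranks the
two machines the other way around.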
>
>What if I took 4 (or N) integer programs (different than used by SPEC)
>and ran them on various systems and computed performance relative to
>the VAX-11/780. Would these integer results agree with the integer 
>SPECmark results for the same systems? Would they even be close?
Depends on the benchmarks.  If you look at the data, you find that MIPS's
mips-ratings are rather close to SPECint.  The reason is that the set of
benchmarks we used internally for the integer side (which actually
includes much worse cache-busters than SPECint, and accounts for a few
billion cycles of execution) correlates with SPECint to within 10% or
closer ... and it existed BEFORE SPEC.  Of course, one of the
benchmarks (espresso) was included in both.
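
One way to check a claim like "within 10%" for your own set of integer
programs is sketched below.  It takes "within 10%" to mean that the
largest per-machine relative gap between the two VAX-relative ratings is
at most 0.10, which is only one possible reading.  The function names are
hypothetical and the numbers are placeholders, NOT measured results.

```python
def max_relative_disagreement(ratings_a, ratings_b):
    """Largest relative gap between two rating sets over the machines they share."""
    shared = sorted(set(ratings_a) & set(ratings_b))
    if not shared:
        raise ValueError("no machines in common")
    return max(abs(ratings_a[m] - ratings_b[m]) / ratings_b[m] for m in shared)


def agrees_within(ratings_a, ratings_b, tolerance=0.10):
    """True if every shared machine's ratings differ by at most `tolerance`."""
    return max_relative_disagreement(ratings_a, ratings_b) <= tolerance


# Placeholder ratings for illustration only; not real benchmark data.
own_suite = {"machine_x": 10.5, "machine_y": 18.5}
specint = {"machine_x": 10.0, "machine_y": 20.0}

print(agrees_within(own_suite, specint))
```

Substitute real VAX-relative ratings for the placeholders; a suite that
fails this check on substantive programs is probably measuring something
different from SPECint.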
Anyway, the answer is: if you run substantive integer benchmarks,
single-user, I think SPECint is a pretty good predictor.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650