Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.arch
Subject: Re: 01/31/87 Dhrystone Results and Source
Message-ID: <114@winchester.mips.UUCP>
Date: Mon, 16-Feb-87 02:31:44 EST
Article-I.D.: winchest.114
Posted: Mon Feb 16 02:31:44 1987
Date-Received: Tue, 17-Feb-87 03:27:35 EST
References: <2348@homxb.UUCP> <15203@onfcanim.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 33
Keywords: Benchmark, performance measurement

In article <1224@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>Has anyone actually tried to evaluate the Dhrystone (and other benchmarks)
>by seeing how well it predicts performance on real applications?  It would
>seem straightforward to take ten random applications running on specific test
>data, measure their performance on some target machine/compiler combinations,
>and statistically analyze how much of the performance differences had been
>predicted by the Dhrystone figures.
1) There's probably an interesting M.S. thesis in here somewhere.
>
>The debate on flaws of the Dhrystone is quite interesting, but it would be nice
>to have some real data on how good or bad the Dhrystone was.  I'm not even sure
>that a good benchmark is possible in principle - that is, I wonder whether
>it is possible to come up with a single number which can predict
>(with any reasonable accuracy) performance on a range of different
>applications.

2) Most people I know don't believe very much in single-number performance
metrics.

3) Although I raised this issue in the first place, there do appear to
be a few applications that grossly correlate [and I mean grossly]
with Dhrystone: if you saw the Performance Brief I posted here a
few months ago, Dhrystone actually correlated reasonably well with
things like grep/diff/yacc/nroff, i.e., integer user-level programs of
moderate [but not huge] size, although it sometimes overstated the performance
of small-cached micros versus superminis.  This effect is typical of small
benchmarks: if the benchmark fits into the cache, you get something that
correlates a bit better with raw CPU/cache speed; the more it doesn't fit,
the more you're measuring cache-to-main-memory performance.
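
For anyone who wants to try the kind of analysis asked about above, here is a
minimal sketch of one way to do it: compute the correlation between Dhrystone
ratios and measured application ratios across a set of machines, and square it
to get the fraction of variance "predicted".  The function names are mine, and
the numbers in the usage section are made-up placeholders, NOT measurements
from the Performance Brief or anywhere else.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with zero variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


def variance_explained(xs, ys):
    """Fraction of variance in ys accounted for by a linear fit on xs (r^2)."""
    return pearson_r(xs, ys) ** 2


if __name__ == "__main__":
    # Hypothetical placeholder data: one entry per machine/compiler pair,
    # each expressed as a ratio to some reference machine.
    dhrystone_ratio = [1.0, 2.0, 3.0, 4.0]
    application_ratio = [1.0, 1.8, 3.1, 3.9]
    print("r   =", pearson_r(dhrystone_ratio, application_ratio))
    print("r^2 =", variance_explained(dhrystone_ratio, application_ratio))
```

A high r^2 for one application says nothing about the next one: you would
have to repeat this per application, and a benchmark that fits in the cache
could still look good on small programs and poor on large ones.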
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086