Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.arch
Subject: Re: 01/31/87 Dhrystone Results and Source
Message-ID: <114@winchester.mips.UUCP>
Date: Mon, 16-Feb-87 02:31:44 EST
Article-I.D.: winchest.114
Posted: Mon Feb 16 02:31:44 1987
Date-Received: Tue, 17-Feb-87 03:27:35 EST
References: <2348@homxb.UUCP> <15203@onfcanim.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 33
Keywords: Benchmark, performance measurement

In article <1224@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>Has anyone actually tried to evaluate the Dhrystone (and other benchmarks)
>by seeing how well it predicts performance on real applications? It would
>seem straightforward to take ten random applications running on specific test
>data, measure their performance on some target machine/compiler combinations,
>and statistically analyze how much of the performance differences had been
>predicted by the Dhrystone figures.

1) There's probably an interesting M.S. thesis in here somewhere.

>
>The debate on flaws of the Dhrystone is quite interesting, but it would be nice
>to have some real data on how good or bad the Dhrystone was. I'm not even sure
>that a good benchmark is possible in principle - that is, I wonder whether
>it is possible to come up with a single number which can predict
>(with any reasonable accuracy) performance on a range of different
>applications.

2) Most people I know don't believe very much in single-number performance
metrics.

3) Although I raised this issue in the first place, there do appear to be
a few applications that grossly correlate [and I mean grossly] with
Dhrystone. If you saw the Performance Brief I posted here a few months ago,
there was actually a reasonable correlation of it with things like
grep/diff/yacc/nroff, i.e., integer user-level programs of moderate
[but not huge] size, although it sometimes overstated the performance of
small-cached micros versus superminis. This effect is typical of small
benchmarks: if the benchmark fits into the cache, you get something that
correlates a bit better with raw CPU/cache speed; the more it doesn't fit,
the more you're measuring cache-main-memory performance.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
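
[A rough sketch of the kind of analysis Reiter suggests in the quoted text: given a Dhrystone rate and a measured run time for one application on each of several machine/compiler combinations, estimate how much of the variation in application speed the Dhrystone figure accounts for. The function names and the choice of a log-log squared Pearson correlation are assumptions made for illustration, not anything from the Performance Brief, and no real measurements are included.]

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in one of the inputs")
    return (sxy * sxy) / (sxx * syy)


def variance_explained(dhrystones_per_sec, app_seconds):
    """Fraction of the variance in log(application speed) accounted for by
    log(Dhrystone rate), one entry per machine/compiler combination.

    Speed is taken as 1 / run time. Logs are used (an assumption) so that
    ratios between machines matter rather than absolute differences.
    """
    log_dhry = [math.log(d) for d in dhrystones_per_sec]
    log_speed = [math.log(1.0 / t) for t in app_seconds]
    return r_squared(log_dhry, log_speed)
```

[If application run time fell exactly in proportion to the Dhrystone rate on every machine, variance_explained would return 1.0 up to rounding; a value near 0 would mean the Dhrystone figure says little about that application. Repeating this per application, e.g. for grep, diff, yacc and nroff, would give one number per program rather than a single verdict.]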