Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!husc6!endor!reiter
From: reiter@endor.harvard.edu (Ehud Reiter)
Newsgroups: comp.arch,comp.sys.nsc.32k
Subject: Re: "Unoptimizing" Dhrystone
Message-ID: <1706@husc6.UUCP>
Date: Tue, 21-Apr-87 09:52:51 EST
Article-I.D.: husc6.1706
Posted: Tue Apr 21 09:52:51 1987
Date-Received: Wed, 22-Apr-87 02:49:46 EST
References: <4190@nsc.nsc.com> <951@moscom.UUCP> <2577@intelca.UUCP> <999@mips.UUCP> <312@gumby.UUCP>
Sender: news@husc6.UUCP
Reply-To: reiter@harvard.UUCP (Ehud Reiter)
Organization: Aiken Computation Lab Harvard, Cambridge, MA
Lines: 25
Xref: mnetor comp.arch:1028 comp.sys.nsc.32k:98

In article <312@gumby.UUCP> earl@mips.UUCP (Earl Killian) writes:
>At MIPS, we use the following programs for benchmarking and
>architectural study:
>
>All of these have very different statistics.

On the subject of benchmarks, I've noticed that many people who do use
multiple programs for benchmarking tend to average the results together.
Even MIPS, whose Performance Brief is the best of its kind that I've seen,
does this to some degree (e.g. their "UNIX Total", in comparing an M/500
to a VAX-11/780, averages relative performance figures ranging from 4.7
to 6.7 into a single figure of 5.6).

Averaging can be a statistically dubious process, especially if the
programs being averaged are not weighted by some kind of "frequency of
use" parameter. Perhaps it would be better to report the full range
instead of an average (so, in the above example, the M/500 would be rated
as 4.7-6.7, not 5.6). Reporting the range would also give people an idea
of the variances involved.

So, if the community does shift to using multiple programs for
benchmarking (which I strongly approve of), we should consider reporting
"relative performance" not as a single average number, but as a range.

					Ehud Reiter
					reiter@harvard (ARPA,BITNET,UUCP)
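
As a rough illustration of the reporting style suggested above, here is a
minimal Python sketch that prints a set of relative-performance ratios as a
range, with the unweighted average shown only for comparison. The function
name summarize is hypothetical. Only the endpoints 4.7 and 6.7 and the 5.6
average come from the post; the middle value 5.4 is a made-up filler, not a
figure from the MIPS Performance Brief.

```python
from statistics import mean


def summarize(ratios):
    """Return (low, high, unweighted mean) for relative-performance ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return min(ratios), max(ratios), mean(ratios)


# ASSUMPTION: 4.7 and 6.7 are the endpoints quoted in the post; 5.4 is an
# invented placeholder, chosen only so the unweighted mean comes out near 5.6.
ratios = [4.7, 5.4, 6.7]

low, high, avg = summarize(ratios)

# Report the range first; the single averaged number hides the spread.
print(f"relative performance: {low:.1f}-{high:.1f} (unweighted average {avg:.1f})")
```

The average here is unweighted, which is exactly the case the post warns
about. A "frequency of use" weighting would need per-program weights that the
post does not supply.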