Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!husc6!endor!reiter
From: reiter@endor.harvard.edu (Ehud Reiter)
Newsgroups: comp.arch,comp.sys.nsc.32k
Subject: Re: "Unoptimizing" Dhrystone
Message-ID: <1706@husc6.UUCP>
Date: Tue, 21-Apr-87 09:52:51 EST
Article-I.D.: husc6.1706
Posted: Tue Apr 21 09:52:51 1987
Date-Received: Wed, 22-Apr-87 02:49:46 EST
References: <4190@nsc.nsc.com> <951@moscom.UUCP> <2577@intelca.UUCP> <999@mips.UUCP> <312@gumby.UUCP>
Sender: news@husc6.UUCP
Reply-To: reiter@harvard.UUCP (Ehud Reiter)
Organization: Aiken Computation Lab Harvard, Cambridge, MA
Lines: 25
Xref: mnetor comp.arch:1028 comp.sys.nsc.32k:98

In article <312@gumby.UUCP> earl@mips.UUCP (Earl Killian) writes:
>At MIPS, we use the following programs for benchmarking and
>architectural study:
>	<list of 10 or so programs>
>All of these have very different statistics.

On the subject of benchmarks, I've noticed that many people who do use
multiple programs for benchmarking tend to average the results together.
Even MIPS, whose Performance Brief is the best of its kind that I've seen,
does this to some degree (e.g. their "UNIX Total", in comparing an M/500
to a VAX-11/780, averages together relative performance figures ranging
from 4.7 to 6.7 into a single figure of 5.6).  Since averaging can be
a statistically dubious process (especially if the programs being averaged
are not weighted by some kind of "frequency of use" parameter), perhaps it
would be better to report the full range instead of an average (so, in the
above example, the M/500 would be rated as 4.7-6.7, not 5.6).  Reporting the
range would also have the advantage of giving people an idea of the variation
involved.
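
To make the arithmetic concrete, here is a rough Python sketch of the two ways
of reporting.  Only the endpoints (4.7 and 6.7) and the 5.6 average come from
the example above; the program names, the two middle ratios, and the
"frequency of use" weights are all made up for illustration, so treat this as
a sketch under those assumptions and not as MIPS's actual data or method.

```python
from statistics import mean

# Hypothetical per-program speed ratios (M/500 relative to VAX-11/780).
# Only 4.7, 6.7 and the resulting 5.6 average appear in the post;
# the names and the two middle values are invented for illustration.
ratios = {"prog_a": 4.7, "prog_b": 5.2, "prog_c": 5.8, "prog_d": 6.7}

# Hypothetical "frequency of use" weights (assumed, not measured).
weights = {"prog_a": 0.5, "prog_b": 0.3, "prog_c": 0.1, "prog_d": 0.1}


def summarize(ratios, weights):
    """Return the range, the plain average, and a usage-weighted average."""
    values = list(ratios.values())
    total_weight = sum(weights[name] for name in ratios)
    weighted = sum(ratios[name] * weights[name] for name in ratios) / total_weight
    return {
        "low": min(values),
        "high": max(values),
        "mean": mean(values),
        "weighted_mean": weighted,
    }


summary = summarize(ratios, weights)
print("range:         %.1f-%.1f" % (summary["low"], summary["high"]))
print("plain average: %.1f" % summary["mean"])
print("weighted avg:  %.1f" % summary["weighted_mean"])
```

With these invented weights the weighted average lands near 5.2 while the
plain average stays at 5.6, which is the point: the single number depends on
a weighting choice, whereas the 4.7-6.7 range does not.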

So, if the community does shift to using multiple programs for benchmarking
(which I strongly approve of), we should consider reporting "relative
performance" not as a single average number, but rather as a range.

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)