Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!mcnc!ece-csc!ncrcae!ncr-sd!bigbang!celerity!ps
From: ps@celerity.UUCP (Pat Shanahan)
Newsgroups: comp.arch
Subject: Re: Benchmarking
Message-ID: <203@celerity.UUCP>
Date: Thu, 28-May-87 17:54:04 EDT
Article-I.D.: celerity.203
Posted: Thu May 28 17:54:04 1987
Date-Received: Sat, 30-May-87 09:32:24 EDT
References: <415@winchester.UUCP> <642@percival.UUCP> <426@winchester.UUCP> <2100@husc6.UUCP>
Reply-To: ps@celerity.UUCP (Pat Shanahan)
Organization: Celerity Computing, San Diego, Ca.
Lines: 57

In article <2100@husc6.UUCP> reiter@endor.UUCP (Ehud Reiter) writes:
>...
>The point is, there is a great demand out there for simple, single figure
>performance numbers which are in the public domain.  No matter how much we
>complain that single figures are meaningless, people out there in the real
>world are going to continue using them.  There's a reason why MIPS and
>Dhrystones are so often quoted.

This is very unfortunate, if true. People who believe simple, single-figure
performance numbers are doomed to be surprised by reality.

>
>And, we can do better than Dhrystone!  We all know what the problems with
>Dhrystone are - can't be globally optimized, too much string handling,
>too small, etc.  We can certainly write a benchmark which, although still
>"bad", will be much better than Dhrystone.

I agree. I don't know of any real C program that does as much structure
assignment as the C Dhrystone. I think that C performance is important
enough to justify a benchmark that reflects how the language is actually
used.

>
>I think we can even get away with replacing single-number benchmarks by
>two number benchmarks, which would give a high and low performance figure
>instead of just a single performance figure (that is, the benchmark would
>consist of lots of programs.  The performance numbers would be normalized
>against some standard (good old 4.2BSD VAX-11/780?), and the summary
>statistics would be the highest and lowest of the normalized numbers).

I think a better approach would be the one taken in the Livermore loops
benchmark. The report includes the performance for the individual loops, as
well as summary information such as the harmonic mean. I am not sure if
high and low would really help much, except in convincing people that single
numbers are meaningless. The extreme outliers can be due to architectural
choices that are good for most programs but bad for certain exceptional
programs. For example, pipelining may be good for real programs, but bad for
an artificial test of jump performance. If you are going to report high and
low, it is very important that each benchmark program contain a reasonable
mix of operations. If you are going to report individual results, this is less critical.

>
>In summary, we can't write a perfect benchmark, but we can write a better
>benchmark.
>
>					Ehud Reiter
>					reiter@harvard	(ARPA,BITNET,UUCP)
>					reiter@harvard.harvard.EDU  (new ARPA)


It should certainly be possible to write a better benchmark of C performance
than the Dhrystone.
-- 
	ps
	(Pat Shanahan)
	uucp : {decvax!ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ps
	arpa : sdcsvax!celerity!ps@nosc