Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!seismo!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.arch,comp.sys.misc
Subject: Re: 01/31/87 Dhrystone Results and Source
Message-ID: <112@winchester.mips.UUCP>
Date: Sun, 8-Feb-87 19:16:24 EST
Article-I.D.: winchest.112
Posted: Sun Feb  8 19:16:24 1987
Date-Received: Tue, 10-Feb-87 02:17:43 EST
References: <2348@homxb.UUCP> <15203@onfcanim.UUCP> <293@ames.UUCP> <2366@homxb.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 81
Keywords: Benchmark, C, performance measurement
Xref: watmath comp.arch:308 comp.sys.misc:331

In article <2366@homxb.UUCP> gemini@homxb.UUCP (Rick Richardson) writes:
...TIME, TIMES, Dhrystone accuracy, etc....
>
>Besides, anybody who quibbles over a 10% difference isn't looking
>at the whole picture when selecting a machine.  Dhrystones just
>get you looking at the right performance arena.  Other factors
>(software, support, migration path, etc.) will get you to the
>final decision.

1) This seems like a reasonable answer, which is in agreement with Gene
Miya's adverse comments on ANY single figure of merit. [That's why we
end up publishing a Performance Brief that's pretty large, to include
enough different benchmarks and explanations to have even a chance of
meaning anything.]

2) I'd generally consider Dhrystone to have about a single digit of accuracy.
In particular, there are all sorts of anomalies with regard to heavily-cached
machines, and the rules regarding allowable optimizations.  For example,
on "realistic" integer benchmarks, our "5MIPS" M/500s are about 20%
faster than a "4MIPS" VAX 8600, and many hours of work on both machines
say this is consistent.  The Dhrystone numbers, however, would claim that
the M/500 (somewhere in the 10-12,000 range, depending on exactly what
optimizations are/aren't allowed for consistency) is 1.7X to 1.9X the
8600's 6000-7000.
I'd strongly expect that Dhrystone often overstates the performance of
some micros relative to superminis: for example, Intergraph's recently
published numbers for its 30MHz Clipper box include about 8300 Dhrystones,
making it look faster than the 8600, even though it was somewhat to
substantially slower on all of the more realistic benchmarks that they showed.
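
To put a rough number on that overstatement, here is a small C sketch.
The function name is made up for illustration, and the inputs are only
the rounded figures quoted above (1.7X to 1.9X by Dhrystone, about 1.2X
on realistic integer benchmarks), so treat the result as a ballpark.

```c
#include <assert.h>

/* Hypothetical helper: how much a benchmark-derived speed ratio
   exceeds the ratio seen on more realistic workloads.
   Both arguments are "machine A is N times machine B" ratios. */
double overstatement(double benchmark_ratio, double realistic_ratio)
{
    assert(realistic_ratio > 0.0);
    return benchmark_ratio / realistic_ratio;
}
```

With those inputs, overstatement(1.7, 1.2) and overstatement(1.9, 1.2)
come out at roughly 1.4 and 1.6, i.e. Dhrystone would flatter the M/500
against the 8600 by something like 40-60%.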

3) As Rick has always maintained, things like Dhrystone [or Whetstone,
or Linpack, or...] are just a starting point.  However, as it stands,
Dhrystone is the only integer benchmark of any size whatsoever
[since things like Ackermann's function, sieve, etc. are way too small
to be good predictors of anything] that is commonly cited and collected.
Maybe it's time to create either a Dhrystone 2.0 or some other integer
benchmark, to get at least one figure of merit that's a bit better,
although there is still no substitute for running the real applications.

4) How about some suggestions for ways to improve Dhrystone?
(This is not necessarily an attempt to make it "perfect" [no such thing] or
even necessarily a good model for operations statistics, but at least to
make some obvious improvements.)
Here are a few suggestions for starters:

a) Eliminate the problems that stop one from using real optimizers.
	As it stands, it is very hard to compare any numbers, since you don't
	know what sorts of optimizations were done.  This is a particular
	handicap to those of us who have real optimizers but can't use
	them for the test.  In some cases, if you turn off the global
	optimizer, you also lose optimizations that other people's compilers
	[the ones without separate optimizers] do anyway, so they keep the
	benefit of those techniques and we don't.
	Note that real optimizers are much more prevalent than they were when
	Dhrystone was proposed. As Rick says in his Dhrystone notes:

	"To summarize, DHRYSTONES by themselves are not anything more than
	a way to win free beers when arguing 'Box-A versus Box-B' religion.
	They do provide insight into Box-A/Compiler-A versus Box-A/Compiler-B
	comparisons."

	Right now, the best compilers are penalized [10-20%, from our numbers].

b) STRCPY: we've seen strcpy take up to 30% of the execution time.
Does anybody have any real numbers from real programs on the real usage
patterns for the str* and mem* routines? [Call frequency, distribution
of string sizes, etc.]  We suspect that Dhrystone spends more of its time
in strcpy than most real programs do.
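
For anyone who wants to collect such numbers, here is a minimal sketch
of one way to do it in C.  None of this is in Dhrystone: the wrapper
counted_strcpy, the counters, and the power-of-two length buckets are
all invented for illustration, and a real study would want the other
str* and mem* routines covered the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical instrumentation: count strcpy calls and bin the
   copied lengths into 0, 1, 2-3, 4-7, 8-15, 16-31, 32-63, 64+. */
#define LEN_BUCKETS 8

unsigned long strcpy_calls;              /* number of calls seen */
unsigned long strcpy_bytes;              /* characters copied, excluding NULs */
unsigned long strcpy_hist[LEN_BUCKETS];  /* calls per length bucket */

/* Map a string length to its bucket index (0 .. LEN_BUCKETS-1). */
int len_bucket(size_t len)
{
    int b = 0;

    while (len > 0 && b < LEN_BUCKETS - 1) {
        len >>= 1;
        b++;
    }
    return b;
}

/* Drop-in replacement for strcpy that records the call first. */
char *counted_strcpy(char *dst, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    strcpy_calls++;
    strcpy_bytes += len;
    strcpy_hist[len_bucket(len)]++;
    return strcpy(dst, src);
}

/* Print the call count, total characters, and length histogram. */
void report_strcpy_stats(FILE *out)
{
    static const char *labels[LEN_BUCKETS] = {
        "0", "1", "2-3", "4-7", "8-15", "16-31", "32-63", "64+"
    };
    int i;

    fprintf(out, "strcpy calls: %lu, chars copied: %lu\n",
            strcpy_calls, strcpy_bytes);
    for (i = 0; i < LEN_BUCKETS; i++)
        fprintf(out, "  len %-5s %lu\n", labels[i], strcpy_hist[i]);
}
```

A program under study would call counted_strcpy wherever it now calls
strcpy and call report_strcpy_stats(stderr) just before exiting.  The
extra strlen distorts timings, so this is for gathering usage patterns,
not for measuring speed.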

c) Other areas [much more work]:
	1) Make program bigger to avoid small-cache effects.
	2) Rework overall program behavior to be more like substantial
	programs [we do a lot of statistics-gathering on references,
	instruction frequency, branch counts, function calls, etc., and
	Dhrystone as it sits isn't exceptionally representative.]

Enough: again, no 1 figure-of-merit does it, but let's at least keep this
one in proper perspective [which I think Rick does].
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086