Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!bloom-beacon!husc6!purdue!decwrl!nsc!grenley
From: grenley@nsc.nsc.com (George Grenley)
Newsgroups: comp.arch
Subject: Re: Benchmarking
Message-ID: <6868@nsc.nsc.com>
Date: 8 Oct 88 00:54:41 GMT
References: <2220003@hpausla.HP.COM> <46500026@uxe.cso.uiuc.edu> <6683@nsc.nsc.com> <6684@nsc.nsc.com> <4263@wright.mips.COM> <6729@nsc.nsc.com> <10498@reed.UUCP> <4655@winchester.mips.COM>
Reply-To: grenley@nsc.nsc.com.UUCP (George Grenley)
Organization: National Semiconductor, Sunnyvale
Lines: 76

The following discussion started with a posting of mine about organizing some
head-to-head benchmark comparisons.  I wanted to give all interested parties
a chance to look at one another's hardware.  The primary reason is that it is
difficult, if not impossible, to reproduce most vendors' benchmark numbers,
and I specifically include my employer, NSC, in this category.  We publish
16600 Dhrystones/sec for Dhrystone 1.1 at 30 MHz with zero wait states, but
no '532 has ever run exactly that number.  (It came from the simulator.)

One reason is simply that most companies won't spend the money to go out and
buy other companies' hardware to test.

In article <4655@winchester.mips.COM> mash@winchester.UUCP (John Mashey) writes:
>In article <10498@reed.UUCP> mdr@reed.UUCP (Mike Rutenberg) writes:
 ...
>>But it is so hard to make it run and yet be "the same code."

(deleted: reference to why Dhrystone is susceptible to over-optimization...)

>
>>@BEGIN(Black Magic)
>>You can do things to trick the compiler into keeping the loop.  A null
>>procedure the loop calls will do the trick if compiled separately.  But
>>then you have to also put a call to this null procedure in the main
>>dhrystone loop.  But this may do bad things to your numbers, especially
>>if it affects your cache hit-rate.  And this will change the numbers
>>you get, not in a positive way.
>>@END(Black Magic)
>
>>I wish benchmarks would be rewritten to be ultimately portable &
>>really really smart about outwitting too-smart compilers.  It would be
>>nice to be able to run a benchmark program totally unchanged.  This
>>would avoid the temptation or need to modify the tests.

AGREED! SO LET'S DO IT!  Time for Dhry 3.0, or whatever.  It seems to me the
easiest way to tackle the loop-that-does-nothing problem is to have it do
something, preferably process a variable that is supplied at run time, so 
the compiler cannot know what it is going to do...

But in any case, some new CPU benchmarks need to be developed.  Perhaps we can
all agree that an existing one is suitable, or perhaps we need to create a
new one.

>(back from 3 weeks' Down Under; it will take a while to catch up!)
>
>ONE MORE TIME:
>	use large, real programs as benchmarks.
>	do NOT use small programs as benchmarks
>	be especially careful of small synthetic benchmarks
>
>Two of the most counterproductive things people can be doing are:
>	a) Tuning compilers to optimize small benchmarks, especially
>	with optimizations that don't really matter much on real
>	programs. (Optimizations that actually matter elsewhere are fine.)
>	b) Continually reworking synthetic benchmarks to stay ahead
>	of advances in compiler optimization.
>It is sad how much effort across this business has gone down the rat-holes.

Agreed on all counts, especially regarding wasted effort (quadbyte string
compare on a certain new processor: can you say "dhrystone in microcode"?)

The only drawback to John's generally correct suggestion is the lack of any
standard set of larger integer benchmark programs.  Whetstone and Linpack seem
pretty well established as FP benchmarks, although I wonder whether they
aren't a bit cooked sometimes... (I once heard that a Fortran compiler was
released which SPECIFICALLY checked the source to see if it was Whetstone,
and if it was, stuck in a VERY fast routine.)

Someone on this net suggested using GNU's public domain versions of various
Unix utilities (grep, nroff, etc.).  Sounds like a good plan to me; it doesn't
matter whether they're Unix compatible, as long as they compile and run.

Perhaps the first step is to convene a working group to standardize this
stuff and promote its use.  I volunteer.  Any others?

George Grenley
NSC