Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker.mit.edu!ai-lab!life!burley
From: burley@geech.ai.mit.edu (Craig Burley)
Newsgroups: comp.benchmarks
Subject: Re: BYTE and benchmarks (anecdote)
Message-ID: <BURLEY.91Jan15092556@geech.ai.mit.edu>
Date: 15 Jan 91 14:25:56 GMT
References: <EACHUS.91Jan8114647@aries.linus.mitre.org>
	<1991Jan09.171216.21923@convex.com> <josef.663497368@ugum01>
	<1991Jan14.202646.5677@murdoch.acc.Virginia.EDU>
Sender: news@ai.mit.edu
Distribution: comp
Organization: Free Software Foundation 545 Tech Square Cambridge, MA 02139
Lines: 62
In-reply-to: gl8f@astsun.astro.Virginia.EDU's message of 14 Jan 91 20:26:46 GMT

In article <1991Jan14.202646.5677@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes:

   Silly me. I propose we use the "infinite loop" benchmark instead of
   SPEC.

At my last job, I did some work on the ROM-based microcode, primarily
optimizing for space and, sometimes, speed, so new features could be
added to the ROM.

One thing I made smaller and, therefore, faster, was the idle loop.
It had half the instructions of the previous version yet worked just
as well.

Since the idle loop accounted for 99.99% of the execution profile of
these machines, I reported to management that I had singlehandedly
improved system performance by a factor of two!

They were very impressed!  (-:

Seriously, it does seem difficult to produce benchmarks that don't show,
to an unspecifiable extent, a combination of hardware speed and compiler-
produced optimizations.  Globally optimizing compilers, as they become
available, might well turn many popular benchmarks into, Fortranically,
STOP statements, or WRITEs of constants.
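
To make that concrete, here is a minimal sketch in Python (chosen only
for brevity; the names fixed_benchmark, FOLDED_RESULT, and
seeded_benchmark are made up for illustration).  CPython does not do
this folding itself.  The point is only that the first routine has no
run-time inputs, so an optimizer that sees the whole program is
entitled to replace it with the constant, while the second depends on
a value that is not known until run time.

```python
def fixed_benchmark():
    """Every input is known at compile time, so the result never varies."""
    total = 0
    for i in range(1, 1001):
        total += i * i
    return total


# What a sufficiently global optimizer could legally emit in place of the
# loop above: the moral equivalent of a WRITE of a constant.
FOLDED_RESULT = 333833500


def seeded_benchmark(limit):
    """Same loop, but the bound arrives at run time, so it cannot be folded
    to one constant (a clever compiler might still find a closed form)."""
    total = 0
    for i in range(1, limit + 1):
        total += i * i
    return total


if __name__ == "__main__":
    # The "benchmark" and the constant are indistinguishable from outside.
    print(fixed_benchmark() == FOLDED_RESULT)
```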

One way to help avoid this is to have a two-phase benchmark.  The first
phase uses system information like date and time to seed a random number,
then writes a data file containing values derived from this number.
The second phase, the one actually measured, then reads this file and
does the measurable calculations on it.
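
A rough sketch of that two-phase arrangement, again in Python and
under my own assumptions: the function names generate_data and
run_measured, the one-integer-per-line file format, and the
sum-of-squares workload are all hypothetical stand-ins, not part of
any real benchmark suite.

```python
import os
import random
import tempfile
import time


def generate_data(path, count, seed=None):
    """Phase 1 (not timed): write `count` pseudo-random integers to `path`.

    By default the seed comes from the clock, so the values cannot be
    known when the benchmark is compiled.
    """
    if seed is None:
        seed = time.time_ns()
    rng = random.Random(seed)
    with open(path, "w") as out:
        for _ in range(count):
            out.write(f"{rng.randrange(1, 1000)}\n")
    return seed


def run_measured(path):
    """Phase 2: read the whole file first, then time only the computation."""
    with open(path) as src:
        values = [int(line) for line in src]
    start = time.perf_counter()
    checksum = sum(v * v for v in values) % 1_000_003  # stand-in workload
    elapsed = time.perf_counter() - start
    return checksum, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        data_file = os.path.join(workdir, "bench.dat")
        generate_data(data_file, 100_000)
        checksum, seconds = run_measured(data_file)
        print(f"checksum={checksum} elapsed={seconds:.6f}s")
```

Returning (and printing) the checksum matters: if the result of the
measured computation is never used, an optimizer could discard the
work being timed.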

Of course, this might appear to corrupt timings wanted for just CPU
activity with disk activity.  Even trying to isolate one mode from the
other in the code, by reading the whole file into memory first, has its
problems: a global optimizer that doesn't see the call to start the
timer between the reads and the computes might decide to reduce memory
requirements, and perhaps increase speed, by folding the compute loop
into the I/O loop.

If the data set were small compared to the compute size of the problem --
for example, if an NP-complete problem like traveling salesman were
computed using whatever favorite (to-be-benchmarked) techniques based
on distance data in a file -- this I/O problem could be reduced to the
vanishing point.  Also, by using an NP-complete problem, even a
globally optimizing compiler is less likely to find a better algorithm
that significantly affects the benchmark (if it does, that's big news,
right?!).
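
As an illustration of "small data, large compute", here is a sketch of
an exhaustive traveling-salesman search over a distance matrix.  The
file format (one row of whitespace-separated integers per city) and the
names parse_distances, shortest_tour, and SAMPLE are my own
assumptions; brute force is used only because it is the simplest
technique to show, not because it is the one anybody would want to
benchmark.

```python
from itertools import permutations


def parse_distances(text):
    """Turn whitespace-separated rows of integers into a square matrix."""
    rows = [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    if not rows or any(len(row) != len(rows) for row in rows):
        raise ValueError("distance matrix must be non-empty and square")
    return rows


def shortest_tour(dist):
    """Exhaustive search: try every ordering of cities 1..n-1, starting and
    ending at city 0, and return the length of the cheapest round trip."""
    n = len(dist)
    best = None
    for order in permutations(range(1, n)):
        route = (0,) + order + (0,)
        length = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if best is None or length < best:
            best = length
    return best


# A tiny hypothetical data file: four cities, asymmetric distances.
SAMPLE = """\
0 2 9 10
1 0 6 4
15 7 0 8
6 3 12 0
"""

if __name__ == "__main__":
    print(shortest_tour(parse_distances(SAMPLE)))
```

The input grows as n*n numbers while the search examines (n-1)!
orderings, so for even a dozen cities the time spent reading the file
is lost in the noise.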

The other important solution, I'd like to stand on my soapbox and say,
is the proliferation of free software such that any machine you want
to benchmark is likely to already have compiler technology equivalent
to that used for the machine with which it is being compared.  For
example, doing 80486 vs. 68040 benchmarks for C code is likely to be
much more helpful in isolating the hardware characteristics (when this
is desired -- it isn't always, of course!) if GNU CC is used to do
the compilations on both machines.  Of course, one machine's description
might cover more of the various optimizations than the other's, and
differences in library routines would remain, but at least you wouldn't
be comparing code compiled using common subexpression elimination (cse)
against code compiled without it, for example.
--

James Craig Burley, Software Craftsperson    burley@ai.mit.edu