Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!hplabs!otter.hpl.hp.com!otter!sfk
From: sfk@otter.hpl.hp.com (Steve Knight)
Newsgroups: comp.lang.scheme
Subject: Re: Benchmarking Scheme
Message-ID: <6260007@otter.hpl.hp.com>
Date: 2 May 91 10:04:53 GMT
References: <JAFFER.91May1163805@kleph.ai.mit.edu>
Organization: Hewlett-Packard Laboratories, Bristol, UK.
Lines: 28

Whenever the topic of benchmarks comes around, I am always impressed by
the subtle depth of the problems.  It seems to be an area in which the demand
for simple answers is much greater than the supply.

> So what are the prospects for machine independent performance
> measures?

Very poor.  Folks are simply too incautious about the interpretation of
benchmarks in any case.  

(A case in point -- some folks I know were doing a whole series of benchmarks 
on some software.  They wrote up all the tests real nice.  However, I couldn't
make head or tail of their figures when I compared them against other results
I'd checked out.  So I got on the machine & tried the tests.  My first test was
to check out the timing software they were using.  It was wrong by a factor
of 2.  This is by no means a unique example.)
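
A cheap sanity check of that kind can be sketched in Python.  This is only
an illustration, not the tests described above: the name check_timer and the
0.2 second interval are assumptions made for the example.  It times a sleep
of known length and reports measured time divided by requested time.

```python
import time

def check_timer(timer, interval=0.2):
    """Return measured/requested time for a sleep of `interval` seconds."""
    start = timer()
    time.sleep(interval)          # a known amount of real (stopwatch) time
    elapsed = timer() - start
    return elapsed / interval

# Wall-clock timer: the ratio should come out close to 1.
wall_ratio = check_timer(time.perf_counter)

# CPU-time timer: sleeping burns almost no CPU, so the ratio is close to 0.
cpu_ratio = check_timer(time.process_time)

print(f"wall-clock ratio: {wall_ratio:.2f}")
print(f"cpu-time ratio:   {cpu_ratio:.2f}")
```

A wall-clock ratio near 2 or 0.5 would suggest a units or tick-rate mistake
in the timer.  The CPU-time ratio shows how far a perfectly correct timer
can sit from a stopwatch when it is measuring something else.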

I guess I can add one point.  When doing benchmarks across machines, it is 
important to establish what the timing and storage measurements are actually 
measuring.  Does the timing include process swapping, page swapping, and other
admin overheads?  Does it correlate with a stopwatch?  Does it include
time spent in the kernel?  Does it include garbage collection time (and how
many collections were there)?  Do two C programs of known relative performance
give equal performance ratios on a range of machines?  If you're on 68xxx
machines, repeat the test while avoiding 68020 instructions, to check whether
it's ineffective use of the 68020 instruction set that causes the difficulties.
And so on.
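
To make a few of those questions concrete, here is a minimal Python sketch
that reports several different "times" for one run.  The names measure and
workload are hypothetical, the workload is a stand-in, and the collection
counter relies on CPython's gc.callbacks hook, so treat it as an
illustration under those assumptions rather than a general recipe.

```python
import gc
import os
import time

def measure(workload):
    """Run workload() once; report wall, user and system time plus GC runs."""
    gc_runs = 0

    def count_gc(phase, info):
        # CPython calls this at the start and stop of every collection.
        nonlocal gc_runs
        if phase == "start":
            gc_runs += 1

    gc.callbacks.append(count_gc)
    cpu0 = os.times()
    wall0 = time.perf_counter()
    try:
        workload()
    finally:
        wall1 = time.perf_counter()
        cpu1 = os.times()
        gc.callbacks.remove(count_gc)
    return {
        "wall": wall1 - wall0,                 # stopwatch time
        "user": cpu1.user - cpu0.user,         # CPU time in user mode
        "system": cpu1.system - cpu0.system,   # CPU time in the kernel
        "gc_runs": gc_runs,                    # collections during the run
    }

def workload():
    # Stand-in workload (an assumption): allocate many small containers
    # so that the collector has a reason to run.
    junk = []
    for i in range(200_000):
        junk.append([i, {}])

result = measure(workload)
for name, value in result.items():
    print(name, value)
```

If wall is much larger than user plus system, the difference went to things
like swapping, other processes or waiting, which is exactly the kind of
overhead that has to be pinned down before figures from two machines are
compared.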

Steve