Asri-unix.1239
net.works
utzoo!decvax!cca!sri-unix!pratt@Shasta@Sumex-Aim
Fri Apr 16 07:22:15 1982
Minow
Martin Minow's observation about the FORTRAN program that was optimized
from a doubly nested loop to two instructions is well taken.  The same
observation could be applied to other well-known benchmarks.  The Takeuchi
function is a popular benchmark for comparing call/return/parameter-passing
overhead on different machine-language combinations.  The existence of a
closed form for this function opens up the possibility that a clever
optimizer could discover the closed form.  As Martin says, people have been
known to tune compilers for benchmarks, so this possibility is less remote
than it might seem.  Furthermore if one is going to accept as valid the
optimization Martin cited, on what basis would one reject some other
optimization as invalid?

However Martin's inference that benchmarks are unrepresentative of machine
performance is not the inference I would have drawn.  A better inference is
that a benchmark written in a high-level language but intended to reveal the
performance of the underlying machine should solve a well-defined problem in
an efficient way, so as to prevent the optimizer from playing too large a
role.  

Not all benchmarks need be efficient.  A customer may be interested in knowing
how slow things will go if the programmers are sloppy, e.g. adding up the
numbers from 1 to n with a loop instead of using a closed form.  Clearly
a sloppy benchmark is called for in such a case.

Moral: when benchmarking, choose benchmarks matched to what it is you want
to measure.

Corollary: there is no such thing as the universal benchmark.

--Vaughan Pratt