Path: utzoo!attcan!uunet!husc6!bbn!rochester!pt.cs.cmu.edu!k.gp.cs.cmu.edu!lindsay
From: lindsay@k.gp.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Benchmarking
Summary: Program generators have amazing leverage
Message-ID: <3285@pt.cs.cmu.edu>
Date: 12 Oct 88 03:25:39 GMT
References: <2220003@hpausla.HP.COM> <46500026@uxe.cso.uiuc.edu> <6683@nsc.nsc.com> <6684@nsc.nsc.com> <4263@wright.mips.COM> <6729@nsc.nsc.com> <10498@reed.UUCP> <4655@winchester.mips.COM> <6868@nsc.nsc.com> <1988Oct9.011633.13259@utzoo.uucp> <4853@winchester.mips.C
Sender: netnews@pt.cs.cmu.edu
Organization: Carnegie-Mellon University, CS/RI
Lines: 55

In article <6899@nsc.nsc.com> grenley@nsc.nsc.com.UUCP (George Grenley) writes:
>I've received a lot of email on b'marking; one individual pointed out that
>the database community "scales" the size of the b'mark (i.e., size of dbase)
>to the size of machine.  An interesting idea.  I think we should consider
>taking some of the small integer b'marks, and "enlarge" them by having the
>program call itself recursively in a non-trivial way.  Then, the test would
>consist of running the program at, say, 1 through 1000 levels of recursion,
>or whenever you run out of RAM.  Then, publish the performance numbers.
>Comments?  I am willing to volunteer to drive this if anyone (like, f'rinstance,
>someone who can code better than me) wants to help.

First, I am solidly behind the idea that the best benchmark is the user's
application.

That said, synthetic benchmarks might as well be as good as they can be.
So, some guidelines:
 - the code working set must be adjustable, without upper bound.
 - the data working set, likewise.
 - the compiler must be prevented from inlining.
 - the compiler must be prevented from eliminating dead code.
 - the benchmark must be small, so that it can be presented in full in
   reports. (This avoids the "slight change" problem, as well as permitting
   easy shipment.)

There is a fairly simple way to achieve these ends. Do not write a
benchmark program: write a program which writes out the benchmark program.
A simple loop in the Generator program allows the creation of arbitrarily
large source files. (Since compilers can get bent by this, the Generator
should also generate multiple source files.) The procedure names will be
somewhat unimaginative: f0001, f0002, and so on. If the source files are in
C, then it's fair to generate macros and macro calls, simply to reduce the
file space requirements.
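
A minimal sketch of such a Generator, in C. The names format_function and
emit_sources, the file naming scheme, and the placeholder body "x + i" are
all my assumptions for illustration; a real Generator would emit whatever
routine bodies the benchmark calls for.

```c
#include <assert.h>
#include <stdio.h>

/* Format the source text of generated function number i into buf.
   The body (x + i) is only a placeholder.  Returns what snprintf returns. */
int format_function(char *buf, size_t size, int i)
{
    assert(i >= 1 && i <= 9999);        /* four-digit names: f0001 .. f9999 */
    return snprintf(buf, size, "int f%04d(int x) { return x + %d; }\n", i, i);
}

/* Write functions 1..total across files <prefix>000.c, <prefix>001.c, ...
   with at most per_file functions in each, so that no single source file
   grows without bound.  Returns the number of files written, or -1. */
int emit_sources(const char *prefix, int total, int per_file)
{
    char name[256], text[128];
    int i = 1, nfiles = 0;

    assert(prefix != NULL && total >= 0 && per_file > 0);
    while (i <= total) {
        FILE *out;
        int last = (total - i < per_file) ? total : i + per_file - 1;

        snprintf(name, sizeof name, "%s%03d.c", prefix, nfiles);
        out = fopen(name, "w");
        if (out == NULL)
            return -1;
        for (; i <= last; i++) {
            format_function(text, sizeof text, i);
            if (fputs(text, out) == EOF) {
                fclose(out);
                return -1;
            }
        }
        if (fclose(out) != 0)
            return -1;
        nfiles++;
    }
    return nfiles;
}
```

Under these assumptions, emit_sources("bench", 1000, 100) would write
bench000.c through bench009.c, holding f0001 through f1000.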

Next, the Generator should write the code to fill an array with pointers
to these functions. Similarly, we need a data array.
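
What the generated table-filling code might look like, written out by hand
for three functions. The names bench_fn, func_table, data, fill_tables and
call_by_index, and the array sizes, are assumptions of mine; the Generator
would emit one assignment per function it created.

```c
#include <assert.h>

#define NFUNCS 3
#define NDATA  1024

typedef int (*bench_fn)(int);

/* Stand-ins for the generated routines. */
int f0001(int x) { return x + 1; }
int f0002(int x) { return x + 2; }
int f0003(int x) { return x + 3; }

bench_fn func_table[NFUNCS];    /* pointers to the generated functions */
int data[NDATA];                /* the data array */

/* Fill both arrays at run time.  The Generator would write this routine,
   with as many assignments as there are generated functions. */
void fill_tables(void)
{
    int i;

    func_table[0] = f0001;
    func_table[1] = f0002;
    func_table[2] = f0003;
    for (i = 0; i < NDATA; i++)
        data[i] = i;
}

/* Call a generated function through the table rather than by name. */
int call_by_index(int i, int x)
{
    assert(i >= 0 && i < NFUNCS);
    return func_table[i](x);
}
```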

Next, we need a portable routine which generates pseudo-random numbers.
(Portable mostly means that it avoids arithmetic overflow.) The quality of
the randomness is unimportant, as long as it doesn't get stuck at 0 or
other such silliness.  The generated program will use the randoms to form
subscripts, either into the data array, or into the function pointer array.
In this way, we may control the size of the working sets.
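
One possible generator, as a sketch: the Park-Miller "minimal standard"
multiplicative generator, computed with Schrage's decomposition so that no
intermediate value exceeds 32 bits. This is my choice of technique, not
something the scheme depends on, and the names seed_random, next_random and
random_subscript are assumptions. The limit argument is what sets the size
of the working set.

```c
#include <assert.h>

static long seed = 1;           /* must stay within 1 .. 2147483646 */

void seed_random(long s)
{
    assert(s >= 1 && s <= 2147483646L);
    seed = s;
}

/* seed = (16807 * seed) mod (2^31 - 1), without overflowing a 32-bit long.
   The result is never 0, so the sequence cannot get stuck there. */
long next_random(void)
{
    const long a = 16807L;
    const long m = 2147483647L;
    const long q = 127773L;     /* m / a */
    const long r = 2836L;       /* m % a */

    seed = a * (seed % q) - r * (seed / q);
    if (seed <= 0)
        seed += m;
    return seed;
}

/* A subscript in 0 .. limit-1.  A small limit keeps the program inside a
   small part of the array; a large limit spreads references widely. */
long random_subscript(long limit)
{
    assert(limit > 0);
    return next_random() % limit;
}
```

The same routine serves both arrays: one call picks a function pointer, the
next picks a data element, each with its own limit.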

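Since the functions should (largely) be accessed via the array, inlining is
defeated. Dead code is best avoided outright: if every generated result feeds
a value that the program finally prints, the compiler has nothing to
eliminate.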
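
A sketch of a driver loop along these lines. Every call goes through the
pointer table, and every result is folded into a checksum that the caller
is expected to print. All names here (work_fn, fn_table, cells, run,
reset_run, k0001 and so on) are hypothetical, and the small linear
congruential generator is only a stand-in for whatever portable generator
is chosen. A sufficiently aggressive whole-program optimizer might still see
through a statically initialized table, so treat this as illustrative.

```c
#include <assert.h>

#define NFUNCS 4
#define NCELLS 8

typedef unsigned long (*work_fn)(unsigned long);

/* Stand-ins for generated routines. */
unsigned long k0001(unsigned long x) { return x + 1; }
unsigned long k0002(unsigned long x) { return x * 3; }
unsigned long k0003(unsigned long x) { return x ^ 0x55UL; }
unsigned long k0004(unsigned long x) { return x >> 1; }

work_fn fn_table[NFUNCS] = { k0001, k0002, k0003, k0004 };
unsigned long cells[NCELLS];
static unsigned long state = 1;

/* Stand-in generator; unsigned arithmetic wraps instead of overflowing. */
static unsigned long next_index(unsigned long limit)
{
    state = (state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (state >> 8) % limit;
}

/* Put the data and the generator back into a known starting state. */
void reset_run(void)
{
    int i;

    state = 1;
    for (i = 0; i < NCELLS; i++)
        cells[i] = 0;
}

/* code_set and data_set bound the subscripts, and so the working sets. */
unsigned long run(long iterations, unsigned long code_set,
                  unsigned long data_set)
{
    unsigned long sum = 0;
    long i;

    assert(code_set >= 1 && code_set <= NFUNCS);
    assert(data_set >= 1 && data_set <= NCELLS);
    for (i = 0; i < iterations; i++) {
        unsigned long f = next_index(code_set);
        unsigned long d = next_index(data_set);

        cells[d] = fn_table[f](cells[d]);   /* indirect call via the table */
        sum += cells[d];                    /* result is used, so not dead */
    }
    return sum;                             /* the caller should print this */
}
```

The checksum also gives a cheap cross-machine sanity check: the same
parameters should produce the same number everywhere.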

I have no comment concerning the contents of the routines: the Generator is
independent of this, and should be able to generate several benchmarks (for
instance, an integer one, and a float one).

Since the benchmarks must be told how "big" to be, the benchmark report
form should be written as part of the benchmark. The form must specify how
many runs are to be made, and with exactly what parameters.
-- 
Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science