Path: utzoo!utgpu!attcan!uunet!munnari!otc!metro!ipso!stcns3!stca77!peter
From: peter@stca77.stc.oz (Peter Jeremy)
Newsgroups: comp.arch
Subject: Re: Benchmarking
Message-ID: <318@stca77.stc.oz>
Date: 18 Oct 88 23:06:21 GMT
References: <2220003@hpausla.HP.COM> <46500026@uxe.cso.uiuc.edu> <6683@nsc.nsc.com> <6684@nsc.nsc.com> <4263@wright.mips.COM> <6729@nsc.nsc.com> <10498@reed.UUCP> <4655@winchester.mips.COM> <6868@nsc.nsc.com> <1988Oct9.011633.13259@utzoo.uucp> <4853@winchester.mips.C
Reply-To: peter@stca77.stc.oz (Peter Jeremy)
Organization: Alcatel-STC, Alexandria, AUSTRALIA
Lines: 97

My comments in the following are very C orientated.  I realise this is not
very portable, but most of you will be familiar with C, most other languages
are (or could be) capable of doing the same things, and I am not familiar
with the capabilities of recent compilers for other languages.

In article <3285@pt.cs.cmu.edu> lindsay@k.gp.cs.cmu.edu (Donald Lindsay) writes:
>In article <6899@nsc.nsc.com> grenley@nsc.nsc.com.UUCP (George Grenley) writes:
>> [ Offers to write scalable synthetic benchmark, if no-one else wants to ]
>
>First, I am solidly behind the idea that the best benchmark is the user's
>application.

I think we can all take this as read.  Unfortunately, in most cases it is
impractical.  Synthetic benchmarks are our best substitute, as long as we
know what we are doing (marketroid "benchmark" results are a glaring
example of not knowing what one is doing :-).

>That said, synthetic benchmarks might as well be as good as they can be.
>So, some guidelines:
> [ code and data working sets fully adjustable, small benchmark presented
> in full in the report ]

> - the compiler must be prevented from inlining.
I think this statement needs some more thought.  putc() is a 'function'
that has been 'inlined' since the beginning of C: it was implemented as a
macro because, until very recently, C compilers didn't support function
inlining.  Inlining small functions makes sense because, for such a
function, the call overhead (in both code size and execution time) is a
significant fraction of the function's own size and running time.
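To illustrate the putc() point, here is a minimal sketch of a putc()-style macro alongside an equivalent C99 inline function. The `struct outbuf` type, `OUTBUF_PUTC`, `outbuf_putc` and `outbuf_flush_and_put` are hypothetical names, not real stdio internals; the point is only that the common case is a compare and a store, so a genuine call would cost more than the work itself.

```c
#include <stddef.h>

/* Hypothetical buffered output stream, loosely modelled on the way
   early stdio implemented putc(); NOT the real FILE layout. */
struct outbuf {
    char data[64];
    size_t len;
    size_t flushes;   /* counts trips through the out-of-line slow path */
};

/* Out-of-line slow path, taken only when the buffer is full. */
static int outbuf_flush_and_put(struct outbuf *b, int c)
{
    b->len = 0;       /* a real stream would write data[] out here */
    b->flushes++;
    b->data[b->len++] = (char)c;
    return (unsigned char)c;
}

/* Macro version in the style of classic putc(): the common case
   (room in the buffer) expands in line with no call overhead. */
#define OUTBUF_PUTC(c, b) \
    ((b)->len < sizeof((b)->data) \
        ? ((b)->data[(b)->len++] = (char)(c), (int)(unsigned char)(c)) \
        : outbuf_flush_and_put((b), (c)))

/* The same fast path as a C99 inline function: a compiler that can
   inline produces equivalent code, with type checking and with each
   argument evaluated exactly once. */
static inline int outbuf_putc(int c, struct outbuf *b)
{
    if (b->len < sizeof b->data) {
        b->data[b->len++] = (char)c;
        return (unsigned char)c;
    }
    return outbuf_flush_and_put(b, c);
}
```

Note that the macro evaluates its arguments more than once, a trap the classic putc() macro also had with its stream argument; the inline function avoids that.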

What is needed is a way to differentiate between the following classes of
functions:
	1) small library routines (eg strcpy, strcmp)
	2) large very general library routines (printf, scanf)
	3) other library routines
	4) small synthetic routines simulating small routines
	5) small synthetic routines simulating large routines
Some recent C compilers are capable of inlining functions in group 1, and
of analysing the parameters to functions in group 2 so as to replace them,
where possible, with less generalised (and smaller) library routines.  I see
no reason to stop the compiler doing this (although it is generally possible,
via compiler switches or include file changes) because it will do the same
to _all_ code, and a synthetic benchmark should be "typical" in this regard.
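As an example of the group 2 rewriting described above, a call to the fully general sprintf() with a constant "%s" format does exactly the same job as strcpy(). Modern compilers (GCC, for one) perform substitutions of this kind, as well as turning printf("text\n") into puts("text"); which compilers of this vintage did so is not something this sketch claims. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Group 2 call: the fully general formatter with a constant "%s"
   format.  A compiler may rewrite this as strcpy(dst, src). */
char *copy_general(char *dst, const char *src)
{
    sprintf(dst, "%s", src);
    return dst;
}

/* What the compiler may substitute: the smaller group 1 routine.
   For any NUL-terminated src both functions leave dst identical. */
char *copy_specialised(char *dst, const char *src)
{
    return strcpy(dst, src);
}
```

Because the substitution is applied wherever the pattern appears, a benchmark that contains such calls is rewarded or penalised exactly as a real application would be.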

Small routines that are simulating large routines must not be inlined.  I
think this is what Donald was talking about.

Small routines that are simulating small routines are a grey area.  The
author of a typical large application, written knowing that inlining was an
option, might choose to inline some functions.  Thus inline functions may
well appear in real application programs, and a benchmark should take this
into account.
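One way to keep these two cases apart in a synthetic benchmark is to mark the routines that simulate large blocks of code as non-inlinable and leave the others alone. `NOINLINE`, `simulate_large_routine`, `simulate_small_routine` and `run_kernel` are hypothetical names; `__attribute__((noinline))` is a GCC/Clang extension rather than standard C, and `inline` assumes a C99 compiler.

```c
/* NOINLINE is a hypothetical helper macro.  GCC and Clang accept
   __attribute__((noinline)); elsewhere it expands to nothing, and the
   usual fallback is to compile the routine in a separate source file. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Class 5: a small synthetic routine standing in for a large block
   of application code.  It must remain a genuine call. */
NOINLINE long simulate_large_routine(long x)
{
    return (x * 3 + 1) % 1000003L;
}

/* Class 4: a small synthetic routine standing in for a genuinely
   small routine.  The compiler is free to inline it, just as it
   would be in a real application. */
static inline long simulate_small_routine(long x)
{
    return x + 1;
}

/* Hypothetical benchmark kernel exercising both kinds of routine. */
long run_kernel(int iterations)
{
    long acc = 0;
    for (int i = 0; i < iterations; i++)
        acc = simulate_small_routine(simulate_large_routine(acc));
    return acc;
}
```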

I believe that a benchmark should take into account the capabilities of the
software development environment, since a system that can execute "good"
code (eg hand-crafted assembler) blindingly fast is not much use if the only
compilers available generate atrocious code.  This means that the benchmark
should attempt to exercise all of the compiler's capabilities, whilst
preventing the compiler from mangling those routines that are simulating
large blocks of application code.

> - the compiler must be prevented from eliminating dead code.
Why?  If code is dead, it stays dead whether it is a synthetic benchmark or
an application.  What is needed is some way of differentiating between
compilers that are capable of detecting (and removing) dead code in a large
application, and those that are only capable of detecting dead code in "toy"
situations (ie synthetic benchmarks).
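A sketch of the "toy" dead code a synthetic benchmark tends to contain, and two common ways of keeping such work alive (returning the result, or storing it to a volatile object). The function names and `bench_sink` are hypothetical. Even the live versions may be reduced to the closed form n(n-1)/2 by a sufficiently clever compiler, which is exactly the kind of toy-situation optimization that says little about how the compiler copes with a large application.

```c
/* Toy dead code: the loop's result is never used, so an optimising
   compiler may delete the whole loop. */
void dead_loop(int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    (void)sum;            /* result discarded: the loop is dead */
}

/* The result escapes through the return value, so the computation
   must be kept (although it may still be simplified). */
long live_loop(int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Hypothetical sink: every store to a volatile object must be
   performed, so the value feeding it must be computed. */
volatile long bench_sink;

void sunk_loop(int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    bench_sink = sum;
}
```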

The problem with this requirement is that many (most?) compilers don't have
a switch to disable this: they always remove whatever dead code they find.
Moreover, a compiler that does provide such a switch probably does a better
job of dead code detection in the first place.

> [ Write a program to generate the benchmark program.  Description of what the
> generator program should do mostly deleted. ]
>
>Next, we need a portable routine which generates pseudo-random numbers.
>(Portable mostly means that it avoids arithmetic overflow.) The quality of
>the randomness is unimportant, as long as it doesn't get stuck at 0 or
>other such silliness.
It needs to be sufficiently random that the OS and hardware memory
management and caching mechanisms can't take advantage of the number or
order of references.
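For the generator itself, one well-known portable choice (not something proposed in the quoted article) is the Park-Miller "minimal standard" multiplicative congruential generator, computed with Schrage's method so that 32-bit arithmetic never overflows. The names `bench_srand`, `bench_rand` and `bench_pick` are hypothetical. Whether its output is random enough to defeat a particular cache or paging policy is a separate question that would still need checking.

```c
/* Park-Miller "minimal standard": seed = 16807 * seed mod (2^31 - 1).
   Schrage's method splits the product so that no intermediate value
   exceeds 2^31 - 1, so a 32-bit long suffices and nothing overflows. */
#define BENCH_A 16807L
#define BENCH_M 2147483647L   /* 2^31 - 1, a prime */
#define BENCH_Q 127773L       /* BENCH_M / BENCH_A */
#define BENCH_R 2836L         /* BENCH_M % BENCH_A */

static long bench_seed = 1;

/* The seed must lie in 1 .. BENCH_M - 1; a seed of 0 would make the
   generator stick at 0, so it is mapped to 1. */
void bench_srand(long s)
{
    s %= BENCH_M;
    if (s < 0)
        s += BENCH_M;
    if (s == 0)
        s = 1;
    bench_seed = s;
}

/* Returns the next value, in 1 .. BENCH_M - 1. */
long bench_rand(void)
{
    long hi = bench_seed / BENCH_Q;
    long lo = bench_seed % BENCH_Q;
    long t = BENCH_A * lo - BENCH_R * hi;   /* both products fit in 31 bits */
    bench_seed = (t > 0) ? t : t + BENCH_M;
    return bench_seed;
}

/* Hypothetical helper: choose an index in 0 .. n-1, e.g. which
   synthetic routine or data block to touch next. */
int bench_pick(int n)
{
    return (int)(bench_rand() % n);
}
```

Starting from a seed of 1, the 10000th value produced is 1043618065, which makes a convenient check that a port is correct.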

>Since the functions should (largely) be accessed via the array, inlining is
>defeated. Avoid dead code.

Calling functions through an array automatically biases the result.  Whilst
I don't have figures, I suspect that very _few_ function calls in a typical
application are indirect.  Although this does prevent the compiler from
using any global optimization tricks it might know, it also gives an unfair
advantage to processors that can execute indirect function calls efficiently.
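To make the bias concrete, the sketch below performs the same sequence of calls once through a table of function pointers, as in the quoted scheme, and once as direct calls selected by a switch. The work routines and the table are hypothetical. The switch may itself be compiled into a jump table, but that is an indirect branch rather than an indirect call, and the calls stay direct; in a real benchmark the work routines would also need protecting from inlining, and a compiler might even resolve calls through a constant table.

```c
/* Hypothetical synthetic work routines. */
static long work_a(long x) { return x + 3; }
static long work_b(long x) { return x * 2; }
static long work_c(long x) { return x - 1; }

/* Indirect dispatch: every call goes through a function pointer. */
static long (*const work_table[])(long) = { work_a, work_b, work_c };

long run_indirect(const int *sel, int n)
{
    long acc = 1;
    for (int i = 0; i < n; i++)
        acc = work_table[sel[i]](acc);
    return acc;
}

/* The same sequence of calls, each made directly. */
long run_direct(const int *sel, int n)
{
    long acc = 1;
    for (int i = 0; i < n; i++) {
        switch (sel[i]) {
        case 0:  acc = work_a(acc); break;
        case 1:  acc = work_b(acc); break;
        default: acc = work_c(acc); break;
        }
    }
    return acc;
}
```

Both versions compute the same answer, so any timing difference between them on a given machine is a measure of how much the indirect-call scheme favours (or penalises) that processor.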
-- 
Peter Jeremy (VK2PJ)         peter@stca77.stc.oz
Alcatel-STC Australia        ...!munnari!stca77.stc.oz!peter
41 Mandible St               peter%stca77.stc.oz@uunet.UU.NET
ALEXANDRIA  NSW  2015