Path: utzoo!utgpu!attcan!uunet!munnari!otc!metro!ipso!stcns3!stca77!peter
From: peter@stca77.stc.oz (Peter Jeremy)
Newsgroups: comp.arch
Subject: Re: Benchmarking
Message-ID: <318@stca77.stc.oz>
Date: 18 Oct 88 23:06:21 GMT
References: <2220003@hpausla.HP.COM> <46500026@uxe.cso.uiuc.edu>
    <6683@nsc.nsc.com> <6684@nsc.nsc.com> <4263@wright.mips.COM>
    <6729@nsc.nsc.com> <10498@reed.UUCP> <4655@winchester.mips.COM>
    <6868@nsc.nsc.com> <1988Oct9.011633.13259@utzoo.uucp>
    <4853@winchester.mips.C
Reply-To: peter@stca77.stc.oz (Peter Jeremy)
Organization: Alcatel-STC, Alexandria, AUSTRALIA
Lines: 97

My comments in the following are very C orientated.  I realise this is
not very portable, but most of you will be familiar with C, most other
languages are (or could be) capable of doing the same thing, and I am
not familiar with recent compiler capabilities in other languages.

In article <3285@pt.cs.cmu.edu> lindsay@k.gp.cs.cmu.edu (Donald Lindsay) writes:
>In article <6899@nsc.nsc.com> grenley@nsc.nsc.com.UUCP (George Grenley) writes:
>> [ Offers to write scalable synthetic benchmark, if no-one else wants to ]
>
>First, I am solidly behind the idea that the best benchmark is the user's
>application.

I think we can all take this as read.  Unfortunately, in most cases it
is impractical.  Synthetic benchmarks are our best substitute, as long
as we know what we are doing (marketroid "benchmark" results are a
glaring example of not knowing what they are doing :-).

>That said, synthetic benchmarks might as well be as good as they can be.
>So, some guidelines:
> [ code and data working sets fully adjustable, small benchmark presented
>   in full in the report ]
> - the compiler must be prevented from inlining.

I think this statement may need some more thought.  putc() is a
'function' that has been 'inlined' since the beginning of C - it was
implemented as a macro because, until very recently, C compilers didn't
allow function inlining.
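To make the putc() point concrete, here is a minimal sketch of one small
operation written once as a function and once as a macro.  The struct
outbuf type and the names buf_putc_fn and BUF_PUTC are invented for
illustration; they are not taken from any real stdio implementation.

```c
#include <stddef.h>

/* Hypothetical fixed-size output buffer (illustrative names only). */
struct outbuf {
    char   data[64];
    size_t len;
};

/* Function form: each use costs a call and return unless the
 * compiler chooses to inline it. */
static int buf_putc_fn(struct outbuf *b, int c)
{
    if (b->len >= sizeof(b->data))
        return -1;                      /* buffer full */
    b->data[b->len++] = (char)c;
    return (unsigned char)c;
}

/* Macro form: the body is expanded at every use, so there is no call
 * overhead.  Note that the argument b is evaluated more than once. */
#define BUF_PUTC(b, c) \
    ((b)->len < sizeof((b)->data) \
        ? (unsigned char)((b)->data[(b)->len++] = (char)(c)) \
        : -1)
```

Both forms store the character and return it (or -1 when the buffer is
full); the macro trades the call overhead for larger code at each use
and the usual multiple-evaluation hazard.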
Inlining small functions makes sense because the function call overhead
(both size and time) is a significant portion of the size of the
function.  What is needed is a way to differentiate between the
following classes of functions:
 1) small library routines (eg strcpy, strcmp)
 2) large very general library routines (printf, scanf)
 3) other library routines
 4) small synthetic routines simulating small routines
 5) small synthetic routines simulating large routines

Some recent C compilers are capable of inlining functions in group 1,
and of analysing the parameters to functions in group 2 so that they can
possibly be replaced with less generalised (and smaller) library
routines.  I see no reason to stop the compiler doing this (although it
is generally possible, via compiler switches or include file changes)
because it will do the same to _all_ code and a synthetic benchmark
should be "typical" in this regard.

Small routines that are simulating large routines must not be inlined.
I think this is what Donald was talking about.

Small routines that are simulating small routines are a grey area.  In a
typical large application that was written knowing that inlining
functions was an option, the author might choose to inline some
functions.  Thus inline functions could be used in application programs
and a benchmark should take this into account.

I believe that a benchmark should take into account the capabilities of
the software development environment, since having a system that can
execute "good" code (eg hand-crafted assembler) blindingly fast is not
much good if the only compilers available generate atrocious code.  This
means that the benchmark should attempt to use all the compiler's
capabilities, whilst preventing the compiler from mangling those
routines that are simulating large blocks of application code.

> - the compiler must be prevented from eliminating dead code.

Why?  If code is dead, it stays dead whether it is in a synthetic
benchmark or an application.
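As a toy illustration of what "dead code" means in a benchmark kernel,
consider the sketch below.  The function names kernel_dead and
kernel_live are hypothetical, and whether a given compiler actually
removes the first loop depends on the compiler and its options.

```c
/* Hypothetical benchmark kernels (illustrative names only). */

/* The sum is computed but never used, so the loop has no observable
 * effect and an optimising compiler is free to delete it entirely. */
static void kernel_dead(const int *v, int n)
{
    long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += v[i];
    (void)sum;                  /* result discarded */
}

/* The same loop, but the result is returned.  As long as the caller
 * uses the value (prints or checks it), the work cannot be discarded. */
static long kernel_live(const int *v, int n)
{
    long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

A benchmark that times kernel_dead may end up timing nothing at all,
which is the "toy" situation; making the result live is one common way
to avoid it without needing a compiler switch.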
What is needed is some way of differentiating between compilers that are
capable of detecting (and removing) dead code in a large application,
and those that are only capable of detecting dead code in "toy"
situations (ie synthetic benchmarks).

The problem with this requirement is that many (most?) compilers don't
have the switches to allow this - they always remove the dead code they
find.  And a compiler that does support this switch probably does a
better job of dead code detection.

> [ Write a program to generate the benchmark program.  Description of what the
>   generator program should do mostly deleted. ]
>
>Next, we need a portable routine which generates pseudo-random numbers.
>(Portable mostly means that it avoids arithmetic overflow.) The quality of
>the randomness is unimportant, as long as it doesn't get stuck at 0 or
>other such silliness.

It needs to be sufficiently random that the OS/hardware memory
management and caching routines can't take advantage of the number or
order of references.

>Since the functions should (largely) be accessed via the array, inlining is
>defeated. Avoid dead code.

This automatically biases the result.  Whilst I don't have figures, I
suspect that very _few_ function calls in a typical application are
indirect.  Whilst this does prevent a compiler from using any global
optimization tricks it might know, it also provides an unfair advantage
to processors that can efficiently execute indirect function calls.
-- 
Peter Jeremy (VK2PJ)         peter@stca77.stc.oz
Alcatel-STC Australia        ...!munnari!stca77.stc.oz!peter
41 Mandible St               peter%stca77.stc.oz@uunet.UU.NET
ALEXANDRIA NSW 2015