Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker.mit.edu!ai-lab!life!burley
From: burley@geech.ai.mit.edu (Craig Burley)
Newsgroups: comp.benchmarks
Subject: Re: BYTE and benchmarks (anecdote)
Message-ID:
Date: 15 Jan 91 14:25:56 GMT
References: <1991Jan09.171216.21923@convex.com> <1991Jan14.202646.5677@murdoch.acc.Virginia.EDU>
Sender: news@ai.mit.edu
Distribution: comp
Organization: Free Software Foundation 545 Tech Square Cambridge, MA 02139
Lines: 62
In-reply-to: gl8f@astsun.astro.Virginia.EDU's message of 14 Jan 91 20:26:46 GMT

In article <1991Jan14.202646.5677@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes:

>   Silly me. I propose we use the "infinite loop" benchmark instead of SPEC.

At my last job, I did some work on the ROM-based microcode, primarily optimizing for space and, sometimes, speed, so new features could be added to the ROM. One thing I made smaller and, therefore, faster was the idle loop. It had half the instructions of the previous version yet worked as well. Since the idle loop accounted for 99.99% of the execution profile of these machines, I reported to management that I had single-handedly improved system performance by a factor of two! They were very impressed! (-:

Seriously, it does seem difficult to produce benchmarks that don't measure some mixture of hardware speed and compiler-produced optimizations, in proportions nobody can specify. Globally optimizing compilers, as they become available, might well reduce many popular benchmarks to (speaking Fortranically) STOP statements or WRITEs of constants.

One way to help avoid this is to have a two-phase benchmark. The first phase uses system information like the date and time to seed a random number generator, then writes a data file containing values derived from it. The second phase, the one actually measured, reads this file and performs the measurable calculations on it. Because the data cannot be known when the benchmark is compiled, no compiler can precompute the answer.
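A minimal C sketch of the two-phase idea, under stated assumptions: the file format (a count followed by one value per line), the function names generate_data, load_data, compute and timed_run, and the O(n^2) kernel standing in for "the measurable calculations" are all hypothetical, chosen only for illustration. Phase two reads the whole file into memory before the clock starts, so only the kernel falls between the two clock() calls.

```c
/*
 * Two-phase benchmark sketch. Assumptions (hypothetical, for illustration):
 * the data file holds a count followed by one value per line, and
 * compute() is a stand-in O(n^2) kernel, not any particular benchmark.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Phase 1 (not timed): seed the generator from the clock so the data, and
 * hence the answer, cannot be known when the benchmark is compiled. */
static int generate_data(const char *path, long n)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    srand((unsigned) time(NULL));
    fprintf(f, "%ld\n", n);
    for (long i = 0; i < n; i++)
        fprintf(f, "%d\n", rand() % 1000);
    return fclose(f) == 0 ? 0 : -1;
}

/* Phase 2, step 1 (not timed): read the whole file into memory.
 * Returns a malloc'd array (caller frees) or NULL on any error. */
static double *load_data(const char *path, long *n_out)
{
    FILE *f = fopen(path, "r");
    long n = 0;
    if (f == NULL)
        return NULL;
    if (fscanf(f, "%ld", &n) != 1 || n <= 0) {
        fclose(f);
        return NULL;
    }
    double *v = malloc((size_t) n * sizeof *v);
    if (v == NULL) {
        fclose(f);
        return NULL;
    }
    for (long i = 0; i < n; i++) {
        if (fscanf(f, "%lf", &v[i]) != 1) {
            free(v);
            fclose(f);
            return NULL;
        }
    }
    fclose(f);
    *n_out = n;
    return v;
}

/* Phase 2, step 2 (timed): sum of squared differences over all ordered pairs. */
static double compute(const double *v, long n)
{
    assert(v != NULL && n > 0);
    double acc = 0.0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            acc += (v[i] - v[j]) * (v[i] - v[j]);
    return acc;
}

/* Time only the kernel with clock() (processor time). Nothing here prevents
 * an aggressive optimizer from moving compute() across the clock() calls. */
static double timed_run(const double *v, long n, double *seconds)
{
    clock_t start = clock();
    double result = compute(v, n);
    clock_t stop = clock();
    *seconds = (double) (stop - start) / CLOCKS_PER_SEC;
    return result;
}
```

A driver (not shown, and equally hypothetical) would call generate_data in one run of the program and load_data plus timed_run in a separate run, then print the result; printing it keeps the compiler from discarding the kernel as dead code. The sketch does not, by itself, guarantee that the timed region contains exactly the kernel, which is the kind of global-optimizer hazard that makes this style of measurement delicate.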
Admittedly, this two-phase approach risks contaminating timings meant to reflect CPU activity alone with disk activity. Even isolating one mode from the other in the code, by reading the whole file into memory before computing, has its problems: a global optimizer that doesn't see the call that starts the timer between the reads and the computations might decide to reduce memory requirements, and perhaps increase speed, by folding the compute loop into the I/O loop.

If the data set were small relative to the amount of computation -- for example, if an NP-complete problem like the traveling salesman problem were solved, using whatever favorite (to-be-benchmarked) techniques, from distance data read from a file -- this I/O problem could be reduced to the vanishing point. Also, with an NP-complete problem, even a globally optimizing compiler is unlikely to find a better algorithm that significantly affects the benchmark (if it does, that's big news, right?!).

The other important solution, if I may stand on my soapbox, is the proliferation of free software, so that any machine you want to benchmark is likely to already have compiler technology equivalent to that of the machine it is being compared against. For example, 80486 vs. 68040 benchmarks of C code are likely to be much more helpful in isolating hardware characteristics (when that is desired -- it isn't always, of course!) if GNU CC does the compiling on both machines. Of course, one target's machine description might enable more optimizations than the other's, and differences in library routines would remain, but at least you wouldn't be comparing, say, code compiled with common-subexpression elimination (cse) against code compiled without it.

-- 
James Craig Burley, Software Craftsperson
burley@ai.mit.edu