Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!rutgers!husc6!panda!genrad!decvax!tektronix!tekcrl!tekchips!willc
From: willc@tekchips.UUCP
Newsgroups: net.arch
Subject: Re: The correct mean to use when comparing benchmark performance
Message-ID: <698@tekchips.UUCP>
Date: Thu, 2-Oct-86 18:40:44 EDT
Article-I.D.: tekchips.698
Posted: Thu Oct 2 18:40:44 1986
Date-Received: Sat, 4-Oct-86 11:08:02 EDT
References: <549@cubsvax.UUCP>
Reply-To: willc@tekchips.UUCP (Will Clinger)
Organization: Tektronix, Inc., Beaverton, OR.
Lines: 80
Keywords: benchmark averaging

In article <549@cubsvax.UUCP> peters@cubsvax.UUCP (Peter S. Shenkin) writes:
>HOW TO NORMALIZE:
>
>Suppose this is the raw data:
>                Machine A       Machine B
>Benchmark 1     10.0            5.0
>Benchmark 2     10.0            20.0
>-----------------------------------------
>arith mean      10.0            12.5
>
>Now, it is DUMB to normalize each benchmark separately.  THAT, and not
>arithmetic mean, is what gives rise to artifacts.  Instead
>let's normalize to the arithmetic mean, first of Machine A, then B:
>
>Normalized to A:
>                Machine A       Machine B
>Benchmark 1     1.0             .5
>Benchmark 2     1.0             2.0
>-----------------------------------------
>arith mean      1.0             1.25
>
>
>Normalized to B:
>                Machine A       Machine B
>Benchmark 1     0.8             0.4
>Benchmark 2     0.8             1.6
>-----------------------------------------
>arith mean      0.8             1.0

So the advertising manager for Machine B notices that Benchmark 1
consists of 1 iteration, while Benchmark 2 consists of 1000 iterations.
That doesn't seem quite fair, so he/she re-runs Benchmark 1 with 1000
iterations instead of 1 to obtain the raw data:

                Machine A       Machine B
Benchmark 1     10000.0         5000.0
Benchmark 2     10.0            20.0
-----------------------------------------
arith mean      5005.0          2510.0

Normalizing according to the procedure recommended in the text quoted
above:

Normalized to A:
                Machine A       Machine B
Benchmark 1     1.998           .999
Benchmark 2     .001998         .003996
-----------------------------------------
arith mean      1.0             .5015

Normalized to B:
                Machine A       Machine B
Benchmark 1     3.984           1.992
Benchmark 2     .003984         .007968
-----------------------------------------
arith mean      1.994           1.0

Advertisements for Machine B then proclaim that it is nearly twice as
fast as Machine A.  The advertising manager for Machine A cries foul.
Can you help him/her adjust these same benchmarks to prove that Machine A
is really twice as fast as Machine B?

Artifacts are neither art nor facts.

By the way, I see no flaw in the proof that appears in Philip J Fleming
and John J Wallace, "How not to lie with statistics: the correct way to
summarize benchmark results", CACM Volume 29 Number 3 (March 1986),
pages 218-221.  I'm not very happy with their presentation, primarily
because they never give a clear statement of their theorem, which I
paraphrase:

    The geometric mean is the only function of n positive real
    arguments that is reflexive, symmetric, and multiplicative.

It's fair to take issue with their proof, but if you're going to do so
I'd like to know which step(s) of their proof you find unconvincing, or
which of the three properties you feel is dispensable for an unweighted
average of normalized benchmark results.

William Clinger
Tektronix Computer Research Laboratory
willc%tekchips@tek
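
The rescaling game played in the tables can be sketched in a few lines
of Python.  This is only an illustration under stated assumptions: the
helper names (arith_mean, geo_mean, arith_ratio, geo_ratio, rescale) are
hypothetical, the timings are the ones from the tables, and multiplying
Benchmark 2 by 1000 is just one assumed way to answer the question put
to Machine A's advertising manager.  A ratio above 1 means Machine A
takes longer, i.e. Machine B looks faster.

```python
import math

def arith_mean(xs):
    """Plain arithmetic mean."""
    return sum(xs) / len(xs)

def geo_mean(xs):
    """Geometric mean via exp of the mean log; inputs must be positive."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def arith_ratio(a, b):
    """Arithmetic mean of A's times over arithmetic mean of B's times.
    Normalizing both columns to either machine's mean gives this same ratio."""
    return arith_mean(a) / arith_mean(b)

def geo_ratio(a, b):
    """Geometric mean of the per-benchmark ratios A/B."""
    return geo_mean([x / y for x, y in zip(a, b)])

def rescale(times, index, factor):
    """Return a copy with one benchmark's time multiplied by factor
    (hypothetical stand-in for re-running it with more iterations)."""
    out = list(times)
    out[index] *= factor
    return out

# Raw timings from the quoted article.
a = [10.0, 10.0]
b = [5.0, 20.0]

cases = {
    "original":            (a, b),
    "benchmark 1 x 1000":  (rescale(a, 0, 1000), rescale(b, 0, 1000)),
    "benchmark 2 x 1000":  (rescale(a, 1, 1000), rescale(b, 1, 1000)),
}

for name, (ta, tb) in cases.items():
    print(f"{name:20s} arith A/B = {arith_ratio(ta, tb):.4f}   "
          f"geo A/B = {geo_ratio(ta, tb):.4f}")
```

Under these assumptions the arithmetic-mean ratio moves from 0.8 to
roughly 1.99 or roughly 0.50 depending on which benchmark is rescaled,
while the geometric mean of the per-benchmark ratios stays at 1.0 in
all three cases, since a common factor cancels inside each ratio.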