Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!rutgers!husc6!panda!genrad!decvax!tektronix!tekcrl!tekchips!willc
From: willc@tekchips.UUCP
Newsgroups: net.arch
Subject: Re: The correct mean to use when comparing benchmark performance
Message-ID: <698@tekchips.UUCP>
Date: Thu, 2-Oct-86 18:40:44 EDT
Article-I.D.: tekchips.698
Posted: Thu Oct  2 18:40:44 1986
Date-Received: Sat, 4-Oct-86 11:08:02 EDT
References: <549@cubsvax.UUCP>
Reply-To: willc@tekchips.UUCP (Will Clinger)
Organization: Tektronix, Inc., Beaverton, OR.
Lines: 80
Keywords: benchmark averaging

In article <549@cubsvax.UUCP> peters@cubsvax.UUCP (Peter S. Shenkin) writes:
>HOW TO NORMALIZE:
>
>Suppose this is the raw data:
>		Machine A	Machine B
>Benchmark 1	10.0		 5.0
>Benchmark 2	10.0		20.0
>-----------------------------------------
>arith mean	10.0		12.5
>
>Now, it is DUMB to normalize each benchmark separately.  THAT, and not
>arithmetic mean, is what gives rise to artifacts.  Instead
>let's normalize to the arithmetic mean, first of Machine A, then B:
>
>Normalized to A:
>		Machine A	Machine B
>Benchmark 1	1.0		 .5
>Benchmark 2	1.0		2.0
>-----------------------------------------
>arith mean	1.0		1.25
>
>
>Normalized to B:
>		Machine A	Machine B
>Benchmark 1	0.8		0.4
>Benchmark 2	0.8		1.6
>-----------------------------------------
>arith mean	0.8		1.0

So the advertising manager for Machine B notices that Benchmark 1 consists
of 1 iteration, while Benchmark 2 consists of 1000 iterations.  That
doesn't seem quite fair, so he/she re-runs benchmark 1 with 1000 iterations
instead of 1 to obtain the raw data:

		Machine A	Machine B
Benchmark 1	10000.0		 5000.0
Benchmark 2	   10.0		   20.0
-----------------------------------------
arith mean	 5005.0		 2510.0

Normalizing according to the procedure recommended in the text quoted
above:

Normalized to A:
		Machine A	Machine B
Benchmark 1	1.998		 .999
Benchmark 2	 .001998	 .003996
-----------------------------------------
arith mean	1.0		 .5015


Normalized to B:
		Machine A	Machine B
Benchmark 1	3.984		1.992
Benchmark 2	 .003984	 .007968
-----------------------------------------
arith mean	1.994		1.0

Advertisements for Machine B then proclaim that it is nearly twice as
fast as Machine A.  The advertising manager for Machine A cries foul.
Can you help him/her adjust these same benchmarks to prove that Machine
A is really twice as fast as Machine B?
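
For anyone who wants to experiment with such adjustments, here is a
minimal Python sketch of the quoted procedure.  The function names
(normalized_means, scale_benchmark, time_ratio) are invented for this
illustration, and the sketch assumes that raising a benchmark's
iteration count multiplies its time on both machines by the same factor.

```python
def normalized_means(times, reference):
    """Divide every time by the arithmetic mean of the reference machine's
    times, then return each machine's arithmetic mean of the results."""
    ref_mean = sum(times[reference]) / len(times[reference])
    return {m: sum(t / ref_mean for t in ts) / len(ts)
            for m, ts in times.items()}


def scale_benchmark(times, index, factor):
    """Assumption: running one benchmark for `factor` times as many
    iterations multiplies its time on every machine by `factor`."""
    return {m: [t * factor if i == index else t for i, t in enumerate(ts)]
            for m, ts in times.items()}


def time_ratio(times):
    """Machine B's normalized mean time divided by Machine A's.
    Below 1 favors B, above 1 favors A."""
    means = normalized_means(times, "A")
    return means["B"] / means["A"]


original = {"A": [10.0, 10.0], "B": [5.0, 20.0]}
bench1_x1000 = scale_benchmark(original, 0, 1000)  # the case worked above
bench2_x1000 = scale_benchmark(original, 1, 1000)  # one possible counter-move

for label, data in [("original", original),
                    ("benchmark 1 x1000", bench1_x1000),
                    ("benchmark 2 x1000", bench2_x1000)]:
    print(f"{label:20s} B/A time ratio = {time_ratio(data):.3f}")
```

With these inputs the ratio comes out at about 1.25, 0.50, and 2.00
respectively, so the verdict depends entirely on which benchmark
happens to run longer.  The choice of reference machine does not change
the ratio, only the scale of the table.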

Artifacts are neither art nor facts.

By the way, I see no flaw in the proof that appears in Philip J. Fleming
and John J. Wallace, "How not to lie with statistics: the correct way to
summarize benchmark results", CACM Volume 29 Number 3 (March 1986),
pages 218-221.  I'm not very happy with their presentation, primarily
because they never give a clear statement of their theorem, which I
paraphrase:  The geometric mean is the only function of n positive real
arguments that is reflexive, symmetric, and multiplicative.  It's fair
to take issue with their proof, but if you're going to do so I'd like to
know which step(s) of their proof you find unconvincing, or which of the
three properties you feel is dispensable for an unweighted average of
normalized benchmark results.
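
To make the three properties concrete, here is a small Python sketch
that checks them numerically on the benchmark data above.  The readings
of "reflexive", "symmetric", and "multiplicative" in the comments are my
own paraphrase rather than quotations from the paper, and the helper
names (geometric_mean, ratios) are hypothetical.

```python
import math


def geometric_mean(xs):
    """n-th root of the product of n positive numbers, computed via logs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def ratios(times, machine, reference):
    """Per-benchmark normalized results: machine time / reference time."""
    return [t / r for t, r in zip(times[machine], times[reference])]


original = {"A": [10.0, 10.0], "B": [5.0, 20.0]}
rescaled = {"A": [10000.0, 10.0], "B": [5000.0, 20.0]}  # benchmark 1 x1000

# Rescaling a benchmark leaves each ratio, and hence the mean, unchanged.
g_original = geometric_mean(ratios(original, "B", "A"))
g_rescaled = geometric_mean(ratios(rescaled, "B", "A"))
print(g_original, g_rescaled)

x = [0.5, 2.0, 3.0]
y = [4.0, 0.25, 7.0]

# Reflexive (as read here): the mean of n copies of a is a.
reflexive = math.isclose(geometric_mean([3.0, 3.0, 3.0]), 3.0)

# Symmetric (as read here): reordering the arguments changes nothing.
symmetric = math.isclose(geometric_mean(x), geometric_mean(x[::-1]))

# Multiplicative (as read here): the mean of elementwise products equals
# the product of the means.
multiplicative = math.isclose(
    geometric_mean([a * b for a, b in zip(x, y)]),
    geometric_mean(x) * geometric_mean(y))

print(reflexive, symmetric, multiplicative)
```

On this data the geometric mean of the B/A ratios is 1 both before and
after the rescaling, and swapping the reference machine merely inverts
it, so neither advertising manager gains anything by changing iteration
counts.  A numerical check like this illustrates the properties; it is
of course no substitute for the uniqueness proof in the paper.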

William Clinger
Tektronix Computer Research Laboratory
willc%tekchips@tek