Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!pyrnj!esquire!cmcl2!rna!cubsvax!peters
From: peters@cubsvax.UUCP (Peter S. Shenkin)
Newsgroups: net.arch
Subject: The correct mean to use when comparing benchmark performance
Message-ID: <549@cubsvax.UUCP>
Date: Tue, 30-Sep-86 14:01:56 EDT
Article-I.D.: cubsvax.549
Posted: Tue Sep 30 14:01:56 1986
Date-Received: Thu, 2-Oct-86 21:23:00 EDT
Reply-To: peters@cubsvax.UUCP (Peter S. Shenkin)
Organization: Columbia Univ. Bio. CG Fac., NY
Lines: 59
Keywords: benchmark averaging
Summary: in favor of arithmetic mean

Eugene Miya kindly sent me a long epistle justifying geometric mean.
The point was that if one normalizes each benchmark separately, the
choice of reference machine influences the ultimate rank-order when the
arithmetic mean, but not the geometric mean, is taken.  However, I feel it
is incorrect to normalize each benchmark separately.  When all benchmarks
are normalized to the arithmetic mean for any given machine, the
comparisons are identical, regardless of what machine is used.  Thus
I believe the issue is primarily how to normalize.  Given this, the
arithmetic mean has advantages that other means do not.  Read on...

HOW TO NORMALIZE:

Suppose this is the raw data:
		Machine A	Machine B
Benchmark 1	10.0		 5.0
Benchmark 2	10.0		20.0
-----------------------------------------
arith mean	10.0		12.5
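
For reference, here is a minimal Python sketch of the artifact that Miya's
argument describes, applied to the raw data above.  It assumes the figures
are run times (lower is better); the function names are made up for
illustration.

```python
import math

# Raw data from the table above, assumed to be run times (lower is better).
raw = {"A": [10.0, 10.0], "B": [5.0, 20.0]}

def arith_mean(xs):
    return sum(xs) / len(xs)

def geo_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

def normalize_per_benchmark(data, ref):
    """Divide each result by the reference machine's result on the SAME benchmark."""
    return {m: [x / r for x, r in zip(xs, data[ref])] for m, xs in data.items()}

for ref in ("A", "B"):
    norm = normalize_per_benchmark(raw, ref)
    arith = {m: round(arith_mean(v), 3) for m, v in norm.items()}
    geo = {m: round(geo_mean(v), 3) for m, v in norm.items()}
    print(f"reference {ref}: arithmetic {arith}  geometric {geo}")
```

Normalized per benchmark, the arithmetic mean makes A look faster when A is
the reference and B look faster when B is the reference, while the geometric
mean calls it a tie either way.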

Now, it is DUMB to normalize each benchmark separately.  THAT, and not
arithmetic mean, is what gives rise to artifacts.  Instead
let's normalize to the arithmetic mean, first of Machine A, then B:

Normalized to A:
		Machine A	Machine B
Benchmark 1	1.0		0.5
Benchmark 2	1.0		2.0
-----------------------------------------
arith mean	1.0		1.25


Normalized to B:
		Machine A	Machine B
Benchmark 1	0.8		0.4
Benchmark 2	0.8		1.6
-----------------------------------------
arith mean	0.8		1.0

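The comparisons are identical in both cases (Machine B's mean is 1.25
times Machine A's either way), despite the use of the arithmetic mean
throughout.  Any other mean used throughout would, I believe, also give
comparisons that do not depend on the choice of reference machine.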
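
A short Python sketch of this normalization, assuming the same raw data
(the helper names are illustrative, not from any library).  It divides
every timing by the reference machine's arithmetic mean and prints the
B-to-A ratio of means for each choice of reference.

```python
import math

# Raw data from the first table.
raw = {"A": [10.0, 10.0], "B": [5.0, 20.0]}

def arith_mean(xs):
    return sum(xs) / len(xs)

def geo_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

def normalize_to_machine_mean(data, ref):
    """Divide every timing by one constant: the reference machine's arithmetic mean."""
    scale = arith_mean(data[ref])
    return {m: [x / scale for x in xs] for m, xs in data.items()}

def ratio_b_to_a(data, mean):
    return mean(data["B"]) / mean(data["A"])

for ref in ("A", "B"):
    norm = normalize_to_machine_mean(raw, ref)
    print(f"reference {ref}: "
          f"arithmetic B/A = {ratio_b_to_a(norm, arith_mean):.3f}, "
          f"geometric B/A = {ratio_b_to_a(norm, geo_mean):.3f}")
```

Because this normalization divides everything by a single constant, any
mean that scales with its inputs (arithmetic, geometric, harmonic) keeps
the same B-to-A ratio whichever machine is the reference.  The arithmetic
and geometric means still disagree with each other on this data.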


SO WHY SHOULD WE PREFER ARITHMETIC MEAN?:

However, the arithmetic mean is directly related to the application
of benchmark timings to real-world performance (nebulous though this entire
field may be).  These averages would reflect real-world performance -- modulo
this cloudiness -- if the application set were represented equally well by
the two benchmarks;  but this approach also works for weighted averages.
Other means, such as the geometric, do not have this property.
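
One way to make this concrete, as a sketch only: if a workload runs each
benchmark some number of times, the ratio of (weighted) arithmetic means
equals the ratio of total run times.  The run counts below are
hypothetical and the figures are assumed to be timings.

```python
import math

# Raw timings from the first table.
raw = {"A": [10.0, 10.0], "B": [5.0, 20.0]}

def weighted_mean(xs, weights):
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

def total_time(xs, runs):
    """Total time for a workload that runs benchmark i runs[i] times."""
    return sum(n * x for n, x in zip(runs, xs))

def geo_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

# Hypothetical workloads: equal use, then benchmark 1 run three times as often.
for runs in ([1, 1], [3, 1]):
    mean_ratio = weighted_mean(raw["B"], runs) / weighted_mean(raw["A"], runs)
    time_ratio = total_time(raw["B"], runs) / total_time(raw["A"], runs)
    print(f"runs {runs}: mean ratio B/A = {mean_ratio:.3f}, "
          f"total-time ratio B/A = {time_ratio:.3f}")

# The unweighted geometric mean ratio, for contrast.
print(f"geometric B/A = {geo_mean(raw['B']) / geo_mean(raw['A']):.3f}")
```

The mean ratio and the total-time ratio agree for both workloads, while
the geometric mean ratio matches neither total-time ratio on this data.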

SUMMARY:  arithmetic mean wins, if normalization is properly performed.

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA

P.S.:  I like this writeup so much I'm posting it to the net!  Thanks for
the inspiration...