Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!pyrnj!esquire!cmcl2!rna!cubsvax!peters
From: peters@cubsvax.UUCP (Peter S. Shenkin)
Newsgroups: net.arch
Subject: The correct mean to use when comparing benchmark performance
Message-ID: <549@cubsvax.UUCP>
Date: Tue, 30-Sep-86 14:01:56 EDT
Article-I.D.: cubsvax.549
Posted: Tue Sep 30 14:01:56 1986
Date-Received: Thu, 2-Oct-86 21:23:00 EDT
Reply-To: peters@cubsvax.UUCP (Peter S. Shenkin)
Organization: Columbia Univ. Bio. CG Fac., NY
Lines: 59
Keywords: benchmark averaging
Summary: in favor of arithmetic mean

Eugene Miya kindly sent me a long epistle justifying geometric mean.
The point was that if one normalizes each benchmark separately, the
choice of machine used for normalization influences the ultimate
rank-order when the arithmetic mean, but not the geometric mean, is
taken.

However, I feel it is incorrect to normalize each benchmark separately.
When all benchmarks are normalized to the arithmetic mean for any given
machine, the comparisons are identical, regardless of what machine is
used. Thus I believe the issue is primarily how to normalize. Given
this, the arithmetic mean has advantages that other means do not.
Read on...

HOW TO NORMALIZE:

Suppose this is the raw data:

                Machine A       Machine B
Benchmark 1       10.0             5.0
Benchmark 2       10.0            20.0
-----------------------------------------
arith mean        10.0            12.5

Now, it is DUMB to normalize each benchmark separately. THAT, and not
arithmetic mean, is what gives rise to artifacts. Instead let's
normalize to the arithmetic mean, first of Machine A, then B:

Normalized to A:
                Machine A       Machine B
Benchmark 1        1.0             0.5
Benchmark 2        1.0             2.0
-----------------------------------------
arith mean         1.0             1.25

Normalized to B:
                Machine A       Machine B
Benchmark 1        0.8             0.4
Benchmark 2        0.8             1.6
-----------------------------------------
arith mean         0.8             1.0

The relative results are identical (B's mean is 1.25 times A's in both
tables), despite the use of the arithmetic mean throughout.
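
The two normalization schemes can be checked with a short sketch. It is
an illustration in Python rather than anything from Miya's note; the
function names are hypothetical, and only the four raw numbers come from
the table above.

```python
# Raw data from the table above: one list per machine, one entry per benchmark.
raw = {
    "A": [10.0, 10.0],
    "B": [5.0, 20.0],
}

def arith_mean(xs):
    return sum(xs) / len(xs)

def normalize_to_machine_mean(raw, ref):
    """Divide every entry by the arithmetic mean of the reference machine."""
    scale = arith_mean(raw[ref])
    return {m: [t / scale for t in ts] for m, ts in raw.items()}

def normalize_per_benchmark(raw, ref):
    """Divide each benchmark's entries by the reference machine's entry
    for that same benchmark (the scheme argued against above)."""
    return {m: [t / r for t, r in zip(ts, raw[ref])] for m, ts in raw.items()}

def b_over_a(table):
    """Ratio of Machine B's arithmetic mean to Machine A's."""
    return arith_mean(table["B"]) / arith_mean(table["A"])

# One scale factor per table: the ratio does not depend on the reference.
whole_a = b_over_a(normalize_to_machine_mean(raw, "A"))
whole_b = b_over_a(normalize_to_machine_mean(raw, "B"))

# One scale factor per benchmark: the ratio depends on the reference.
per_a = b_over_a(normalize_per_benchmark(raw, "A"))
per_b = b_over_a(normalize_per_benchmark(raw, "B"))

print("normalized to machine mean:", whole_a, whole_b)
print("normalized per benchmark:  ", per_a, per_b)
```

With this data the first pair comes out at about 1.25 for either
reference machine, while the second pair comes out at about 1.25 and
0.8, so the per-benchmark scheme reverses the ordering of the two
machines when the reference changes.
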
Any other mean used throughout would, I believe, also give identical
results.

SO WHY SHOULD WE PREFER ARITHMETIC MEAN?:

However, the arithmetic mean is directly related to the application of
benchmark timings to the real world (nebulous though this entire field
may be). These averages would reflect real-world performance -- modulo
this cloudiness -- if the application set were equally well represented
by the two benchmarks; but this approach also works for weighted
averages. Other means, such as the geometric, do not have this property.

SUMMARY: arithmetic mean wins, if normalization is properly performed.

Peter S. Shenkin        Columbia Univ. Biology Dept., NY, NY 10027
{philabs,rna}!cubsvax!peters            cubsvax!peters@columbia.ARPA

P.S.: I like this writeup so much I'm posting it to the net! Thanks
for the inspiration...