Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!husc6!cmcl2!rna!cubsvax!peters
From: peters@cubsvax.UUCP (Peter S. Shenkin)
Newsgroups: net.arch
Subject: Re: The correct mean to use when comparing benchmark performance
Message-ID: <554@cubsvax.UUCP>
Date: Sat, 4-Oct-86 19:20:42 EDT
Article-I.D.: cubsvax.554
Posted: Sat Oct 4 19:20:42 1986
Date-Received: Tue, 7-Oct-86 19:17:38 EDT
References: <549@cubsvax.UUCP>
Reply-To: peters@cubsvax.UUCP (Peter S. Shenkin)
Organization: Columbia Univ. Bio. CG Fac., NY
Lines: 50

In article hammond@petrus.UUCP (Rich A. Hammond) writes:
>Peter S. Shenkin writes:
>> However, the arithmetic mean is directly related to the application
>> of benchmark timings to the real world (nebulous though this entire field may
>> be). These averages would reflect real-world performance -- modulo this
>> cloudiness -- if the application set were well-represented equally by
>> the two benchmarks; but this approach also works for weighted averages.
>> Other means, such as the geometric, do not have this property.
>>
>> SUMMARY: arithmetic mean wins, if normalization is properly performed.
>I respectfully disagree; if you work the arithmetic out, you don't need to
>normalize at all using your method, just compare the sums of the benchmark
>times. This is because you assume that:
>a) The benchmarks are a representative sample of actual load,
>and
>b) that comparison of the component times is unimportant.
>
>I claim that for the environment of most network news readers the first
>is in fact false; most people have no idea what the load on their system
>is composed of...
>
>The second is often false, [since]
>...one would like to compare the individual programs' run times.
> ...normalization to
>the arithmetic mean can be factored out and not done. Normalization to
>the individual component time, on the other hand, gives cases where the
>ratio is trivial to compute because you're dividing by 1.
>
>In the context of the CACM article, both assumptions are false:
>the benchmarks aren't representative of the load (no system calls),
>and the comparison of interest was individual program times and not the
>sum. What the CACM article pointed out was that under those conditions,
>the geometric mean was the only one to use to get ratios of machine
>performance that were independent of the machine normalized to. WHAT
>THE CACM ARTICLE DIDN'T SAY (AND SHOULD HAVE) WAS THAT THE PERFORMANCE
>RATIO WAS PRETTY WORTHLESS ANYWAY, SO THAT COMPUTING IT "CORRECTLY" IS
>A MOOT POINT. [ my emphasis -- peters ]

I would like to put myself on record as agreeing with every one of
Rich's remarks. (I jumped into the discussion solely on the question
of which of the means more closely represents system performance
subject to assumptions (a) and (b).)

It seems to me that the bottom line is that there's no one-dimensional
measure of performance that's any good; arithmetic mean fails because
(a) and (b) are too difficult to satisfy, and geometric mean fails
'cause it doesn't mean a damn thing, self-consistent though it may be.

Peter S. Shenkin     Columbia Univ. Biology Dept., NY, NY 10027
{philabs,rna}!cubsvax!peters     cubsvax!peters@columbia.ARPA
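The arithmetic behind the thread can be checked with a small sketch. It assumes two hypothetical machines, A and B, and two benchmarks with invented run times (TIMES, arith_mean, geo_mean, normalized, relative and scaled_ratio are all names made up for this illustration, not from the article). It shows Rich's point that dividing every time by one common constant cancels out of the arithmetic-mean ratio, leaving the ratio of raw sums, and the CACM point that per-benchmark normalization makes the arithmetic-mean ratio depend on which machine is the reference while the geometric-mean ratio does not.

```python
import math

# Hypothetical run times in seconds for two benchmarks on two machines.
# The numbers are invented for illustration; they are not measurements.
TIMES = {
    "A": [1.0, 100.0],
    "B": [10.0, 10.0],
}


def arith_mean(xs):
    return sum(xs) / len(xs)


def geo_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))


def normalized(machine, reference):
    """Per-benchmark times of `machine` divided by those of `reference`."""
    return [t / r for t, r in zip(TIMES[machine], TIMES[reference])]


def relative(mean, machine, other, reference):
    """Ratio of mean normalized times, machine over other (>1 means slower)."""
    return mean(normalized(machine, reference)) / mean(normalized(other, reference))


def scaled_ratio(machine, other, c):
    """Divide every time by one constant c, then compare arithmetic means.

    The constant cancels, so this equals the ratio of the raw sums.
    """
    a = arith_mean([t / c for t in TIMES[machine]])
    b = arith_mean([t / c for t in TIMES[other]])
    return a / b


if __name__ == "__main__":
    for ref in TIMES:
        print(f"normalized to {ref}: "
              f"arithmetic A/B = {relative(arith_mean, 'A', 'B', ref):.3f}, "
              f"geometric A/B = {relative(geo_mean, 'A', 'B', ref):.3f}")
    print(f"common-constant scaling: A/B = {scaled_ratio('A', 'B', 7.0):.3f}")
```

With these made-up numbers, normalizing per benchmark to A makes A look about five times faster than B under the arithmetic mean (ratio 0.198), while normalizing to B makes A look about five times slower (5.050). The geometric-mean ratio is 1.000 either way, which is the self-consistency the CACM article wanted. Scaling by a single constant gives 5.050, the same as comparing the raw sums (101 s vs. 20 s). Note that only the sum corresponds to the time for an actual workload (each program run once); the geometric mean's 1.000 does not.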