Path: utzoo!attcan!uunet!know!zaphod.mps.ohio-state.edu!usc!apple!amdcad!mozart.amd.com!proton!tim
From: tim@proton.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Benchmark performance ratios
Message-ID: <1990Nov19.170400.12437@mozart.amd.com>
Date: 19 Nov 90 17:04:00 GMT
References: <39896@ut-emx.uucp>
Sender: usenet@mozart.amd.com (Usenet News)
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices; Sunnyvale, CA
Lines: 54

In article <39896@ut-emx.uucp> guru@ut-emx.uucp (chen liehgong) writes:
| I have a few queries regarding benchmark performance ratios.
|
| 1. If the benchmark consists of a set of programs (eg. the
| livermore loops) is the overall performance ratio of the architecture
| under test (as compared to a standard one) calculated as the
| harmonic mean of the performance ratios (say speed-ups) obtained
| for each program (or livermore loop)? If so, Why is the harmonic mean
| used instead of the arithmetic or geometric means?

Fleming and Wallace, in their paper entitled "How Not to Lie With
Statistics: The Correct Way to Summarize Benchmark Results" [CACM,
March 1986, Volume 29 #3], say that the arithmetic mean should be used
when the individual benchmarks are reported in absolute time, while the
geometric mean should be used when individual benchmarks are normalized
to some "known machine."

James Smith, in the paper entitled "Characterizing Computer Performance
With a Single Number" [CACM, October 1988, Volume 31 #10], argues that
the harmonic mean should be used, but again only with absolute
quantities such as MFLOPS (normalization should occur after the mean
has been calculated).

The problem with mean calculations based upon absolute quantities
(seconds, MFLOPS, etc.) is that there is an implicit weighting of the
benchmarks based upon how long they run. This is fine if the benchmarks
are designed such that the relative runtimes of the benchmarks
correspond to the actual runtime ratios expected in the real
application(s).
However, this is rarely the case -- a benchmark suite typically
contains a large number of varied programs that don't have an overall
relationship. Because of this, I think that the best thing that can be
done is to give each benchmark equal weighting. If this is done, then
the geometric mean of the normalized performances should be used (e.g.
SPEC).

| 2. If different kinds of benchmarks (eg. integer performance, floating-
| point performance or livermore loops, whetstones and dhrystones) are
| used, how is the overall performance ratio (speed-up) calculated? i.e.,
| Which mean (AM, GM or HM) should be used?

The type of benchmark makes no difference, as long as it is measured
consistently across the machines to get a normalized performance.

| 3. If the performance ratio is changed (say from speed-up to percentage
| decrease in execution time - in clock cycles) do the answers to 1 and 2
| above, remain the same?

I don't believe you can average using %increase/decrease -- you must
convert this into normalized performance first, average using the
geometric mean, then re-convert into %increase/decrease.

--
	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
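
As a rough illustration of the points above, here is a small Python
sketch. The run times are made up, and the helper names
(geometric_mean, pct_decrease_to_ratio, ratio_to_pct_decrease) are
hypothetical; nothing here comes from the cited papers. It shows how a
summary built from absolute times is dominated by the longest-running
benchmark, how the geometric mean of normalized performance weights each
benchmark equally, and how a percentage decrease in execution time can
be converted to normalized performance, averaged, and converted back.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean of the logs; every x must be positive
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Made-up run times in seconds for two benchmarks (assumption).
ref_times = [10.0, 1000.0]   # the "known machine"
new_times = [5.0, 1000.0]    # short benchmark 2x faster, long one unchanged

# Normalized performance (speed-up) per benchmark: [2.0, 1.0]
ratios = [r / n for r, n in zip(ref_times, new_times)]

# Summary from absolute times: the 1000-second benchmark dominates,
# so the 2x gain on the short benchmark nearly vanishes.
am_speedup = arithmetic_mean(ref_times) / arithmetic_mean(new_times)

# Equal weighting: geometric mean of the normalized performances.
gm_speedup = geometric_mean(ratios)

def pct_decrease_to_ratio(p):
    # p% decrease in execution time -> normalized performance
    return 1.0 / (1.0 - p / 100.0)

def ratio_to_pct_decrease(r):
    # normalized performance -> % decrease in execution time
    return 100.0 * (1.0 - 1.0 / r)

# The same two results expressed as % decrease in execution time.
pct = [50.0, 0.0]
naive_pct = arithmetic_mean(pct)   # averaging percentages directly
combined_pct = ratio_to_pct_decrease(
    geometric_mean([pct_decrease_to_ratio(p) for p in pct]))

print(f"speed-up from mean times:    {am_speedup:.3f}")   # 1.005
print(f"geometric mean of speed-ups: {gm_speedup:.3f}")   # 1.414
print(f"naive mean of % decrease:    {naive_pct:.1f}")    # 25.0
print(f"convert, GM, convert back:   {combined_pct:.1f}") # 29.3
```

With these numbers, averaging the percentages directly gives 25%, while
converting to normalized performance first gives about 29.3%, which is
why the conversion step matters.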