From: sfk@otter.hpl.hp.com (Steve Knight)
Newsgroups: comp.lang.scheme
Subject: Re: Benchmarking Scheme
Message-ID: <6260007@otter.hpl.hp.com>
Date: 2 May 91 10:04:53 GMT
Organization: Hewlett-Packard Laboratories, Bristol, UK.

Whenever the topic of benchmarks comes around, I am always impressed by the subtle depth of the problems. It seems to be an area in which the demand for simple answers is much greater than the supply.

> So what are the prospects for machine independent performance
> measures?

Very poor. Folks are simply too incautious about interpreting benchmarks in any case.

(A case in point: some folks I know were doing a whole series of benchmarks on some software. They wrote up all the tests really nicely. However, I couldn't make head or tail of their figures when I compared them against other results I'd checked. So I got on the machine and ran the tests myself. My first test was to check the timing software they were using. It was wrong by a factor of 2. This is by no means a unique example.)

I guess I can add one point. When benchmarking across machines, it is important to establish what the timing and storage measurements are actually measuring. Does the timing include process swapping, page swapping, and other administrative overheads? Does it agree with a stopwatch? Does it include time spent in the kernel? Does it include garbage collection time (and how many collections were there)? Do two C programs of known relative performance give the same performance ratio on a range of machines? If you're on 68xxx machines, run the same test while avoiding 68020 instructions, to check whether ineffective use of the 68020 instruction set is what causes the difficulties. And so on.

Steve
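A minimal sketch of a few of the checks listed above, written in Python purely for illustration. The post names no language or tool, and the helpers `measure`, `busy`, `sleep_check`, and `calibration_ratio` are hypothetical. The sketch records wall-clock time alongside user and system (kernel) CPU time from `os.times()` and counts Python's own cyclic garbage collections with `gc.get_stats()`. A sleep shows how wall time and CPU time diverge, which is one way to see what a timer is really counting. Two workloads whose cost differs by a known factor of 2 give a crude sanity check on the timer. A Scheme system would need its own timing and GC hooks for the same checks.

```python
import gc
import os
import time


def measure(fn, *args, repeat=3):
    """Run fn(*args) `repeat` times and report several views of its cost.

    wall: perf_counter elapsed time, which also absorbs swapping, paging,
          and time given to other processes.
    user/system: CPU time from os.times(); system is time spent in the
          kernel on behalf of this process.
    gc_collections: number of Python cyclic-GC runs during the call
          (Python's collector only, assumed here as a stand-in for a Scheme GC).
    """
    results = []
    for _ in range(repeat):
        gc_before = sum(g["collections"] for g in gc.get_stats())
        cpu_before = os.times()
        wall_before = time.perf_counter()
        fn(*args)
        wall_after = time.perf_counter()
        cpu_after = os.times()
        gc_after = sum(g["collections"] for g in gc.get_stats())
        results.append({
            "wall": wall_after - wall_before,
            "user": cpu_after.user - cpu_before.user,
            "system": cpu_after.system - cpu_before.system,
            "gc_collections": gc_after - gc_before,
        })
    return results


def busy(n):
    """Hypothetical calibration workload: build and sum n small lists."""
    total = 0
    for i in range(n):
        total += sum([i, i + 1, i + 2])
    return total


def sleep_check(seconds=0.2):
    """Sleeping uses wall time but almost no CPU time; a timer that
    reports the sleep as CPU work is not measuring what you think."""
    return measure(time.sleep, seconds, repeat=1)[0]


def calibration_ratio(n=100_000):
    """Time busy(n) and busy(2 * n), keeping the fastest of several runs.

    The ratio will not be exactly 2, but a value far from 2 suggests the
    timer (or the workload) needs a closer look.
    """
    small = min(r["wall"] for r in measure(busy, n))
    large = min(r["wall"] for r in measure(busy, 2 * n))
    return large / small


if __name__ == "__main__":
    print("sleep:", sleep_check())
    for r in measure(busy, 300_000):
        print("busy:", r)
    print("ratio for 2x work:", round(calibration_ratio(), 2))
```

Taking the minimum of several runs is one common way to damp interference from other processes; it is a design choice, not the only reasonable one. Comparing the ratio across machines is the Python analogue of the "two C programs of known relative performance" test.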