Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!uxc!uxc.cso.uiuc.edu!mcdurb!aglew
From: aglew@mcdurb.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: Criteria ... [really: are N des
Message-ID: <28200313@mcdurb>
Date: 13 May 89 16:18:00 GMT
References: <517@daitc.daitc.mil>
Lines: 58
Nf-ID: #R:daitc.daitc.mil:517:mcdurb:28200313:000:2972
Nf-From: mcdurb.Urbana.Gould.COM!aglew May 13 11:18:00 1989

>When the goal is measurement, why shouldn't the test conditions and
>predicted outputs be made publicly available?
>
>Jonathan Krueger

Well, sure... when I publish internal performance reports I try to specify the conditions, and although I haven't yet put out a "MIPS Performance Report" for Motorola MCD, when and if I'm asked to do so I will specify as much about the conditions as possible.

But - have you ever tried to completely describe the configuration of a computer system that you are benchmarking? There's a lot of detail there. It's easy to end up with 4 times as much configuration text as numbers. "Just write down the important things" you say - and certainly I'll try. But even that can be a lot.

And even then, it is easy to be bitten by configuration parameters that you didn't know about. Recently, for example, I was bitten by a difference between two apparently identical boards - same copper, same firmware. But one had a PAL change, made to increase reliability, that had a "negligible effect on performance" (so nobody bothered to tell me). Well, in a real customer system the effect would probably be negligible -- but in the particular aspect of system performance I was looking at it made a big difference.

Bottom line: yes, a responsible performance evaluator publishes his measurement conditions, and uses widely available benchmarks so that you can try to reproduce the results. But please don't flame me when you can't reproduce my results exactly.

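To make "publish the measurement conditions" concrete, here is a minimal sketch of attaching a configuration record to every reported number. It is only an illustration: the function names (`describe_configuration`, `report`) and the hand-entered keys (`board_rev`, `pal_rev`) are hypothetical, and the only details it collects automatically are the ones Python's standard `platform` module can see. Anything the software cannot detect, such as a PAL revision or a jumper setting, still has to be written down by hand.

```python
import json
import platform
import sys
import time


def describe_configuration(extra=None):
    """Collect the configuration details the standard library can see.

    `extra` holds hand-entered details (hypothetical keys such as
    board_rev or pal_rev) that no software probe will find.
    """
    config = {
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
        "version": platform.version(),
        "python": sys.version.split()[0],
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # Hand-entered values are merged last so they are always reported.
    config.update(extra or {})
    return config


def report(benchmark_name, seconds, extra=None):
    """Return one benchmark result plus its configuration as JSON text."""
    record = {
        "benchmark": benchmark_name,
        "seconds": seconds,
        "configuration": describe_configuration(extra),
    }
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only, not a real measurement.
    print(report("example", 1.23, {"board_rev": "unknown", "pal_rev": "unknown"}))
```

Note how quickly the configuration part outgrows the single number it describes, and that the record is only as complete as the person filling in the hand-entered fields.
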
I take any benchmarks from people that are not at the factory door for their respective manufacturer with a large grain of salt. Oh, I'll believe the numbers - but there is probably a large variance.

Take numbers for large computers with a larger grain of salt. Crays, Convexes, Goulds, etc. have a lot of people messing around with them - field service engineers changing a jumper that "has no effect on performance" but does. Fortunately, it's easier to maintain configuration control of small systems, like Motorola DELTA boxes, SUNs, MIPS' boxes.

When I came into this role at Motorola MCD I got a very good piece of advice from someone who had done the same thing at Intel: get 100% control of the hardware you are going to be making measurements on. Shoot anyone else who comes near it. I can do this because I am more concerned with evaluating OS performance than hardware (does that tweak to the disk driver make things run faster?), so I can keep hardware relatively constant while varying OS code. But I have nothing but pity for the poor sods who have to evaluate new hardware as it comes out of the shop - constantly managing a mish-mash of prototypes, latest revs, etc. Ouch!

Last word: performance evaluation more and more requires good statistical techniques to handle variations. I haven't seen such analysis in the published benchmark reports yet, but I expect to soon -- and, after all, benchmark reports are only a small part of performance evaluation.
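
As a rough illustration of the kind of statistical treatment meant here, the sketch below summarizes repeated runs of one benchmark by mean, sample standard deviation, and an approximate 95% confidence interval, then checks whether two configurations' intervals overlap. The function names are hypothetical, the timings in the usage example are invented, and the 1.96 multiplier assumes a normal approximation; with only a handful of runs a Student's t multiplier would be the more careful choice.

```python
import math
import statistics


def summarize(samples, z=1.96):
    """Summarize repeated timings of one benchmark on one configuration.

    z=1.96 is an assumption: a normal-approximation 95% interval.
    """
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    half_width = z * stdev / math.sqrt(len(samples))
    return {
        "runs": len(samples),
        "mean": mean,
        "stdev": stdev,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


def intervals_overlap(a, b):
    """True if the two confidence intervals share any value."""
    return a["ci_low"] <= b["ci_high"] and b["ci_low"] <= a["ci_high"]


if __name__ == "__main__":
    # Invented timings in seconds, for illustration only.
    old_driver = summarize([10.2, 10.5, 9.9, 10.4, 10.1])
    new_driver = summarize([9.8, 10.3, 10.0, 9.7, 10.2])
    print(old_driver)
    print(new_driver)
    print("intervals overlap:", intervals_overlap(old_driver, new_driver))
```

Overlapping intervals are only a crude warning that a difference may be within the noise, not a proper significance test, but even that much says more than a single number per machine.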