Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!uxc!uxc.cso.uiuc.edu!mcdurb!aglew
From: aglew@mcdurb.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: Criteria ... [really: are N des
Message-ID: <28200313@mcdurb>
Date: 13 May 89 16:18:00 GMT
References: <517@daitc.daitc.mil>
Lines: 58
Nf-ID: #R:daitc.daitc.mil:517:mcdurb:28200313:000:2972
Nf-From: mcdurb.Urbana.Gould.COM!aglew    May 13 11:18:00 1989


>When the goal is measurement, why shouldn't the test conditions and
>predicted outputs be made publicly available?
>
>Jonathan Krueger 

Well, sure... when I publish internal performance reports I try
to specify the conditions, and although I haven't yet put out a
"MIPS Performance Report" for Motorola MCD, when and if I'm asked
to do so I will specify as much about the conditions as possible.

But - have you ever tried to completely describe the configuration
of a computer system that you are benchmarking?  There's a lot of
detail there. It's easy to end up with 4 times as much configuration
text as numbers.  "Just write down the important things"
you say - and certainly I'll try. But even that can be a lot.
And even then, it is easy to be bitten by configuration parameters
that you didn't know about. Recently, for example, I was bitten
by a difference between two apparently identical
boards - same copper, same firmware, but with a PAL change, made to
increase reliability, that had a "negligible effect on performance"
(so nobody bothered to tell me).
    Well, in a real customer system the effect would probably be
negligible -- but in the particular aspect of system performance
I was looking at it made a big difference.
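
As a rough illustration of keeping the configuration text attached to the
numbers, here is a minimal Python sketch. The function names, the benchmark
name, and the "board_rev" / "pal_rev" fields are hypothetical, made up for
this example; the hand-entered notes stand in for details (like a PAL
revision) that software cannot discover on its own.

```python
import json
import platform


def describe_configuration(manual_notes):
    """Collect what the software can see, plus hand-entered hardware notes."""
    return {
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
        "python": platform.python_version(),
        # Hypothetical free-form fields: board revision, PAL revision,
        # jumper settings, anything only a human can check.
        "manual_notes": dict(manual_notes),
    }


def make_report(benchmark_name, timings_seconds, manual_notes):
    """Bundle the raw timings with the conditions they were measured under."""
    return {
        "benchmark": benchmark_name,
        "timings_seconds": list(timings_seconds),
        "configuration": describe_configuration(manual_notes),
    }


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    report = make_report(
        "disk_driver_test",
        [1.92, 1.95, 1.90],
        {"board_rev": "unknown", "pal_rev": "unknown"},
    )
    print(json.dumps(report, indent=2, sort_keys=True))
```

Writing "unknown" for a field you could not verify is more honest than
leaving it out: it tells the reader where a surprise might be hiding.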

Bottom line: yes, a responsible performance evaluator publishes
his measurement conditions, and uses widely available benchmarks
so that you can try to reproduce the results.
    But please don't flame me when you can't reproduce my results 
exactly.
    I take any benchmarks from people that are not at the factory
door for their respective manufacturer with a large grain of salt.
Oh, I'll believe the numbers - but there is probably a large
variance.
    Take numbers for large computers with a larger grain of salt.
Crays, Convexes, Goulds, etc. have a lot of people messing around with
them - field service engineers changing a jumper that "has no effect
on performance" but does.
    Fortunately, it's easier to maintain configuration control of
small systems, like Motorola DELTA boxes, SUNs, MIPS' boxes.
When I came into this role at Motorola MCD I got a very good piece
of advice from someone who had done the same thing at Intel:
get 100% control of the hardware you are going to be making 
measurements on. Shoot anyone else who comes near it.
    I can do this because I am more concerned with evaluating OS
performance than hardware (does that tweak to the disk driver make
things run faster?), so I can keep hardware relatively constant
while varying OS code.  But I have nothing but pity for the poor
sods who have to evaluate new hardware as it comes out of the shop
- constantly managing a mish-mash of prototypes, latest revs, etc.
Ouch!

Last word: more and more, performance evaluation requires good
statistical techniques to handle variations. I haven't seen such analysis
in the published benchmark reports yet, but I expect to soon --
and, after all, benchmark reports are only a small part of performance
evaluation.
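
To make "statistical techniques" a little more concrete, here is a minimal
Python sketch of one simple approach: repeat each measurement several times,
then report the mean, the sample standard deviation, and an approximate 95%
confidence interval. The function names are hypothetical, and the 1.96
multiplier assumes a normal approximation; with only a handful of runs a
Student's t value would be more appropriate, so treat this as a sketch and
not a recommended method.

```python
import math
import statistics


def summarize(timings):
    """Return mean, sample stdev, and an approximate 95% interval."""
    n = len(timings)
    if n < 2:
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)  # sample standard deviation (n - 1)
    # Assumption: normal approximation (z = 1.96) for the interval.
    half_width = 1.96 * stdev / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "stdev": stdev,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


def intervals_overlap(a, b):
    """True if two summaries have overlapping intervals."""
    return a["ci_low"] <= b["ci_high"] and b["ci_low"] <= a["ci_high"]


if __name__ == "__main__":
    # Illustrative timings in seconds, not real measurements.
    before = summarize([10.0, 12.0, 14.0])
    after = summarize([20.0, 22.0, 24.0])
    print(before)
    print(after)
    print("overlap:", intervals_overlap(before, after))
```

If the intervals for "before the tweak" and "after the tweak" overlap, the
difference may be nothing more than run-to-run noise, and more runs (or
tighter configuration control) are called for before claiming a speedup.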